Dynamo Dissected: Condition Expression existence check

Please see my code sample for both a small set of tests illustrating how to confirm nonexistence with DynamoDB, and also for a raw C# recreation of the logic behind applying a condition, which might help illustrate how it works in DynamoDB.

For a TL;DR, if you just want to get to the meat of this post, skip to the bottom.

I want to highlight a common misunderstanding about DynamoDB condition expressions which I have also fallen foul of. Specifically this has to do with checking for the existence or non-existence of an item in a table when performing a write operation, to prevent overwriting something that exists or updating something which does not yet exist. When using a composite primary key this misunderstanding can be made obvious if you use either of the following conditions (assuming your keys are pk and sk respectively, if not, just use your brain to substitute these tokens with your own):

attribute_not_exists(pk) AND attribute_not_exists(sk)
attribute_exists(pk) AND attribute_exists(sk)

If you’re using either string in its entirely, validating both key parts, chances are you have the same misunderstanding I have had. And it’s not just us, a github search for these existence or nonexistence expressions reveals it’s a common misunderstanding, even in the official aws-samples repositories.

I think the way this works accidentally leads people to assuming that it does a value comparison, since it can effectively ensure key uniqueness, but it does not compare values in the slightest and does exactly what it says on the tin: checks if an attribute with the name “pk” exists, and then checks if an attribute with the name “sk” exists. Won’t that be true for every item? Yes it will! Can one of those ever be true while the other is false? No they can’t!

Ultimately it comes down to a critical understanding of how DynamoDB works when writing: First it performs a key lookup to identify the item, then it evaluates the conditional expression, then it performs the write operation. Since key lookup happens first, existence or nonexistence is confirmed internally before running any expression evaluation - in a sense the only reason we’re validating against PK or SK here is because they’re the only guaranteed attributes if something exists. It isn’t obvious in the documentation, though there is an interesting note under Conditional Put which alludes to this behaviour but I think the way it is written is a bit confusing. At the time of writing it reads:

If your primary key consists of both a partition key(pk) and a sort key(sk), the parameter will check whether attribute_not_exists(pk) AND attribute_not_exists(sk) evaluate to true or false before attempting the write operation.

Knowing all of this, it becomes clear that you only have to validate one of the key parts. Not both. And remember, this isn’t doing value comparison, it’s just checking if the item which we have or have not identified has attributes with those names.

Put another way, given the following table of data:

pk	sk
123	abc
123	def
456	abc

Then inserting {pk: 123, sk: hij} with attribute_not_exists(pk) will not fail regardless of the fact two items exist with pk 123. Similarly, inserting {pk: 789, sk: abc} with attribute_not_exists(sk) also won’t fail, again despite items existing with the same value for sk. Only insertion of a full matching key will fail either of these expressions.

But what if I want to validate one key part only?

Well first, stop using DynamoDB wrong.

I’m kidding. I think you might be using it wrong if this is your use case, but I also think there’s similar use cases that just require a bit of a rethink. If you want to enforce that partitions are unique and you haven’t started building anything yet, you can create a table with only partition keys to facilitate this, though that is not my preference.

In the interests of both flexibility and Single Table Design I’ll often have a table declared with a composite primary key with generic key part names (pk and sk). This lets me take full advantage of a partition with a range for the items where that makes sense, but if you have a specific type of item that does not need a range, you can just use a constant value for the sort key which effectively internally means “no sort key” - I tend to use _ e.g. {pk: 123, sk: _ }. If instead you’re trying to validate “partition does not yet exist” or something (you might have gone too far, I don’t know why you’d want to do that) you could achieve it with similar tactic - transactionally writing any partition item alongside a different item with a static range key (again like _), then you have an application where you know that static item exists if anything has ever been written to the partition:

Get { pk: 123, sk: _ } -> No result -> Partition does not exist
Transaction 1: Write { pk: 123, sk: _ } & { pk: 123, sk: abc }
Transaction 2: Write { pk: 123, sk: _ } & { pk: 123, sk: def }
Get { pk: 123, sk: _ } -> Result -> Partition exists

Thinking similar for sort key? Don’t. Sort key doesn’t and should not make sense outside of the context of a given partition, e.g. the desire to check “This sort key is unique across all partitions” leads me to feel like your design might be slightly off, but nonetheless transactions can help us once again.

pk	sk
dog	freddie
rat	otis

How do I prevent users adding any other animal type with the name “freddie”? Transactions again:

GET { pk: sk#freddie, sk: _ } -> No result -> Name is unique
Transaction 1: Write { pk: sk#freddie, sk: _, condition: attribute_not_exists(pk) } & { pk: dog, sk: freddie } -> Success
Transaction 2: Write { pk: sk#freddie, sk: _, condition: attribute_not_exists(pk) } & { pk: rat, sk: freddie } -> Fail

It’s worth underlining that these transactional tactics aren’t a firm guarantee because we can’t enforce at a db schema level - it’s all application code. The database will enforce everything correctly so long as you never mess it up.

But isn’t it more safe to check multiple things anyway?

No, when you understand this, not really at all. It’s impossible for one key part to exist while another does not. It doesn’t matter if you do check for that, but it also doesn’t matter if you do this:

attribute_not_exists(pk) AND attribute_not_exists(pk) AND attribute_not_exists(pk) AND attribute_not_exists(pk)

And to you I might say: Isn’t it more safe to quadruple check that something is true?

So my suggestion is to go with attribute_not_exists(pk) only, as it’s just a simpler expression to maintain, especially if your expression is validating some domain rules beyond the key check. Then, the next time someone sees that and says “wait, shouldn’t we check the full key?” you can help them understand how this works a little better.

Dynamo Dissected: Condition Expression existence check

But what if I want to validate one key part only?

But isn’t it more safe to check multiple things anyway?

Further reading