Retry not Rollback: Idempotent Data Models in Cassandra


Naive Consistency

Often the first error handling code I see from new Cassandra users is the client side rollback in an attempt to replicate database transactions from the ACID world. This is typically done when a write to multiple tables fail, or a write is not able to meet requested Consistency Level. Unfortunately, client side rollbacks, while well intentioned, generally create more problems than they solve. Let’s discuss how this will end up

Distributed Problem Set of Rollbacks {#7850}

This is a sample of the type of code I see (obviously not with hardcoded values).

session.execute("INSERT INTO 
(id, text, ts) 
values (1, 'added record', 2015-01-23 11:00:01.923'");
  session.execute("INSERT INTO my_db.user_events 
(user_id, id) values (100, 1)");
}catch(Exception e){
  //bad idea 
  session.execute("DELETE FROM 
where id=1")

The problem with this is, there are several servers responsible for this source of truth, and in a working system, this code will usually work fine with a sleep in between the operations. However, Thread.sleep(100) is rarely a safe approach in practice or remotetly professional. What do you do?

Retry, The Distributed Alternative To Rollbacks {#177d}

However, the typical approach for experienced Cassandra users is to retry the transactions and most drivers will even do this by default for timeouts. Conceptually, if done by hand the code would look more like:

String query = "INSERT INTO my_db.user_events 
(user_id, id) values (100, 1)";
}catch(Exception e){
  // optionally you can even attempt a
  // circuit breaker pattern
  // and write a backup location such as a
  //  queue to retry later
  // only useful in the most extreme cases 
// or most limited          
// configurations

So lets talk about the practical application of these theories in our data model.

Practical Idempotent Data Models {#5622}

Client Side Ids + Buckets {#34d6}

CREATE TABLE my_keyspace.users ( user_id uuid,
 first_name text, last_name text, 
PRIMARY KEY(user_id));

INSERT INTO my_keyspace.users ( user_id, first_name,
 last_name) values ( 
'Ryan', 'Smith' )


Ids are generated on the client or from an external system and therefore are not tied to anything server side.


  • Safe to retry a dozen times if it’s safe to retry one time
  • Must still respect order. So if I have another update to this user_id say a last name change, it must not come AFTER the logically later name change.
  • Immutable {#95cf}

    CREATE TABLE my_keyspace.users (user_id uuid, ts timeuuid,
     attribute_name text, attribute_value text, 
    PRIMARY KEY(user_id, ts, attribute_name))
    INSERT INTO my_keyspace.users ( user_id, ts, 
    attribute_name, attribute_value) values 
    ( e785e49a-996d-4df0-b378-4404798ce088, 
    '2015-01-10 09:56:01.000', 'first_name', 'Ryan')


    There are no updates to any writes beyond retries, another name for this isEvent Sourcing. The number of updates will determine how effective this is and there are a whole raft of other considerations when it comes to partition sizing. This will work really well with the Lambda Architecture as different analytics tools can combine down these separt

    Retries of the same value will have the same result no matter what. This is safely idempotent and free from race conditions that result in permanently inconsistent state.


  • Your timestamps should be based on the time of the update, this will allow to retry with huge time difference and still have an accurate result.
  • Must be aware of partition key width. An upcoming blog post will discuss partition sizing for now rule of thumb is limit it to 100k items and 32megs. While these are not remotely hard and fast rules and more recent versions of Cassandra are happy with a lot more, and different query styles are able to tolerate larger partitions than this, these are good guidelines to start out with if you’re new. Use nodetool cfhistograms to get these numbers on a given node. </li> ### Non-idempotent Anti-Pattern Example {#22b2}
    CREATE TABLE my_keyspace.users ( user_id timeuuid,
     first_name text, last_name text,
     PRIMARY KEY(user_id));
    INSERT INTO my_keyspace.users ( user_id, first_name,
     last_name) values ( now(), 'Ryan', 'Smith' )

    Because it uses a server side generated timeuuid a retry will result in LOTS of different timeuuids and will never give you the same result twice.

    ### Summary {#b77b}

    I hope this has given the reader enough ammo to start building out Idempotent data models that fit in line with distributed principles and lead to a well understood and well behaving application that is tolerant of all sorts of failure modes in a consistent and easy to understand way.