Skip to content

Notes on Eric Evans’ Lecture ‘The Cassandra Distributed Database’

08/02/2011

History

  • The name Cassandra is taken from Homer’s ‘Iliad.’
  • “A massively scalable decentralized, structured data store.”
  • Written by Facebook by an ex-Amazon Dynamo engineer, open-sourced onto Google Code with very limited list of committers.
  • Continued development was forked to Apache – is now graduated to a top-level project by the Apache board.
  • Came about when working on a problem that exceeded the limits of a traditional relational database.

Data Model

  • Cassandra is basically a “one-hop DHT [Distributed Hash Table],” like Dynamo (Cassandra is modelled on Dynamo).
  • Eventual consistency.
  • Trade-off between consistency and latency is tunable.
  • A DHT uses content addressing, where you have a unique hash for each node and order them highest to lowest, creating a ring. Then apply the hash algorithm to a key to find its location.
  • Cassandra is more complex. Its values are structured into column families (like in HBase).
  • A column family is like a sparse table in a relational database.
  • Cassandra has ‘Supercolumns’ – column families within column families. A record has a key, which stores a collection of supercolumns, which in turn each hold a collection of columns.
  • Columns are represented as a byte array in Java.

    Querying

    • get(): retrieve by column name.
    • multiget(): retrieve by column name for a set of keys.
    • get_slice(): retrieve by column name, or a range of names – can return columns and supercolumns.
    • multiget_slice(): retrieve a subset of columns for a set of keys.
    • get_count(): retrieve a number of columns or sub columns.
    • get_range_slice(): retrieve a subset of columns for a range of keys.

    Updating

    • insert(): insert a column
    • batch_insert(): add/update multiple columns.
    • remove(): remove a column
    • batch_mutate(): like batch_insert() but can also delete – deprecates batch_insert in version 0.6.
    • Removal of key ranges is in alpha stage

    Consistency

    • CAP Theorem: “At any given instance in time, you can only ever guarantee two of Consistency, Availability or Partition tolerance.”
    • As an example, complete consistency can be achieved with a replica count using all consistency levels: zero, one, quorum ((N/2)+1) and all. However, if a node goes down, you are unable to write.
    • Quorum is popular: with a replica count of four, quorum is ((4/2)+1)=3, so you get consistency

    API

    • Can be accessed through Thrift, Ruby, Ruby on Rails, Python, Scala.

    Performance

    • Cassandra is significantly faster than a relational database – this is especially true of writes.

    Writes

    • When a write occurs, an entry is written to a ‘Commitlog,’ then to an in-memory data structure, known as a Memtable. When the Memtable is “full,” the contents is flushed to disk (an SSTable, as seen in Google’s  BigTable).
    • Consequently, there are no reads, no seeks. Everything is sequential. There is atomicity within a column family. You can write to any node.
    • Hinted hand-off: when a node goes down, it is dumped elsewhere, flagged for moving when the node comes back up.

    Reads

    • First, memory is checked – if the value is not there, then we “merge” (look for the most recent value in the Memtable and return that). Indexes are used on each SS table for optimisation and a bloom filter for checking a value is there.
    • Can read any node – have read repair ability.
    • Can use Memcached.

    Case Studies

    • Digg – so far only used for “Green Badges” feature. Could be used for user profile information, user comments, etc. They have terabytes of data, meaning many shards. They are read-dominated. Green Badges is using Cassandra as a primary data store RSN.
    • Twitter has completely migrated to Cassandra, as its data model doesn’t really demand a relational database. They have a million ops/s. Also require heavy sharding. Schema changes are very difficult (maybe impossible).
    • Facebook inbox search – makes good use of supercolumns.
    • Mahalo – focus on availability.

    *Lecture roadmap is out of date.

    Advertisements
    No comments yet

    Leave a Reply

    Fill in your details below or click an icon to log in:

    WordPress.com Logo

    You are commenting using your WordPress.com account. Log Out / Change )

    Twitter picture

    You are commenting using your Twitter account. Log Out / Change )

    Facebook photo

    You are commenting using your Facebook account. Log Out / Change )

    Google+ photo

    You are commenting using your Google+ account. Log Out / Change )

    Connecting to %s

    %d bloggers like this: