sfDB5 Talk

This is the annotated version of my presentation on sfDB5. The original slides are over here.

Here goes.


sfDB5

A Schemaless Relational Key-Structure Store

Haneef Mubarak

Hi, I'm Haneef Mubarak and I'd like to talk about [ess-eff-dee-bee-five], a schemaless relational key-structure store.


Common NoSQL Traits

  • Developer Friendly
  • Fully Horizontally Scalable
  • Designed for Commodity (common, cheap) Servers
  • Redundant and Reliable
  • Distributed Shared-Nothing Architecture

As you may have inferred, sfDB5 is a [no-ess-queue-ell] database. As such, it shares a few common traits with most other NoSQL databases:

It's developer friendly. In other words, it should be rather easy for a developer to learn how to use and deploy it.

It's fully horizontally scalable, meaning that adding capacity, whether to handle more load or to store more data, is as simple as adding more servers to the cluster.

It's redundant and reliable. Even if a server fails, the data stays accessible, and the cluster quickly heals itself by restoring a sufficient number of copies of each piece of data.

It has a distributed shared-nothing architecture, which lets sfDB5 scale out efficiently while still maintaining high performance.


Inspiration I: Data Dependent Queries

  • Recursive Querying
  • Read Dependencies

The primary inspiration for sfDB5 is data-dependent reads. These occur when the access name (i.e., the key) of a piece of data depends on another piece of data, which in turn depends on yet another piece of data, and so on.

An example of this would be using a username to look up a userID, which you then use to finally look up the userProfile. This would be three separate, serial fetches from the server. This sort of recursive querying introduces additional latency into your application, reducing its overall performance.
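Here is a minimal sketch of that lookup chain, using a toy in-memory store that counts round trips. Everything in it is an assumption made for illustration: the `CountingStore` class, the key naming scheme, and the choice to have the userID record hold the key of the profile record. It is not sfDB5 code.

```python
class CountingStore:
    """Toy key-value store; each get() stands in for one network round trip."""

    def __init__(self, data):
        self._data = dict(data)
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self._data[key]


def fetch_profile(store, username):
    # Each lookup needs the previous result to build its key,
    # so the fetches cannot be issued in parallel.
    user_id = store.get("username:" + username)          # fetch 1
    profile_key = store.get("userID:" + str(user_id))    # fetch 2
    return store.get(profile_key)                        # fetch 3


# Hypothetical data: username -> userID -> profile key -> profile.
store = CountingStore({
    "username:haneef": 42,
    "userID:42": "userProfile:42",
    "userProfile:42": {"name": "Haneef Mubarak"},
})

profile = fetch_profile(store, "haneef")
print(profile, "fetched in", store.round_trips, "round trips")
```

With a real network in between, the total wait is roughly the number of round trips multiplied by the round-trip time, which is why the length of the chain matters.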

Read dependencies are a problem where an application needs all of a certain set of data before it can continue executing. The application sits idle while it keeps querying recursively and waiting for data. This can mean quite a few milliseconds of wasted compute time, which quickly adds up.


Inspiration II: Ease of Use

  • MongoDB
  • Cassandra
  • Couchbase

Ease of use is critical in making good software. Yet some of the top contenders today lack ease of use.

Setting up a MongoDB cluster is a bit less than trivial. I'm sure you'll have even more fun when you try setting up sharding and replication, though.

To fully utilize all of the features and functionality of Cassandra, you need to learn Cassandra Query Language (CQL). Fantastic. Yet another query language to learn. Isn't that what we were trying to get away from to start with?

Now, on the surface, Couchbase seems perfect in this regard. Administering clusters is a trivial task, and the documentation is spectacular. Learning to use it is also really easy. However, you have to manually create and parse the data structures that you store in Couchbase.


Inspiration III: Integrated Atomic Operations

  1. Perform Operation
  2. Store Result
  3. Return Result

  • Basic Arithmetic (integers and floats): ++, --, +=, -=, *=, /=, %=
  • Bitwise and Shift (integers only): |=, &=, ^=, <<=, >>=, ROTL, ROTR

The final inspiration for sfDB5 was the ability to do atomic operations on numbers serverside. These are operations where the server simply performs an operation on a variable, replaces its value with the result, and sends back a copy of the result.

These can be useful in all sorts of scenarios, from "like" counters to server pooling. Doing this serverside allows for extremely high performance compared to the usual cycle of lock, fetch, compute, and store.
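The perform, store, return sequence can be sketched roughly as follows. This is only an illustration under my own assumptions: `AtomicCell`, `apply`, and `rotl32` are hypothetical names, a Python lock stands in for whatever mechanism a server would actually use, and the rotate assumes 32-bit unsigned integers.

```python
import operator
import threading


class AtomicCell:
    """One numeric value guarded by a lock, standing in for a serverside variable."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def apply(self, op, operand):
        # 1. perform the operation, 2. store the result, 3. return the result,
        # all while holding the lock so no other request can interleave.
        with self._lock:
            self._value = op(self._value, operand)
            return self._value


def rotl32(value, count):
    """Rotate a 32-bit unsigned integer left by count bits (assumed width)."""
    count %= 32
    return ((value << count) | (value >> (32 - count))) & 0xFFFFFFFF


# Eight threads each bump the same counter 1000 times.
counter = AtomicCell(0)


def worker():
    for _ in range(1000):
        counter.apply(operator.add, 1)   # the "++" case


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

final_count = counter.apply(operator.add, 0)  # adding 0 just reads the value
print(final_count)
```

Because the whole read-modify-write happens in one place, the client makes a single request per operation instead of separate lock, fetch, and store requests. Other operators plug in the same way, for example `AtomicCell(6).apply(operator.mul, 7)` for `*=` or `cell.apply(rotl32, 1)` for ROTL.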


Solution I: Recurse Clusterside

  • Model DB Queries closer to memory accesses
  • Execute all accesses within the cluster
  • Utilize a query format similar to C structure/pointer/array format

itemName.member->memberOfPointedStructure[arrayIndex].member

etc. (with a few exceptions)

Recursing clusterside is the best way to solve the first problem.

Model database queries on memory accesses, and execute all of those accesses within the cluster. This helps performance greatly.

Utilizing a query format similar to C's memory access format allows developers to learn to use sfDB5 quickly, easily, and painlessly.
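To make the access format concrete, here is a small sketch of how such a path could be walked over an in-memory store. The semantics are my reading of the C analogy rather than sfDB5's actual rules (the slide itself notes there are a few exceptions): `.` takes a member, `[n]` indexes an array, and `->` treats the current value as the key of another item and takes a member of that item. The `resolve` function and the sample data are hypothetical.

```python
import re

# A token is a bare or prefixed member name (".name", "->name") or "[index]".
TOKEN = re.compile(r"(->|\.)?([A-Za-z_]\w*)|\[(\d+)\]")


def resolve(store, path):
    """Walk a C-style access path through a dict of structures."""
    if not path:
        raise ValueError("empty path")
    current = None
    pos = 0
    while pos < len(path):
        match = TOKEN.match(path, pos)
        if match is None:
            raise ValueError(f"bad path at offset {pos}: {path!r}")
        op, name, index = match.groups()
        if pos == 0:
            if op is not None or name is None:
                raise ValueError("path must start with an item name")
            current = store[name]            # top-level item, looked up by key
        elif index is not None:
            current = current[int(index)]    # array element
        elif op == "->":
            current = store[current][name]   # follow the key, then take the member
        elif op == ".":
            current = current[name]          # member of the current structure
        else:
            raise ValueError(f"missing '.' or '->' before {name!r}")
        pos = match.end()
    return current


# Hypothetical data: "account" holds the key of another item, like a pointer.
store = {
    "haneef": {"account": "userID:42"},
    "userID:42": {"posts": [{"title": "Hello"}, {"title": "sfDB5 Talk"}]},
}

print(resolve(store, "haneef.account->posts[1].title"))
```

The point of the sketch is that the whole chain is resolved in one call. If that call runs inside the cluster, the client sends one request and gets back one value, however many hops the path contains.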


Solution II: Embrace Minimalism and Simplicity

  • Cluster administration is simple: Add/Remove Servers & Monitoring

Sharding and Balancing are automatic.

  • Zero "Query Languages" - use standard functions (set, get, del, incr, add, mult, etc.) with C-style variable access syntax
  • No clientside parsing necessary - specific variables are requested from and delivered by the server

To increase ease of use, just embrace minimalism and simplicity.

Cluster administration is simple with sfDB5 - all you do is add servers, remove servers, and monitor the cluster. Sharding and balancing are automatically taken care of by the cluster.

Meanwhile, you don't have to learn any new "query languages". Instead, you just use a syntax similar to C to specify your queries - everything else is handled by choosing functions.

Finally, there is no clientside parsing. You simply request a variable from the server and it is delivered.
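As a rough illustration of the difference, the sketch below contrasts a store that only holds opaque strings with one that understands structure. Both stores are plain dictionaries, the function names are made up, and JSON is just my stand-in for whatever serialization a client would otherwise use. The character counts are only a proxy for how much data would cross the network.

```python
import json

# Hypothetical profile record.
profile = {"name": "Haneef Mubarak", "langs": ["C", "Python"], "bio": "x" * 200}

# Opaque store: the server only sees a string, so the client must parse it.
blob_store = {"userProfile:42": json.dumps(profile)}


def get_name_from_blob(store, key):
    raw = store[key]          # the whole document is transferred
    doc = json.loads(raw)     # parsing happens on the client
    return doc["name"], len(raw)


# Structure-aware store: the server can hand back a single member.
struct_store = {"userProfile:42": profile}


def get_member(store, key, member):
    value = store[key][member]            # selection happens on the server
    return value, len(json.dumps(value))  # only this value is transferred


blob_name, blob_chars = get_name_from_blob(blob_store, "userProfile:42")
member_name, member_chars = get_member(struct_store, "userProfile:42", "name")
print(blob_name == member_name, blob_chars, member_chars)
```

Both calls return the same name, but the second moves only the requested value and leaves nothing for the client to parse.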


Solution III: Serverside Atomic Operations

This one is fairly simple.

Just have Atomic Operations that can be done Serverside.

Wasn't that easy?

This one's pretty simple: just have atomic operations that are performed serverside. Ta-da!

Wasn't that easy?


That's it, folks!

sfDB5 is still in development, but as part of the FOSS philosophy, you can help!

GitHub: smarturl.it/sfDB5

(these) Slides: smarturl.it/sfDB5-talk

sfDB5 is still in development, but you can help! Head on over to the GitHub repository at [smart you arr ell dot eye tee slash ess eff dee bee five].

Thank you all for your time.