CurioDB: 用 Scala和Akka构建的分布式持久化Redis克隆

jopen 11年前

CurioDB: 用 Scala和Akka构建的分布式持久化Redis克隆。

Installation

I've been using SBT to build the project, which you can install on OSX, Linux or Windows. With that done, you just need to clone this repository and run it:

$ git clone git://github.com/stephenmcd/curiodb.git  $ cd curiodb  $ sbt run

You can also build a binary (executable JAR file):

$ sbt assembly  $ ./target/scala-2.11/curiodb-0.0.1

Overview

Why build a Redis clone? Well, I'd been learning Scala and Akka and wanted a nice project I could take them further with. I've used Redis heavily in the past, and Akka gave me some really cool ideas for implementing a clone, based on each key/value pair (or KV pair) in the system being implemented as an actor:

Concurrency

Since each KV pair in the system is an actor, CurioDB will happily use all your CPU cores, so you can run 1 server using 32 cores instead of 32 servers each using 1 core (or use all 1,024 cores of your 32 server cluster, why not). Each actor operates on its own value atomically, so the atomic nature of Redis commands is still present, it just occurs at the individual KV level instead of in the context of an entire running instance of Redis.

Distributed by Default

Since each KV pair in the system is an actor, the interaction between multiple KV pairs works the same way when they're located across the network as it does when they're located on different processes on a single machine. This negates the need for features of Redis like "hash tagging", and allows commands that deal with multiple keys (SUNION,SINTER,MGET,MSET, etc) to operate seamlessly across a cluster.

Virtual Memory

Since each KV pair in the system is an actor, the unit of disk storage is the individual KV pair, not a single instance's entire data set. This makes Redis' abandoned virtual memory feature a lot more feasible. With CurioDB, an actor can simply persist its value to disk after some criteria occurs, and shut itself down until requested again.

Simple Implementation

Scala is concise, you get a lot done with very little code, but that's just the start - CurioDB leverages Akka very heavily, taking care of clustering, concurrency, persistence, and a whole lot more. This means the bulk of CurioDB's code mostly deals with implementing all of the Redis commands, so far weighing in at only a paltry 1,000 lines of Scala! Currently, the majority of commands have been fully implemented, as well as the Redis wire protocol itself, so existing client libraries can be used. Some commands have been purposely omitted where they don't make sense, such as cluster management, and things specific to Redis' storage format.

Pluggable Storage

Since Akka Persistence is used for storage, many strange scenarios become available. Want to use PostgreSQL or Cassandra for storage, with CurioDB as the front-end interface for Redis commands? This should be possible! By default, CurioDB uses Akka's built-in LevelDB storage.

Design

Here's a bad diagram representing one server in the cluster, and the flow of a client sending a command:

An outside client sends a command to the server actor (Server.scala). There's at most one per cluster node (which could be used to support load balancing), and at least one per cluster (not all nodes need to listen for outside clients).
Upon receiving a new outside client connection, the server actor will create a Client Node actor (System.scala), it's responsible for the life-cycle of a single client connection, as well as parsing the incoming and writing the outgoing Redis wire protocol.
Key Node actors (System.scala) manage the key space for the entire system, which are distributed across the entire cluster using consistent hashing. A Client Node will forward the command to the matching Key Node for its key.
A Key Node is then responsible for creating, removing, and communicating with each KV actor, which are the actual Nodes for each key and value, such as a string, hash, sorted set, etc (Data.scala).
The KV Node then sends a response back to the originating Client Node, which returns it to the outside client.
Some commands require coordination with multiple KV Nodes, in which case a temporary Aggregate actor (Aggregation.scala) is created by the Client Node, which coordinates the results for multiple commands via Key Nodes and KV Nodes in the same way a Client Node does.
PubSub is implemented by adding behavior to Key Nodes and Client Nodes, which act as PubSub servers and clients respectively (PubSub.scala).

项目主页：http://www.open-open.com/lib/view/home/1436778542084