Distributed Stream Processing Framework: Apache Samza
jopen
11 years ago
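Apache Samza is a distributed stream processing framework. It uses Apache Kafka for messaging and Apache Hadoop YARN for fault tolerance, processor isolation, security, and resource management. It is built for processing real-time data and closely resembles Storm, Twitter's stream processing system. It has the following features: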
- Simple API: Unlike most low-level messaging system APIs, Samza provides a very simple call-back based "process message" API that should be familiar to anyone who's used Map/Reduce.
- Managed state: Samza manages snapshotting and restoration of a stream processor's state. Samza will restore a stream processor's state to a snapshot consistent with the processor's last read messages when the processor is restarted.
- Fault tolerance: Samza will work with YARN to restart your stream processor if there is a machine or processor failure.
- Durability: Samza uses Kafka to guarantee that messages will be processed in the order they were written to a partition, and that no messages will ever be lost.
- Scalability: Samza is partitioned and distributed at every level. Kafka provides ordered, partitioned, re-playable, fault-tolerant streams. YARN provides a distributed environment for Samza containers to run in.
- Pluggable: Though Samza works out of the box with Kafka and YARN, Samza provides a pluggable API that lets you run Samza with other messaging systems and execution environments.
- Processor isolation: Samza works with Apache YARN, which supports processor security through Hadoop's security model, and resource isolation through Linux CGroups.
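To make the callback-based "process message" style from the list above more concrete, here is a minimal, self-contained sketch in plain Java. It does not use the Samza library: `MessageTask`, `Collector`, `WordCountTask`, `run`, and the `word-counts` stream name are all hypothetical stand-ins invented for illustration. The `run` method plays the role of the framework and simply hands one partition's messages to the task in the order they were written.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: every type here is a hypothetical stand-in, not the Samza API.
class CallbackSketch {

    /** Receives the output messages a task emits. */
    interface Collector {
        void send(String outputStream, String message);
    }

    /** Callback invoked once per incoming message, in partition order. */
    interface MessageTask {
        void process(String message, Collector collector);
    }

    /** Keeps a running count per word and emits the new count for each word seen. */
    static class WordCountTask implements MessageTask {
        // In-memory task state; a real framework would snapshot and restore this.
        private final Map<String, Integer> counts = new HashMap<>();

        @Override
        public void process(String message, Collector collector) {
            for (String word : message.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // blank message, nothing to count
                }
                int n = counts.merge(word, 1, Integer::sum);
                collector.send("word-counts", word + "=" + n);
            }
        }
    }

    /** Stand-in for the framework: feeds one partition's messages to the task in order. */
    static List<String> run(MessageTask task, List<String> partition) {
        List<String> out = new ArrayList<>();
        Collector collector = (stream, msg) -> out.add(stream + ":" + msg);
        for (String message : partition) {
            task.process(message, collector);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> out = run(new WordCountTask(), Arrays.asList("a b", "b"));
        System.out.println(out);
    }
}
```

The task itself contains only per-message logic. Reading from the stream, preserving order within a partition, and collecting output are left to the surrounding driver, which is the division of labor the "Simple API" and "Managed state" items describe.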