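A Fast, Efficient Batch Computation Engine: Cubert
Cubert is a fast and efficient batch computation engine for complex analysis and reporting on large-scale datasets in Hadoop.
Cubert is ideally suited to the following application areas: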
- Statistical Calculations, Joins and Aggregations
Cubert introduces a new model of computation that lets users organize data in a format ideally suited for scalable execution of subsequent query processing operators. It also provides a set of algorithmically efficient operators (MeshJoin and CUBE) that exploit this organization to deliver significantly better CPU and resource utilization than existing solutions.
- Cubes and Grouping Set Aggregations
The workhorse is the new CUBE operator, which can efficiently (in both CPU and memory) compute additive, non-additive (e.g. Count Distinct) and exact percentile-rank (e.g. Median) statistics, roll up inner dimensions on the fly, and compute multiple measures within a single job.
- Time Range Calculations and Incremental Computations
Cubert primitives are especially suited for reporting workflows whose computation pattern is both regular and repetitive, which allows efficiency gains from caching partial results and processing data incrementally.
- Graph Computations
Cubert provides a novel sparse matrix multiplication algorithm that is best suited for analytics with large-scale graphs.
- When Performance or Resources Are a Matter of Concern
Cubert Script is a developer-friendly language that removes the hints, guesswork and surprises from running a script. It gives developers complete control over the execution plan (without resorting to low-level programming!) and is extremely extensible: users can add new functions, aggregators and even operators.
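The first item (statistical calculations, joins and aggregations) argues that organizing data up front makes later operators cheap. A minimal sketch of that idea, assuming both inputs are already sorted on the join key, is a plain sort-merge join: one linear pass, no hash table. This is not Cubert's MeshJoin, which the text does not describe; the function name `merge_join` and the tuple-row layout are hypothetical.

```python
from typing import Any, Iterator

Row = tuple[Any, ...]


def merge_join(left: list[Row], right: list[Row], key: int = 0) -> Iterator[tuple[Row, Row]]:
    """Inner-join two row lists that are BOTH pre-sorted ascending on column `key`.

    Hypothetical helper: illustrates why pre-sorted data joins in a single pass.
    """
    i, j = 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1  # left key too small: advance left
        elif lk > rk:
            j += 1  # right key too small: advance right
        else:
            # Find the run of equal keys on each side...
            i_end = i
            while i_end < len(left) and left[i_end][key] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][key] == rk:
                j_end += 1
            # ...and emit the cross product of the two runs.
            for l_row in left[i:i_end]:
                for r_row in right[j:j_end]:
                    yield l_row, r_row
            i, j = i_end, j_end
```

For example, joining `[(1, "alice"), (4, "dave")]` with `[(1, "page_a"), (3, "page_c"), (4, "page_d")]` yields the pairs for keys 1 and 4 only. Because neither input is re-sorted or hashed, the cost is dominated by reading the data once, which is the kind of saving a data-organization-aware join aims for.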
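The second item (cubes and grouping set aggregations) lists the statistics CUBE produces. To make those semantics concrete, here is a naive sketch that computes, for every grouping set of the given dimensions, an additive measure (sum), a non-additive one (count distinct) and an exact median. It is not Cubert's CUBE operator: it keeps every value per group in memory, whereas the text describes CUBE as CPU- and memory-efficient. The function and column names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from statistics import median
from typing import Any


def cube(rows: list[dict[str, Any]], dims: list[str], user_col: str, value_col: str) -> dict[tuple, dict[str, Any]]:
    """Naive CUBE over `dims` (hypothetical helper, illustrates semantics only).

    Rolled-up dimensions are represented by None ("ALL"), so this sketch
    assumes no real dimension value is None.
    """
    # group key -> (all values in the group, set of distinct users)
    groups: dict[tuple, tuple[list, set]] = defaultdict(lambda: ([], set()))
    for row in rows:
        # Every row contributes to all 2^len(dims) grouping sets.
        for r in range(len(dims) + 1):
            for subset in combinations(dims, r):
                key = tuple(row[d] if d in subset else None for d in dims)
                values, users = groups[key]
                values.append(row[value_col])
                users.add(row[user_col])
    return {
        key: {
            "sum": sum(values),              # additive
            "distinct_users": len(users),    # non-additive
            "median": median(values),        # exact percentile rank
        }
        for key, (values, users) in groups.items()
    }
```

With two dimensions such as `country` and `device`, the result contains keys like `("US", None)` (rolled up over device) and `(None, None)` (the grand total). The per-row fan-out to every grouping set is exactly what makes a naive cube expensive at scale, and why a dedicated operator matters.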
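The third item (time range and incremental computations) mentions partial result caching. A minimal sketch of that pattern, assuming an additive measure such as a sum: cache one partial aggregate per day, scan only the newest day's raw events, and answer each rolling N-day query by combining cached partials. The class `RollingWindow` is hypothetical and is not a Cubert API.

```python
from datetime import date, timedelta


class RollingWindow:
    """Rolling N-day total built from cached per-day partial sums (hypothetical helper)."""

    def __init__(self, days: int):
        self.days = days
        self.daily: dict[date, int] = {}  # cache: day -> partial sum for that day

    def add_day(self, day: date, events: list[int]) -> None:
        # Only the new day's raw events are scanned; earlier days stay cached.
        self.daily[day] = sum(events)

    def total(self, end: date) -> int:
        # Combine cached partials for the window [end - days + 1, end].
        return sum(self.daily.get(end - timedelta(days=k), 0) for k in range(self.days))
```

Moving the window forward by one day costs one day of raw data plus N cached lookups, instead of rescanning N days. This only works directly for additive measures; a non-additive measure like count distinct would need a different partial state per day (for example, per-day sets of IDs).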
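The fourth item (graph computations) rests on sparse matrix multiplication. The text does not describe Cubert's novel algorithm, so the sketch below swaps in the standard row-by-row (Gustavson-style) method, with a graph's adjacency matrix stored as `{row: {col: value}}`. Squaring an adjacency matrix counts 2-hop paths: entry (i, j) of A·A is the number of paths i → k → j. The function name `spmm` is hypothetical.

```python
from collections import defaultdict

SparseMatrix = dict[int, dict[int, int]]


def spmm(a: SparseMatrix, b: SparseMatrix) -> SparseMatrix:
    """Multiply sparse matrices stored as {row: {col: value}}, touching only nonzeros."""
    result: SparseMatrix = {}
    for i, a_row in a.items():
        acc: dict[int, int] = defaultdict(int)
        for k, a_ik in a_row.items():
            # Each nonzero a[i][k] scales row k of b into row i of the result.
            for j, b_kj in b.get(k, {}).items():
                acc[j] += a_ik * b_kj
        if acc:
            result[i] = dict(acc)
    return result
```

For the graph with edges 0→1, 0→2, 1→2 and 2→0, `spmm(adj, adj)` shows, for instance, that node 0 reaches node 2 in two hops (via node 1). Work scales with the number of nonzero products rather than with the square of the node count, which is what makes sparse formulations attractive for large graphs.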