PostgreSQL columnar store extension: cstore_fdw

jopen, 11 years ago

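cstore_fdw implements a columnar store for PostgreSQL, aimed at analytics workloads where data is loaded in batches.

The extension uses the Optimized Row Columnar (ORC) format for its data layout. ORC improves upon the RCFile format developed at Facebook, and brings the following benefits: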

  • Compression: Reduces in-memory and on-disk data size by 2-4x. Can be extended to support different codecs.

  • Column projections: Only reads column data relevant to the query. Improves performance for I/O bound queries.

  • Skip indexes: Stores min/max statistics for row groups, and uses them to skip over unrelated rows.
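As a rough illustration of column projections and skip indexes, consider a query that references a single column and filters on a range. The sketch below reuses the `cstore_table` from the Flexibility example at the end of this post. Whether any row groups are actually skipped depends on how the loaded data is distributed, so the comments describe the intended behavior, not a guarantee.

```sql
-- Assumes cstore_table (num integer, name text) already exists as a
-- cstore_fdw foreign table and has been loaded with COPY.

-- Only "num" is referenced, so the stored data for "name" does not
-- need to be read (column projection).
-- Row groups whose recorded max(num) is 2 or less cannot contain a
-- match, so they are candidates for being skipped (skip index).
SELECT count(*)
FROM cstore_table
WHERE num > 2;
```

On a three-row table the effect is invisible; the benefit shows up on wide tables with many row groups, ideally loaded in an order that correlates with the filter column.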

In addition, we used the Postgres foreign data wrapper APIs and type representations, which brings:

  • Support for 40+ Postgres data types. The user can also create new types and use them.

  • Statistics collection. PostgreSQL's query optimizer uses these stats to evaluate different query plans and pick the best one.

  • Simple setup. Create foreign table and copy data. Run SQL.
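The "create foreign table, copy data, run SQL" flow might look like the following. This is a minimal sketch: the table `page_views`, its columns, and both file paths are hypothetical names chosen for illustration, and depending on the cstore_fdw version the extension may also need to be listed in `shared_preload_libraries` before it can be used.

```sql
-- One-time setup per database.
CREATE EXTENSION cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

-- Hypothetical table; the filename option follows the usage shown
-- in the Flexibility example at the end of this post.
CREATE FOREIGN TABLE page_views
  (view_date date, url text, visits integer)
SERVER cstore_server
OPTIONS (filename '/var/tmp/page_views.cstore');

-- Bulk-load from a server-side CSV file (hypothetical path).
COPY page_views FROM '/var/tmp/page_views.csv' WITH (FORMAT csv);

-- Collect statistics so the planner can cost queries on this table.
ANALYZE page_views;

-- Run ordinary SQL against the columnar table.
SELECT url, sum(visits) AS total_visits
FROM page_views
WHERE view_date >= DATE '2014-01-01'
GROUP BY url
ORDER BY total_visits DESC
LIMIT 10;
```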

Highlights

Key areas improved by this extension:

  • Faster Analytics — Reduce analytics query disk and memory use by 10x
  • Lower Storage — Compress data by 3x
  • Easy Setup — Deploy as standard PostgreSQL extension
  • Flexibility — Mix row- and column-based tables in the same DB
  • Community — Benefit from PostgreSQL compatibility and open development

Learn more on our blog post.

Faster Analytics

cstore_fdw brings substantial performance benefits to analytics-heavy workloads:

  • Column projections: only read columns relevant to the query
  • Compressed data: higher data density reduces disk I/O
  • Skip indexes: row group stats permit skipping irrelevant rows
  • Stats collections: integrates with PostgreSQL’s own query optimizer
  • PostgreSQL-native formats: no deserialization overhead at query time

[Figure: I/O utilization. Disk I/O (MiB) for TPC-H queries 3, 5, 6, and 10 on 4GB of data using PostgreSQL 9.3 on m1.xlarge; series: PostgreSQL, cstore, cstore (LZ).]

Lower Storage

Cleanly implements full-table compression.
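Compression is chosen per table when the foreign table is created. The sketch below assumes the `compression` table option described in the cstore_fdw README (with `pglz` as the compressed setting) and an existing `cstore_server`; the table name and file path are hypothetical.

```sql
-- Assumes: CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;
-- has already been run in this database.

-- Hypothetical table; 'pglz' asks cstore_fdw to compress the stored
-- column data. Leaving the option out keeps the data uncompressed.
CREATE FOREIGN TABLE cstore_table_lz
  (num integer, name text)
SERVER cstore_server
OPTIONS (filename '/var/tmp/testing_lz.cstore',
         compression 'pglz');
```

Loading the same data into a compressed and an uncompressed table and comparing the sizes of the two files on disk is a simple way to check the ratio for a given data set.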

Easy Setup

If you know how to use PostgreSQL extensions, you know how to use cstore_fdw:

  • Deploy as standard PostgreSQL extension
  • Simply specify table type at creation time using FDW commands
  • Copy data into your tables using standard PostgreSQL COPY command

Flexibility

Have the best of all worlds… mix row- and column-based tables in the same DB:

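    -- Assumed one-time setup, not shown in the original snippet.
    CREATE EXTENSION cstore_fdw;
    CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

    -- Column-based table, stored by cstore_fdw.
    CREATE FOREIGN TABLE cstore_table
      (num integer, name text)
    SERVER cstore_server
    OPTIONS (filename '/var/tmp/testing.cstore');

    -- Ordinary row-based table.
    CREATE TABLE plain_table
      (num integer, name text);

    COPY cstore_table FROM STDIN (FORMAT csv);
    -- 1, foo
    -- 2, bar
    -- 3, baz
    -- \.

    COPY plain_table FROM STDIN (FORMAT csv);
    -- 4, foo
    -- 5, bar
    -- 6, baz
    -- \.

    -- Join the columnar table with the row-based table.
    SELECT * FROM cstore_table c, plain_table p WHERE c.name=p.name;
    -- num | name | num | name
    -------+------+-----+------
    --   1 |  foo |   4 |  foo
    --   2 |  bar |   5 |  bar
    --   3 |  baz |   6 |  baz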

Project homepage: http://www.open-open.com/lib/view/home/1396571870215