Awesome Hadoop: A Resource List for Hadoop and the Hadoop Ecosystem
jopen
10 years ago
Awesome Hadoop
A curated list of resources for Hadoop and the Hadoop ecosystem. Similar lists include Awesome PHP, Awesome Python, and Awesome Sysadmin.
- Awesome Hadoop
- Resources
- Websites
- Presentations
- Books
- Other Awesome Lists
- Apache Hadoop - Apache Hadoop
- Apache Tez
- SpatialHadoop - SpatialHadoop is a MapReduce extension to Apache Hadoop designed specially to work with spatial data.
- GIS Tools for Hadoop - Big Data Spatial Analytics for the Hadoop Framework
- Elasticsearch Hadoop - Elasticsearch real-time search and analytics natively integrated with Hadoop. Supports Map/Reduce, Cascading, Apache Hive and Apache Pig.
- dumbo - Python module that allows you to easily write and run Hadoop programs.
- hadoopy - Python MapReduce library written in Cython.
- mrjob - mrjob is a Python 2.5+ package that helps you write and run Hadoop Streaming jobs.
- pydoop - Pydoop is a package that provides a Python API for Hadoop.
- hdfs-du - HDFS-DU is an interactive visualization of the Hadoop distributed file system.
- White Elephant - Hadoop log aggregator and dashboard
- Kiji Project
- Genie - Genie provides REST-ful APIs to run Hadoop, Hive and Pig jobs, and to manage multiple Hadoop resources and perform job submissions across them.
- Apache Twill
- mpich2-yarn - Running MPICH2 on Yarn
- Apache HBase - Apache HBase
- Apache Phoenix - A SQL skin over HBase
- happybase - A developer-friendly Python library to interact with Apache HBase.
- Hannibal - Hannibal is a tool that helps monitor and maintain HBase clusters that are configured for manual splitting.
- Haeinsa - Haeinsa is a linearly scalable multi-row, multi-table transaction library for HBase.
- hindex - Secondary Index for HBase
- Apache Accumulo - The Apache Accumulo™ sorted, distributed key/value store is a robust, scalable, high performance data storage and retrieval system.
- OpenTSDB - The Scalable Time Series Database
- Apache Cassandra
- Apache Hive
- Cloudera Impala
- Presto
- Apache Tajo
- Hive Plugins
- UDF
- http://nexr.github.io/hive-udf/
- https://github.com/edwardcapriolo/hive_cassandra_udfs
- https://github.com/livingsocial/HiveSwarm
- https://github.com/ThinkBigAnalytics/Hive-Extensions-from-Think-Big-Analytics
- https://github.com/karthkk/udfs
- https://github.com/kevinweil/elephant-bird - Twitter
- https://github.com/lovelysystems/ls-hive
- https://github.com/stewi2/hive-udfs
- https://github.com/klout/brickhouse
- https://github.com/markgrover/hive-translate (PostgreSQL translate())
- https://github.com/deanwampler/HiveUDFs
- https://github.com/myui/hivemall (Machine Learning UDF/UDAF/UDTF)
- https://github.com/edwardcapriolo/hive-geoip (GeoIP UDF)
- Storage Handler
- https://github.com/dvasilen/Hive-Cassandra
- https://github.com/yc-huang/Hive-mongo
- https://github.com/balshor/gdata-storagehandler
- https://github.com/karthkk/hive-hbase-json
- https://github.com/sunsuk7tp/hive-hbase-integration
- https://bitbucket.org/rodrigopr/redisstoragehandler
- https://github.com/zhuguangbin/HiveJDBCStorageHanlder
- https://github.com/chimpler/hive-solr
- https://github.com/bfemiano/accumulo-hive-storage-manager
- SerDe
- https://github.com/rcongiu/Hive-JSON-Serde
- https://github.com/mochi/hive-json-serde
- https://github.com/ogrodnek/csv-serde
- https://github.com/parag/HiveJsonSerde
- https://github.com/johanoskarsson/hive-json-serde
- https://github.com/electrum/hive-serde - JSON
- https://github.com/karthkk/hive-hbase-json
- Libraries and tools
- https://github.com/forward/rbhive
- https://github.com/synctree/activerecord-hive-adapter
- https://github.com/hrp/sequel-hive-adapter
- https://github.com/forward/node-hive
- https://github.com/recruitcojp/WebHive
- shib - WebUI for query engines: Hive and Presto
- clive - Clojure library for interacting with Hive via Thrift
- http://www.phphiveadmin.net/
- https://github.com/anjuke/hwi
- https://code.google.com/a/apache-extras.org/p/hipy/
- https://github.com/dmorel/Thrift-API-HiveClient2 (Perl - HiveServer2)
- PyHive - Python interface to Hive and Presto
- https://github.com/recruitcojp/OdbcHive
- Hive-Sharp
- HiveRunner - An Open Source unit test framework for hadoop hive queries based on JUnit4
- Beetest - A super simple utility for testing Apache Hive scripts locally for non-Java developers.
- Hive_test - Unit test framework for Hive and hive-service
- Apache Oozie - Apache Oozie
- Azkaban
- Apache Falcon - Data management and processing platform
- Apache Flume - Apache Flume
- Apache Sqoop - Apache Sqoop
- Apache Kafka - Apache Kafka
- Flume Plugins
- Flume MongoDB Sink
- Flume HornetQ Channel
- Flume MessagePack Source
- Flume RabbitMQ source and sink
- Flume UDP Source
- Stratio Ingestion - Custom sinks: Cassandra, MongoDB, Stratio Streaming and JDBC
- Flume Custom Serializers
- Real-time analytics in Apache Flume
- .Net FlumeNG Clients
- Suro - Netflix's distributed data pipeline
- Apache Pig - Apache Pig
- Apache DataFu - A collection of libraries for working with large-scale data in Hadoop
- vahara - Machine learning and natural language processing with Apache Pig
- packetpig - Open Source Big Data Security Analytics
- akela - Mozilla's utility library for Hadoop, HBase, Pig, etc.
- seqpig - Simple and scalable scripting for large sequencing data sets (e.g., bioinformatics) in Hadoop
- Lipstick - Pig workflow visualization tool. Introducing Lipstick on A(pache) Pig
- PigPen - PigPen is map-reduce for Clojure, or distributed Clojure. It compiles to Apache Pig, but you don't need to know much about Pig to use it.
- Kite Software Development Kit - A set of libraries, tools, examples, and documentation
- gohadoop - Native Go clients for Apache Hadoop YARN.
- Hue - A Web interface for analyzing data with Apache Hadoop.
- Zeppelin
- Jumbune - Jumbune is an open-source product built for analyzing Hadoop clusters and MapReduce jobs.
- Apache Thrift
- Apache Avro - Apache Avro is a data serialization system.
- Elephant Bird - Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code.
- Spring for Apache Hadoop
- Apache Spark
- Apache Crunch
- Cascading - Cascading is the proven application development platform for building data applications on Hadoop.
- Apache Flink - Apache Flink is a platform for efficient, distributed, general-purpose data processing.
- Apache Bigtop - Apache Bigtop: Packaging and tests of the Apache Hadoop ecosystem
- Apache Ambari - Apache Ambari
- Ganglia Monitoring System
- ankush - A big data cluster management tool that creates and manages clusters of different technologies.
- Apache Zookeeper - Apache Zookeeper
- Apache Curator - ZooKeeper client wrapper and rich ZooKeeper framework
- Buildoop - Hadoop Ecosystem Builder
- Deploop - The Hadoop Deploy System
- ElasticSearch
- Apache Solr
- SenseiDB - Open-source, distributed, realtime, semi-structured database
- Big Data Benchmark
- HiBench
- Big-Bench
- hive-benchmarks
- hive-testbench - Testbench for experimenting with Apache Hive at any data scale.
- Apache Mahout
- Cloudera Oryx - The Oryx open source project provides simple, real-time large-scale machine learning / predictive analytics infrastructure.
- MLlib - MLlib is Apache Spark's scalable machine learning library.
- R - R is a free software environment for statistical computing and graphics.
- RHive - RHive is an R extension facilitating distributed computing via Apache Hive.
- RHadoop
- Hadoop Weekly
- The Hadoop Ecosystem Table
- Hadoop 1.x vs 2
- Apache Hadoop YARN: Yet Another Resource Negotiator
- Introducing Apache Hadoop YARN
- Apache Hadoop YARN - Background and an Overview
- Apache Hadoop YARN - Concepts and Applications
- Apache Hadoop YARN - ResourceManager
- Apache Hadoop YARN - NodeManager
- Migrating to MapReduce 2 on YARN (For Users)
- Migrating to MapReduce 2 on YARN (For Operators)
- Hadoop and Big Data: Use Cases at Salesforce.com
- All you wanted to know about Hadoop, but were too afraid to ask: genealogy of elephants.
- What is Bigtop, and Why Should You Care?
- Hadoop - Distributions and Commercial Support
- Ganglia configuration for a small Hadoop cluster and some troubleshooting
- Hadoop illuminated - Open Source Hadoop Book
- NoSQL Database
- 10 Best Practices for Apache Hive
- Hadoop Operations at Scale
- Hadoop 24/7
- An example Apache Hadoop Yarn upgrade
- Apache Hadoop In Theory And Practice
- Hadoop Operations at LinkedIn
- Hadoop Performance at LinkedIn
- Hadoop: The Definitive Guide
- Hadoop Operations
- Apache Hadoop Yarn
- HBase: The Definitive Guide
- Programming Pig
- Programming Hive
Workflow, Lifecycle and Governance
Data Ingestion and Integration
DSL
Libraries and Tools
Realtime Data Processing
Distributed Computing and Programming
Packaging, Provisioning and Monitoring
Search
Benchmark
Machine learning and Big Data analytics
Misc.
Resources
Various resources, such as books, websites and articles.
Websites
Useful websites and articles
Presentations
Books
Other Awesome Lists
Other amazingly awesome lists can be found in the awesome-awesomeness list.
Hadoop
YARN
NoSQL
Next-generation databases, mostly addressing some of these points: being non-relational, distributed, open source, and horizontally scalable.
SQL on Hadoop