Apache Hive v2.1.0 Released
jopen, 8 years ago
<p style="text-align: center;"><img alt="" src="https://simg.open-open.com/show/988125c04f1b57cf3a5f7ea76d4aa4b2.png" /></p>
<p>Hive is an open-source data warehouse built on Hadoop for storing and processing very large volumes of structured data. Facebook open-sourced it in August 2008. It provides HQL, a query language with SQL-like syntax, as its data access interface. Hive has the following advantages and disadvantages:</p>
<p>Advantages:</p>
<ul>
<li>Hive uses SQL-like query syntax and aims for as much compatibility with the SQL standard as possible, which greatly flattens the learning curve for traditional data analysts;</li>
<li>JDBC and ODBC interfaces make it easier for developers to build applications;</li>
<li>With MapReduce (MR) as the compute engine and HDFS as the storage system, its compute and scaling capabilities are designed for very large data sets;</li>
<li>Unified metadata management (backed by Derby, MySQL, and others), which can be shared with Pig, Presto, and similar tools;</li>
</ul>
<p>Disadvantages:</p>
<ul>
<li>HQL has limited expressive power, and some complex computations are hard to express in it;</li>
<li>Because Hive generates MapReduce jobs automatically, HQL queries are hard to tune;</li>
<li>Control is coarse-grained, so fine-tuning execution is difficult.</li>
</ul>
<p style="text-align: center;"><img alt="" src="https://simg.open-open.com/show/ef4c85fc0e2ee7dd847b329ae738878e.jpg" /></p>
<p style="text-align: center;"><strong>Hive runtime architecture</strong></p>
<h2>Change Log</h2>
<ul>
<li>[<a href="/misc/goto?guid=4958991613085647222">HIVE-9774</a>] - Print yarn application id to console [Spark Branch]</li>
<li>[<a href="/misc/goto?guid=4958991613207583310">HIVE-10280</a>] - LLAP: Handle errors while sending source state updates to the daemons</li>
<li>[<a href="/misc/goto?guid=4958991613305639049">HIVE-11107</a>] - Support for Performance regression test suite with TPCDS</li>
<li>[<a href="/misc/goto?guid=4958991613412902813">HIVE-11417</a>] - Create shims for the row by row read path that is backed by VectorizedRowBatch</li>
<li>[<a href="/misc/goto?guid=4958991613524099982">HIVE-11526</a>] - LLAP: implement LLAP UI as a separate service - part 1</li>
<li>[<a href="/misc/goto?guid=4958991613627712532">HIVE-11766</a>] - LLAP: Remove MiniLlapCluster from shim layer after hadoop-1 removal</li>
<li>[<a href="/misc/goto?guid=4958991613730611964">HIVE-11927</a>] - Implement/Enable constant related optimization rules in Calcite: enable HiveReduceExpressionsRule to fold constants</li>
<li>[<a href="/misc/goto?guid=4958991613840427087">HIVE-12049</a>] - HiveServer2: Provide an option to write serialized thrift objects in final tasks</li>
</ul>
<h3><strong>Bug Fixes</strong></h3>
<ul>
<li>[<a href="/misc/goto?guid=4958991613942799598">HIVE-1608</a>] - use sequencefile as the default for storing intermediate results</li>
<li>[<a href="/misc/goto?guid=4958991614054392596">HIVE-4662</a>] - first_value can't have more than one order by column</li>
<li>[<a href="/misc/goto?guid=4958991614171380121">HIVE-8343</a>] - Return value from BlockingQueue.offer() is not checked in DynamicPartitionPruner</li>
<li>[<a href="/misc/goto?guid=4958991614267892862">HIVE-9144</a>] - Beeline + Kerberos shouldn't prompt for unused username + password</li>
<li>[<a href="/misc/goto?guid=4958991614369683496">HIVE-9457</a>] - Fix obsolete parameter name in HiveConf description of hive.hashtable.initialCapacity</li>
<li>[<a href="/misc/goto?guid=4958990847963109386">HIVE-9499</a>] - hive.limit.query.max.table.partition makes queries fail on non-partitioned tables</li>
<li>[<a href="/misc/goto?guid=4958991614512830297">HIVE-9534</a>] - incorrect result set for query that projects a windowed aggregate</li>
<li>[<a href="/misc/goto?guid=4958990848091035255">HIVE-9862</a>] - Vectorized execution corrupts timestamp values</li>
<li>[<a href="/misc/goto?guid=4958991614656092950">HIVE-10171</a>] - Create a storage-api module</li>
<li>[<a href="/misc/goto?guid=4958991614764650607">HIVE-10187</a>] - Avro backed tables don't handle cyclical or recursive records</li>
<li>[<a href="/misc/goto?guid=4958991614865390103">HIVE-10632</a>] - Make sure TXN_COMPONENTS gets cleaned up if table is dropped before compaction.</li>
<li>[<a href="/misc/goto?guid=4958990848206079659">HIVE-10729</a>] - Query failed when select complex columns from joinned table (tez map join only)</li>
<li>[<a href="/misc/goto?guid=4958991615000193042">HIVE-11097</a>] - HiveInputFormat uses String.startsWith to compare splitPath and PathToAliases</li>
<li>[<a href="/misc/goto?guid=4958991615101478355">HIVE-11388</a>] - Allow ACID Compactor components to run in multiple
metastores</li>
<li>[<a href="/misc/goto?guid=4958991615214382272">HIVE-11427</a>] - Location of temporary table for CREATE TABLE SELECT broken by HIVE-7079</li>
<li>[<a href="/misc/goto?guid=4958991615308201635">HIVE-11484</a>] - Fix ObjectInspector for Char and VarChar</li>
<li>[<a href="/misc/goto?guid=4958991615420131242">HIVE-11550</a>] - ACID queries pollute HiveConf</li>
<li>[<a href="/misc/goto?guid=4958991615529249637">HIVE-11675</a>] - make use of file footer PPD API in ETL strategy or separate strategy</li>
<li>[<a href="/misc/goto?guid=4958991615633428076">HIVE-11716</a>] - Reading ACID table from non-acid session should raise an error</li>
<li>[<a href="/misc/goto?guid=4958991615730979592">HIVE-11806</a>] - Create test for HIVE-11174</li>
</ul>
<h3><strong>Improvements</strong></h3>
<ul>
<li>[<a href="/misc/goto?guid=4958991615830122372">HIVE-4570</a>] - More information to user on GetOperationStatus in Hive Server2 when query is still executing</li>
<li>[<a href="/misc/goto?guid=4958991615943503415">HIVE-4924</a>] - JDBC: Support query timeout for jdbc</li>
<li>[<a href="/misc/goto?guid=4958991616046379307">HIVE-5370</a>] - format_number udf should take user specifed format as argument</li>
<li>[<a href="/misc/goto?guid=4958991616151033630">HIVE-6535</a>] - JDBC: provide an async API to execute query and fetch results</li>
<li>[<a href="/misc/goto?guid=4958990855553025641">HIVE-10115</a>] - HS2 running on a Kerberized cluster should offer Kerberos(GSSAPI) and Delegation token(DIGEST) when alternate authentication is enabled</li>
<li>[<a href="/misc/goto?guid=4958991616290248563">HIVE-10249</a>] - ACID: show locks should show who the lock is waiting for</li>
<li>[<a href="/misc/goto?guid=4958991616393190520">HIVE-10468</a>] - Create scripts to do metastore upgrade tests on jenkins for Oracle DB.</li>
<li>[<a href="/misc/goto?guid=4958991616493678739">HIVE-10982</a>] - Customizable the value of java.sql.statement.setFetchSize in Hive JDBC Driver</li>
<li>[<a
href="/misc/goto?guid=4958991616601856119">HIVE-11424</a>] - Rule to transform OR clauses into IN clauses in CBO</li>
<li>[<a href="/misc/goto?guid=4958991616700654493">HIVE-11483</a>] - Add encoding and decoding for query string config</li>
<li>[<a href="/misc/goto?guid=4958991616808394623">HIVE-11487</a>] - Add getNumPartitionsByFilter api in metastore api</li>
<li>[<a href="/misc/goto?guid=4958991616914272517">HIVE-11752</a>] - Pre-materializing complex CTE queries</li>
<li>[<a href="/misc/goto?guid=4958991617026227534">HIVE-11793</a>] - SHOW LOCKS with DbTxnManager ignores filter options</li>
<li>[<a href="/misc/goto?guid=4958991617119738012">HIVE-11956</a>] - SHOW LOCKS should indicate what acquired the lock</li>
<li>[<a href="/misc/goto?guid=4958991617230937249">HIVE-12431</a>] - Support timeout for compile lock</li>
<li>[<a href="/misc/goto?guid=4958991617344015820">HIVE-12439</a>] - CompactionTxnHandler.markCleaned() and TxnHandler.openTxns() misc improvements</li>
<li>[<a href="/misc/goto?guid=4958991617461931198">HIVE-12467</a>] - Add number of dynamic partitions to error message</li>
<li>[<a href="/misc/goto?guid=4958991617556465186">HIVE-12481</a>] - Occasionally "Request is a replay" will be thrown from HS2</li>
<li>[<a href="/misc/goto?guid=4958991617664155466">HIVE-12515</a>] - Clean the SparkCounters related code after remove counter based stats collection[Spark Branch]</li>
<li>[<a href="/misc/goto?guid=4958991617769084414">HIVE-12541</a>] - SymbolicTextInputFormat should supports the path with regex</li>
<li>[<a href="/misc/goto?guid=4958991617878217857">HIVE-12545</a>] - Add sessionId and queryId logging support for methods like getCatalogs in HiveSessionImpl class</li>
<li>[<a href="/misc/goto?guid=4958991617975466124">HIVE-12595</a>] - [REFACTOR] Make physical compiler more type safe</li>
</ul>
<h3><strong>New Features</strong></h3>
<ul>
<li>[<a
href="/misc/goto?guid=4958991618087776395">HIVE-12270</a>] - Add DBTokenStore support to HS2 delegation token</li>
<li>[<a href="/misc/goto?guid=4958991618193017861">HIVE-12634</a>] - Add command to kill an ACID transaction</li>
<li>[<a href="/misc/goto?guid=4958991618304059892">HIVE-12730</a>] - MetadataUpdater: provide a mechanism to edit the basic statistics of a table (or a partition)</li>
<li>[<a href="/misc/goto?guid=4958991618406277304">HIVE-12878</a>] - Support Vectorization for TEXTFILE and other formats</li>
<li>[<a href="/misc/goto?guid=4958991618496468876">HIVE-12994</a>] - Implement support for NULLS FIRST/NULLS LAST</li>
<li>[<a href="/misc/goto?guid=4958991618607431719">HIVE-13029</a>] - NVDIMM support for LLAP Cache</li>
<li>[<a href="/misc/goto?guid=4958991618705215858">HIVE-13095</a>] - Support view column authorization</li>
<li>[<a href="/misc/goto?guid=4958991618818948114">HIVE-13125</a>] - Support masking and filtering of rows/columns</li>
<li>[<a href="/misc/goto?guid=4958991618914276520">HIVE-13307</a>] - LLAP: Slider package should contain permanent functions</li>
<li>[<a href="/misc/goto?guid=4958991619022526477">HIVE-13418</a>] - HiveServer2 HTTP mode should support X-Forwarded-Host header for authorization/audits</li>
<li>[<a href="/misc/goto?guid=4958991619118323294">HIVE-13475</a>] - Allow aggregate functions in over clause</li>
<li>[<a href="/misc/goto?guid=4958991619222373952">HIVE-13736</a>] - View's input/output formats are TEXT by default</li>
</ul>
<p>For the full change log, see: <a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12334255&styleName=Text&projectId=12310843">ReleaseNote</a></p>
<h2>Downloads</h2>
<ul>
<li><a href="/misc/goto?guid=4958991619462630781" rel="nofollow"><strong>Source code</strong> (zip)</a></li>
<li><a href="/misc/goto?guid=4958991619576253896" rel="nofollow"><strong>Source code</strong> (tar.gz)</a></li>
</ul>