Java full-text search framework Lucene 5.3.0 released
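Lucene is an open-source full-text search engine toolkit from the Apache Software Foundation. It is an architecture for full-text search engines that provides a complete query engine and indexing engine, plus part of a text analysis engine. Lucene aims to give software developers a simple, easy-to-use toolkit for adding full-text search to a target system, or for building a complete full-text search engine on top of it.
Lucene was originally written by Doug Cutting, a veteran full-text indexing and retrieval expert. He was the principal developer of the V-Twin search engine, later served as a senior systems architect at Excite, and currently works on research into low-level Internet infrastructure. His goal in contributing Lucene was to bring full-text search to all kinds of small and medium-sized applications.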
Lucene 5.3.0 has been released. Notable changes in this release:
API improvements:
PhraseQuery and BooleanQuery are now immutable
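With immutable queries, clauses are collected in a separate builder and the finished query cannot be changed afterwards; in Lucene 5.3 this is done through `BooleanQuery.Builder` and `PhraseQuery.Builder`. The sketch below illustrates only the pattern, using the standard library. `SketchBooleanQuery`, `Clause` and the string terms are hypothetical stand-ins and are not Lucene's classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for an immutable boolean query; NOT Lucene's BooleanQuery. */
final class SketchBooleanQuery {
    enum Occur { MUST, SHOULD, MUST_NOT }

    /** One clause: a term plus how it must occur. */
    static final class Clause {
        final String term;
        final Occur occur;

        Clause(String term, Occur occur) {
            this.term = term;
            this.occur = occur;
        }
    }

    private final List<Clause> clauses;

    private SketchBooleanQuery(List<Clause> clauses) {
        // Defensive copy wrapped as unmodifiable: nothing can change after build().
        this.clauses = Collections.unmodifiableList(new ArrayList<>(clauses));
    }

    List<Clause> clauses() {
        return clauses;
    }

    /** Mutable builder: all mutation happens here, before the query exists. */
    static final class Builder {
        private final List<Clause> pending = new ArrayList<>();

        Builder add(String term, Occur occur) {
            pending.add(new Clause(term, occur));
            return this;
        }

        SketchBooleanQuery build() {
            return new SketchBooleanQuery(pending);
        }
    }

    public static void main(String[] args) {
        SketchBooleanQuery query = new Builder()
            .add("lucene", Occur.MUST)
            .add("solr", Occur.SHOULD)
            .build();
        System.out.println(query.clauses().size() + " clauses");
    }
}
```

Because the built object never changes, it can be shared between threads or used as a cache key without copying, which is the usual motivation for this kind of API change.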
Added a new class that can be used to validate that an index has an appropriate structure to run join queries
Added a new BlendedTermQuery to blend statistics across several terms
New common suggest API that mirrors Lucene's Query/IndexSearcher APIs for Document based suggester.
IndexWriter can now be initialized from an already open near-real-time or non-NRT reader
Add experimental range tree doc values format and queries, based on a 1D version of the spatial BKD tree, for a faster and smaller alternative to postings-based numeric and binary term filtering. Range trees can also handle values larger than 64 bits.
Geo-related features and improvements:
Added GeoPointField, GeoPointInBBoxQuery, GeoPointInPolygonQuery for simple "indexed lat/lon point in bbox/shape" searching
Added experimental BKD geospatial tree doc values format and queries, for fast "bbox/polygon contains lat/lon points"
Use doc values to post-filter GeoPointField hits that fall in boundary cells, resulting in smaller index, faster searches and less heap used for each query
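The "lat/lon point in bbox/shape" checks named above come down to two geometric predicates. The sketch below shows them as plain planar math, assuming a box that does not cross the date line and a simple polygon given as parallel vertex arrays. `GeoSketch` and its methods are hypothetical helpers and are not how GeoPointInBBoxQuery or GeoPointInPolygonQuery are implemented.

```java
/** Hypothetical helpers; plain planar math, NOT Lucene's GeoPoint* implementation. */
final class GeoSketch {

    /** True if (lat, lon) lies inside the box. Assumes the box does not cross the date line. */
    static boolean inBBox(double lat, double lon,
                          double minLat, double maxLat,
                          double minLon, double maxLon) {
        return lat >= minLat && lat <= maxLat && lon >= minLon && lon <= maxLon;
    }

    /** Ray-casting (even-odd) test; polygon vertices are given as parallel arrays. */
    static boolean inPolygon(double lat, double lon, double[] polyLats, double[] polyLons) {
        boolean inside = false;
        int n = polyLats.length;
        for (int i = 0, j = n - 1; i < n; j = i++) {
            // Does the edge (j -> i) straddle the horizontal line through the point?
            boolean crosses = (polyLats[i] > lat) != (polyLats[j] > lat);
            if (crosses) {
                // Longitude at which the edge meets that line (linear interpolation).
                double lonAtLat = polyLons[j]
                    + (lat - polyLats[j]) * (polyLons[i] - polyLons[j])
                      / (polyLats[i] - polyLats[j]);
                if (lon < lonAtLat) {
                    inside = !inside; // each crossing flips the inside/outside state
                }
            }
        }
        return inside;
    }
}
```

An index would typically use cheap cell or box checks to discard most documents and run the exact test only on candidates near the boundary, which matches the post-filtering of boundary cells described above.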
Reduce RAM usage of FieldInfos, and speed up lookup by number, by using an array instead of TreeMap except in very sparse cases
Faster intersection of the terms dictionary with very finite automata, which can be generated, e.g., by simple regexp queries
Various bugfixes and optimizations since the 5.2.0 release.
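The FieldInfos item above trades a TreeMap for an array when field numbers are dense. A minimal sketch of that idea follows, assuming small non-negative field numbers. `FieldNameLookup` and the factor-of-16 density threshold are illustrative assumptions and are not taken from Lucene's source.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical lookup of field names by number; NOT Lucene's FieldInfos. */
final class FieldNameLookup {
    private final String[] byNumberArray;               // used when numbers are dense
    private final TreeMap<Integer, String> byNumberMap; // used when numbers are very sparse

    FieldNameLookup(Map<Integer, String> namesByNumber) {
        int max = -1;
        for (int number : namesByNumber.keySet()) {
            if (number < 0) {
                throw new IllegalArgumentException("negative field number: " + number);
            }
            max = Math.max(max, number);
        }
        // Assumed heuristic: use an array unless it would consist mostly of empty slots.
        if (max < 16L * namesByNumber.size()) {
            byNumberArray = new String[max + 1];
            for (Map.Entry<Integer, String> e : namesByNumber.entrySet()) {
                byNumberArray[e.getKey()] = e.getValue();
            }
            byNumberMap = null;
        } else {
            byNumberArray = null;
            byNumberMap = new TreeMap<>(namesByNumber);
        }
    }

    /** Returns the field name for the number, or null if there is none. */
    String nameOf(int number) {
        if (byNumberArray != null) {
            // Constant-time index, no boxing and no tree traversal.
            return (number >= 0 && number < byNumberArray.length) ? byNumberArray[number] : null;
        }
        return byNumberMap.get(number);
    }

    boolean usesArray() {
        return byNumberArray != null;
    }
}
```

The array avoids per-entry node objects and boxed keys, which is where the RAM saving comes from; the map fallback keeps a single very large field number from forcing a huge, mostly empty array.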