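Lucene 3.6: Queries
1. BooleanQuery
In Lucene 3.6, BooleanQuery implements compound searches that combine sub-queries with AND, OR, and NOT logic.
BooleanClause describes how a sub-query (clause) takes part in a boolean query. Its Occur value is one of BooleanClause.Occur.MUST, BooleanClause.Occur.MUST_NOT, or BooleanClause.Occur.SHOULD: the clause must match, must not match, or may match. Two clauses can be combined in the following six ways:
1) MUST with MUST: returns the intersection of the two clauses' results.
2) MUST with MUST_NOT: returns the results of the MUST clause, minus any document matched by the MUST_NOT clause.
3) SHOULD with MUST_NOT: behaves the same as MUST with MUST_NOT.
4) SHOULD with MUST: returns the results of the MUST clause; the SHOULD clause only influences ranking.
5) SHOULD with SHOULD: an OR relationship; the final result is the union of all clauses' results.
6) MUST_NOT with MUST_NOT: meaningless; the search returns no results.
Sample code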
public static void query(String path, String keyword, int size) {
    try {
        File file = new File(path);
        Directory mdDirectory = FSDirectory.open(file);
        Analyzer analyzer = new IKAnalyzer();
        // Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
        IndexReader reader = IndexReader.open(mdDirectory);
        IndexSearcher searcher = new IndexSearcher(reader);

        // Search across multiple fields
        String[] fieldName = { "title", "category" };
        QueryParser queryParser = new MultiFieldQueryParser(Version.LUCENE_36, fieldName, analyzer);
        Query q1 = queryParser.parse(keyword);

        QueryParser parser = new QueryParser(Version.LUCENE_36, "author", analyzer);
        Query q2 = parser.parse("周伟明");

        // Both clauses are MUST, so only documents matching q1 and q2 are returned
        BooleanQuery boolQuery = new BooleanQuery();
        boolQuery.add(q1, BooleanClause.Occur.MUST);
        boolQuery.add(q2, BooleanClause.Occur.MUST);

        ScoreDoc[] docs = searcher.search(boolQuery, null, size).scoreDocs;
        for (int i = 0; docs != null && i < docs.length; i++) {
            Document doc = searcher.doc(docs[i].doc);
            int id = Integer.parseInt(doc.get("id"));
            String title = doc.get("title");
            String author = doc.get("author");
            String publishTime = doc.get("publishTime");
            String source = doc.get("source");
            String category = doc.get("category");
            float reputation = Float.parseFloat(doc.get("reputation"));

            Book book = new Book();
            book.setId(id);
            book.setTitle(title);
            book.setAuthor(author);
            book.setPublishTime(publishTime);
            book.setSource(source);
            book.setCategory(category);
            book.setReputation(reputation);
            System.out.println(book);
        }
        // Close the searcher before the reader it wraps
        searcher.close();
        reader.close();
    } catch (CorruptIndexException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } catch (ParseException e) {
        e.printStackTrace();
    }
}
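The six combinations listed for BooleanQuery can be pictured as plain set operations on document ids. The sketch below is not Lucene code: the class BooleanClauseSketch and its method names are made up for illustration, and it ignores scoring entirely.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/** Set-based model of how two boolean clauses combine (hypothetical helper, not Lucene API). */
final class BooleanClauseSketch {

    /** MUST + MUST: intersection of the two result sets. */
    static Set<Integer> mustMust(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<Integer>(a);
        result.retainAll(b);
        return result;
    }

    /** MUST + MUST_NOT (and SHOULD + MUST_NOT): everything in a that is not in b. */
    static Set<Integer> mustMustNot(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<Integer>(a);
        result.removeAll(b);
        return result;
    }

    /** SHOULD + SHOULD: union of the two result sets. */
    static Set<Integer> shouldShould(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<Integer>(a);
        result.addAll(b);
        return result;
    }

    /** MUST + SHOULD: same documents as MUST alone; SHOULD would only change the ranking. */
    static Set<Integer> mustShould(Set<Integer> must, Set<Integer> should) {
        return new TreeSet<Integer>(must);
    }

    /** Convenience factory for a set of document ids. */
    static Set<Integer> docs(Integer... ids) {
        return new TreeSet<Integer>(Arrays.asList(ids));
    }
}
```

For example, with a = {1, 2, 3} and b = {2, 3, 4}, mustMust gives {2, 3}, mustMustNot gives {1}, and shouldShould gives {1, 2, 3, 4}.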
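2. TermQuery
Term query: given a single term, it retrieves every document in the index that contains that term. The term is not analyzed, so it has to match the indexed form exactly.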
@Test
public void testTermQuery() {
    try {
        String path = "D://LuceneEx/day02";
        String keyword = "android";
        File file = new File(path);
        Directory mdDirectory = FSDirectory.open(file);
        IndexReader reader = IndexReader.open(mdDirectory);
        IndexSearcher searcher = new IndexSearcher(reader);

        // Match documents whose "title" field contains the exact term
        TermQuery query = new TermQuery(new Term("title", keyword));
        TopDocs tops = searcher.search(query, null, 50);
        int count = tops.totalHits;
        System.out.println("totalHits=" + count);

        ScoreDoc[] docs = tops.scoreDocs;
        for (int i = 0; i < docs.length; i++) {
            Document doc = searcher.doc(docs[i].doc);
            float score = docs[i].score;
            int id = Integer.parseInt(doc.get("id"));
            String title = doc.get("title");
            String author = doc.get("author");
            String publishTime = doc.get("publishTime");
            String source = doc.get("source");
            String category = doc.get("category");
            float reputation = Float.parseFloat(doc.get("reputation"));
            System.out.println(id + "\t" + title + "\t" + author + "\t" + publishTime + "\t"
                    + source + "\t" + category + "\t" + reputation + "\t" + score);
        }
        // Close the searcher before the reader it wraps
        searcher.close();
        reader.close();
    } catch (CorruptIndexException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
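3. TermRangeQuery
Range query. The range can cover dates, times, numbers, sizes, and so on. In query syntax, use "content:[a TO b]" (bounds included) or "content:{a TO b}" (bounds excluded). TermRangeQuery compares terms as strings, so numeric values only sort correctly when they are zero-padded to a fixed width; Lucene also provides NumericRangeQuery for fields indexed as numbers.
Sample code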
@Test
public void testTermRangeQuery() {
    try {
        String path = "D://LuceneEx/day01";
        File file = new File(path);
        Directory mdDirectory = FSDirectory.open(file);
        IndexReader reader = IndexReader.open(mdDirectory);
        IndexSearcher searcher = new IndexSearcher(reader);

        String fieldName = "publishTime";
        // Find books published between "2011-04" (excluded) and "2011-07" (included)
        TermRangeQuery tq = new TermRangeQuery(fieldName, "2011-04", "2011-07", false, true);
        TopDocs tops = searcher.search(tq, null, 10);
        int count = tops.totalHits;
        System.out.println("totalHits=" + count);

        ScoreDoc[] docs = tops.scoreDocs;
        for (int i = 0; i < docs.length; i++) {
            Document doc = searcher.doc(docs[i].doc);
            int id = Integer.parseInt(doc.get("id"));
            String title = doc.get("title");
            String author = doc.get("author");
            String publishTime = doc.get("publishTime");
            String source = doc.get("source");
            String category = doc.get("category");
            float reputation = Float.parseFloat(doc.get("reputation"));
            System.out.println(id + " " + title + " " + author + " " + publishTime + " " + source);
        }
        // Close the searcher before the reader it wraps
        searcher.close();
        reader.close();
    } catch (CorruptIndexException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
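To see why the date strings above work as range bounds, it helps to model the range check as a plain string comparison. The following is a minimal sketch, assuming terms are compared with String.compareTo and no Collator is involved; TermRangeSketch and inRange are names invented here, not part of Lucene.

```java
/** Models a term range check as lexicographic string comparison (hypothetical helper). */
final class TermRangeSketch {

    /**
     * Returns true if term lies between lower and upper when compared as strings.
     * includeLower / includeUpper mirror the two boolean flags of the range query.
     */
    static boolean inRange(String term, String lower, String upper,
                           boolean includeLower, boolean includeUpper) {
        int cmpLower = term.compareTo(lower);
        int cmpUpper = term.compareTo(upper);
        boolean aboveLower = includeLower ? cmpLower >= 0 : cmpLower > 0;
        boolean belowUpper = includeUpper ? cmpUpper <= 0 : cmpUpper < 0;
        return aboveLower && belowUpper;
    }
}
```

With the bounds from the sample, "2011-05" is in range, "2011-04" is not (lower bound excluded), and "2011-07" is (upper bound included). A term such as "2011-07-15" sorts after "2011-07" and so falls outside the range, and unpadded numbers misbehave: "10" sorts before "2", so it is not between "2" and "9".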
4. PrefixQuery
Finds documents that contain terms starting with the given string. When a term in a query expression ends with "*", QueryParser.parse creates a PrefixQuery for that term.
Sample code
@Test
public void testPrefixQuery() {
    try {
        String path = "D://LuceneEx/day01";
        File file = new File(path);
        Directory mdDirectory = FSDirectory.open(file);
        IndexReader reader = IndexReader.open(mdDirectory);
        IndexSearcher searcher = new IndexSearcher(reader);

        String fieldName = "source";
        // Match every term in "source" that starts with this prefix
        Term prefix = new Term(fieldName, "清华大学");
        PrefixQuery preq = new PrefixQuery(prefix);
        TopDocs tops = searcher.search(preq, null, 10);
        int count = tops.totalHits;
        System.out.println("totalHits=" + count);

        ScoreDoc[] docs = tops.scoreDocs;
        for (int i = 0; i < docs.length; i++) {
            Document doc = searcher.doc(docs[i].doc);
            int id = Integer.parseInt(doc.get("id"));
            String title = doc.get("title");
            String author = doc.get("author");
            String publishTime = doc.get("publishTime");
            String source = doc.get("source");
            String category = doc.get("category");
            float reputation = Float.parseFloat(doc.get("reputation"));
            System.out.println(id + " " + title + " " + author + " " + publishTime + " " + source);
        }
        // Close the searcher before the reader it wraps
        searcher.close();
        reader.close();
    } catch (CorruptIndexException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
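Conceptually, a prefix search is cheap because the terms of a field are kept in sorted order, so all terms sharing a prefix sit next to each other. The sketch below illustrates that idea with a TreeSet standing in for the term dictionary; it is an assumption-laden model, not how Lucene is actually implemented, and PrefixScanSketch is a made-up name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Prefix scan over a sorted term dictionary (hypothetical model, not Lucene code). */
final class PrefixScanSketch {

    /** Collects every term in the sorted dictionary that starts with the prefix. */
    static List<String> termsWithPrefix(TreeSet<String> dictionary, String prefix) {
        List<String> matches = new ArrayList<String>();
        // tailSet jumps to the first term >= prefix; matching terms are contiguous from there
        for (String term : dictionary.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // first non-matching term ends the run
            }
            matches.add(term);
        }
        return matches;
    }
}
```

Given the terms apache, lucene, lucid, luck, and solr, the prefix "luc" yields lucene, lucid, and luck, and the scan stops as soon as it reaches solr.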
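5. PhraseQuery
Phrase query. By default it requires an exact match, but a slop (default 0) can be set to loosen it. For example, with slop = 1 and the phrase "电台", text that has one extra character between "电" and "台", such as "电视台", also matches (assuming each character is indexed as its own term). The corresponding query expression is "电 台"~1, with the phrase in double quotes followed by ~1.
Sample code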
@Test
public void testPhraseQuery() {
    try {
        String path = "D://LuceneEx/day01";
        File file = new File(path);
        Directory mdDirectory = FSDirectory.open(file);
        IndexReader reader = IndexReader.open(mdDirectory);
        IndexSearcher searcher = new IndexSearcher(reader);

        String fieldName = "title";
        // The two terms must appear next to each other, in this order
        PhraseQuery query = new PhraseQuery();
        query.add(new Term(fieldName, "Lucene"));
        query.add(new Term(fieldName, "入门"));
        // query.setSlop(1);
        TopDocs tops = searcher.search(query, null, 50);
        int count = tops.totalHits;
        System.out.println("totalHits=" + count);

        ScoreDoc[] docs = tops.scoreDocs;
        for (int i = 0; i < docs.length; i++) {
            Document doc = searcher.doc(docs[i].doc);
            int id = Integer.parseInt(doc.get("id"));
            String title = doc.get("title");
            String author = doc.get("author");
            String publishTime = doc.get("publishTime");
            String source = doc.get("source");
            String category = doc.get("category");
            float reputation = Float.parseFloat(doc.get("reputation"));
            System.out.println(id + " " + title + " " + author + " " + publishTime + " " + source);
        }
        // Close the searcher before the reader it wraps
        searcher.close();
        reader.close();
    } catch (CorruptIndexException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
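The effect of slop can be sketched on a list of tokens. This is a deliberately simplified model that only handles a two-term phrase whose terms appear in order, treating slop as the number of extra tokens allowed between them; Lucene's real sloppy matching is more general. PhraseSlopSketch is a hypothetical name.

```java
import java.util.List;

/** Simplified two-term, in-order phrase matching with slop (hypothetical, not Lucene code). */
final class PhraseSlopSketch {

    /** Returns true if first is followed by second with at most slop tokens in between. */
    static boolean matches(List<String> tokens, String first, String second, int slop) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(first)) {
                continue;
            }
            for (int j = i + 1; j < tokens.size(); j++) {
                int gap = j - i - 1; // number of tokens between the two terms
                if (tokens.get(j).equals(second) && gap <= slop) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

With the tokens 电, 视, 台, the phrase 电 台 does not match at slop 0 but does match at slop 1, which is the "电视台" case described above.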
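6. FuzzyQuery
Fuzzy query matches terms using the Levenshtein (edit distance) algorithm. When comparing two strings, the algorithm counts three kinds of operations: inserting a character (Insert), deleting a character (Delete), and replacing a character (Substitute). The edit distance affects the score: the smaller the distance, the higher the score. In query syntax, append ~ to the term, as in "fuzzy~".
Sample code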
@Test
public void testFuzzyQuery() {
    try {
        String path = "D://LuceneEx/day01";
        File file = new File(path);
        Directory mdDirectory = FSDirectory.open(file);
        IndexReader reader = IndexReader.open(mdDirectory);
        IndexSearcher searcher = new IndexSearcher(reader);

        String fieldName = "category";
        Term term = new Term(fieldName, "云计算");
        // Second argument is the minimum similarity a term needs in order to match
        FuzzyQuery query = new FuzzyQuery(term, 0.1f);
        // Third argument is the length of the prefix that must match exactly
        // FuzzyQuery query = new FuzzyQuery(term, 0.1f, 1);
        TopDocs tops = searcher.search(query, null, 50);
        int count = tops.totalHits;
        System.out.println("totalHits=" + count);

        ScoreDoc[] docs = tops.scoreDocs;
        for (int i = 0; i < docs.length; i++) {
            Document doc = searcher.doc(docs[i].doc);
            int id = Integer.parseInt(doc.get("id"));
            String title = doc.get("title");
            String author = doc.get("author");
            String publishTime = doc.get("publishTime");
            String source = doc.get("source");
            String category = doc.get("category");
            float reputation = Float.parseFloat(doc.get("reputation"));
            System.out.println(id + " " + title + " " + author + " " + publishTime + " " + source + " " + category);
        }
        // Close the searcher before the reader it wraps
        searcher.close();
        reader.close();
    } catch (CorruptIndexException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
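The edit distance behind fuzzy matching is easy to compute by hand. Below is a minimal sketch of the classic dynamic-programming Levenshtein distance, plus a similarity value defined as 1 - distance / length of the shorter string. That similarity formula is an assumption meant to approximate how a minimum-similarity threshold such as 0.1f is applied when no prefix is required; FuzzySketch is an invented name, not a Lucene class.

```java
/** Levenshtein distance and a derived similarity (hypothetical helper, not Lucene code). */
final class FuzzySketch {

    /** Minimum number of inserts, deletes, and substitutions that turn a into b. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    /** Assumed similarity: 1 - distance / length of the shorter string. */
    static float similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        if (shorter == 0) {
            return a.length() == b.length() ? 1f : 0f;
        }
        return 1f - (float) editDistance(a, b) / shorter;
    }
}
```

For instance, "fuzzy" and "wuzzy" are one substitution apart, so under this formula their similarity is 1 - 1/5 = 0.8.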
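7. WildcardQuery
Wildcard query. "*" matches zero or more characters and "?" matches exactly one character. Avoid starting a pattern with a wildcard; otherwise every term in the index has to be enumerated.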
@Test
public void testWildcardQuery() {
    try {
        String path = "D://LuceneEx/day01";
        File file = new File(path);
        Directory mdDirectory = FSDirectory.open(file);
        IndexReader reader = IndexReader.open(mdDirectory);
        IndexSearcher searcher = new IndexSearcher(reader);

        String fieldName = "title";
        // Match terms that start with "lucene", followed by zero or more characters
        Term term = new Term(fieldName, "lucene*");
        WildcardQuery query = new WildcardQuery(term);
        TopDocs tops = searcher.search(query, null, 100);
        int count = tops.totalHits;
        System.out.println("totalHits=" + count);

        ScoreDoc[] docs = tops.scoreDocs;
        for (int i = 0; i < docs.length; i++) {
            Document doc = searcher.doc(docs[i].doc);
            int id = Integer.parseInt(doc.get("id"));
            String title = doc.get("title");
            String author = doc.get("author");
            String publishTime = doc.get("publishTime");
            String source = doc.get("source");
            String category = doc.get("category");
            float reputation = Float.parseFloat(doc.get("reputation"));
            System.out.println(id + " " + title + " " + author + " " + publishTime + " " + source + " " + category);
        }
        // Close the searcher before the reader it wraps
        searcher.close();
        reader.close();
    } catch (CorruptIndexException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
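The meaning of "*" and "?" can be pinned down with a small matcher that tests one term against one pattern. This is a generic backtracking sketch written for illustration; WildcardSketch is a hypothetical name and this is not the code Lucene runs.

```java
/** Matches a single term against a pattern with '*' and '?' (hypothetical helper). */
final class WildcardSketch {

    /** '*' matches zero or more characters, '?' matches exactly one. */
    static boolean matches(String pattern, String text) {
        int p = 0;          // position in pattern
        int t = 0;          // position in text
        int star = -1;      // index of the most recent '*' in pattern, or -1
        int mark = 0;       // text position where that '*' started matching
        while (t < text.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                star = p++; // remember the star and first try matching zero characters
                mark = t;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == text.charAt(t))) {
                p++;
                t++;
            } else if (star >= 0) {
                p = star + 1; // backtrack: let the last '*' absorb one more character
                t = ++mark;
            } else {
                return false;
            }
        }
        // Any pattern characters left over must all be '*'
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }
}
```

The pattern "lucene*" from the sample matches both "lucene" and "lucenes", while "lu?ene" matches "lucene" but not "luene". A pattern with a leading wildcard, such as "*ene", gives the matcher no fixed starting characters to narrow the search, which is why every term would have to be tested.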
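8. SpanQuery
SpanQuery: span query. This is an abstract class.
SpanTermQuery: returns the same results as TermQuery, but also records position information for use by the other SpanQuery APIs. It is the building block for the other SpanQuery subclasses.
SpanFirstQuery: looks for the given term within a fixed width, measured from the start of the field's content.
SpanNearQuery: similar to PhraseQuery, but what it matches need not be a phrase. Its clauses can themselves be the results of other SpanQuery instances, each treated as a unit, which allows nested queries.
SpanOrQuery: combines the results of all its SpanQuery clauses into a single result.
SpanNotQuery: takes the results of the first SpanQuery and removes those that overlap with matches of the second SpanQuery.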
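As a rough picture of what "within a fixed width from the start" means for SpanFirstQuery, the sketch below checks whether a term occurs among the first few token positions of a field. It assumes a single-term span at 0-based position p matches when p + 1 is no greater than the width; SpanFirstSketch is a hypothetical name, not Lucene code.

```java
import java.util.List;

/** Position-based model of a single-term SpanFirstQuery (hypothetical, not Lucene code). */
final class SpanFirstSketch {

    /** Returns true if term occurs at a 0-based position p with p + 1 <= end. */
    static boolean matches(List<String> tokens, String term, int end) {
        int limit = Math.min(end, tokens.size());
        for (int p = 0; p < limit; p++) {
            if (tokens.get(p).equals(term)) {
                return true;
            }
        }
        return false;
    }
}
```

For the tokens mary, had, a, little, lamb and a width of 3, "mary" and "a" match, but "little" does not because it sits at the fourth position.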
Sample code
@Test
public void testSpanQuery() {
    try {
        String path = "D://LuceneEx/day01";
        File file = new File(path);
        Directory mdDirectory = FSDirectory.open(file);
        IndexReader reader = IndexReader.open(mdDirectory);
        IndexSearcher searcher = new IndexSearcher(reader);

        String fieldName = "title";
        Term t1 = new Term(fieldName, "权威");
        Term t2 = new Term(fieldName, "lucene");
        Term t3 = new Term(fieldName, "搜索");
        Term t4 = new Term(fieldName, "出版社");
        SpanTermQuery q1 = new SpanTermQuery(t1);
        SpanTermQuery q2 = new SpanTermQuery(t2);
        SpanTermQuery q3 = new SpanTermQuery(t3);
        SpanTermQuery q4 = new SpanTermQuery(t4);

        // q1 and q2 within 1 position of each other, in any order
        SpanNearQuery query1 = new SpanNearQuery(new SpanQuery[] { q1, q2 }, 1, false);
        // q3 and q4 within 3 positions of each other, in any order
        SpanNearQuery query2 = new SpanNearQuery(new SpanQuery[] { q3, q4 }, 3, false);
        // Keep query1 matches that do not overlap with query2 matches
        SpanNotQuery query = new SpanNotQuery(query1, query2);

        // Term t = new Term("content", "mary");
        // SpanTermQuery people = new SpanTermQuery(t);
        // SpanFirstQuery query = new SpanFirstQuery(people, 3); // 3 is the width

        TopDocs tops = searcher.search(query, null, 100);
        int count = tops.totalHits;
        System.out.println("totalHits=" + count);

        ScoreDoc[] docs = tops.scoreDocs;
        for (int i = 0; i < docs.length; i++) {
            Document doc = searcher.doc(docs[i].doc);
            int id = Integer.parseInt(doc.get("id"));
            String title = doc.get("title");
            String author = doc.get("author");
            String publishTime = doc.get("publishTime");
            String source = doc.get("source");

Source: http://blog.csdn.net/fx_sky/article/details/8543146