Lucene 3.6: Sorting
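By default, Lucene sorts search results by Document score (relevance), highest score first. When two Documents in the result set have the same score, they are ordered by Document ID, lower IDs first.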
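The default order described above can be modeled in a few lines of plain Java. This is a sketch of the rule only, not Lucene code: `DefaultHitOrder`, `Hit` and `docOrder` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical plain-Java model of Lucene's default result order (not Lucene API).
class DefaultHitOrder {

    // A search hit: internal document ID plus relevance score.
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Higher score first; equal scores fall back to the lower document ID.
    static final Comparator<Hit> RELEVANCE =
            Comparator.comparingDouble((Hit h) -> h.score).reversed()
                      .thenComparingInt(h -> h.doc);

    // Returns the document IDs in result order.
    static int[] docOrder(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(RELEVANCE);
        return sorted.stream().mapToInt(h -> h.doc).toArray();
    }
}
```

For hits (doc 5, score 1.0), (doc 2, score 1.0) and (doc 7, score 2.0), the order is 7, 2, 5: the score decides first, and doc 2 precedes doc 5 only because of the tie.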
I. Sorting with the Sort and SortField classes
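To sort at query time, pass a Sort object to IndexSearcher.search (the Sort is an argument of the search method, not of the IndexSearcher constructor). The sort rules are specified when the Sort is constructed. For example: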
```java
TopFieldDocs tops = searcher.search(query, filter, n, sort);
```
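The `n` argument caps how many hits come back in `scoreDocs`, while `totalHits` still counts every match. The plain-Java sketch below models that contract with a bounded priority queue, the usual way to keep the best n items of a stream. Lucene's collectors also rely on a priority queue, but this is not their code; `TopN`, `Result` and `collect` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical model of search(query, filter, n, sort): keep the best n hits, count all of them.
class TopN {

    static final class Result<T> {
        final int totalHits;   // every matching hit
        final List<T> top;     // at most n hits, best first

        Result(int totalHits, List<T> top) {
            this.totalHits = totalHits;
            this.top = top;
        }
    }

    // 'order' sorts best hits first.
    static <T> Result<T> collect(Iterable<T> matches, int n, Comparator<T> order) {
        // Reversed order puts the weakest hit kept so far at the head of the queue.
        PriorityQueue<T> queue = new PriorityQueue<>(Math.max(1, n), order.reversed());
        int total = 0;
        for (T hit : matches) {
            total++;
            queue.offer(hit);
            if (queue.size() > n) {
                queue.poll(); // drop the weakest so only n remain
            }
        }
        List<T> top = new ArrayList<>(queue);
        top.sort(order);
        return new Result<>(total, top);
    }
}
```

Collecting the values 4, 9, 1, 7, 3 with n = 3 and "largest first" as the order keeps 9, 7, 4, while `totalHits` is still 5.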
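The Sort class defines three constructors:
Sort(): sorts by relevance score, the same as the default order (equivalent to Sort.RELEVANCE)
Sort(SortField field): sorts by a single field
Sort(SortField... fields): sorts by several fields in turn; each later field only breaks ties left by the earlier ones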
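The commonly used SortField constructors are:
SortField(String field, int type): sorts on field, interpreting its terms according to type; field values sort ascending
SortField(String field, int type, boolean reverse): the same, but reverse = true sorts in descending order
SortField(String field, FieldCache.Parser parser) and SortField(String field, FieldCache.Parser parser, boolean reverse): parse numeric terms with a custom parser
SortField(String field, FieldComparatorSource comparator) and SortField(String field, FieldComparatorSource comparator, boolean reverse): sort with a custom comparator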
The type argument takes one of the following constants:
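SortField.SCORE: sort by relevance score (highest first)
SortField.DOC: sort by Document ID (index order)
SortField.STRING: sort by the field's term text, compared as strings
SortField.INT: sort by the field's term parsed as an int
SortField.FLOAT: sort by the field's term parsed as a float
SortField.LONG: sort by the field's term parsed as a long
SortField.DOUBLE: sort by the field's term parsed as a double
SortField.SHORT: sort by the field's term parsed as a short
SortField.BYTE: sort by the field's term parsed as a byte
SortField.CUSTOM: sort with a custom comparator (set when a FieldComparatorSource is supplied)
Note: SortField.AUTO, which guessed int, long or float from the field's values, existed in Lucene 2.x but was removed in 3.0, so it is not available in 3.6.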
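Choosing the right type matters because the same stored text sorts differently as a string and as a number. The plain-Java sketch below (not Lucene API; `SortTypeDemo` and its methods are hypothetical names) mimics STRING and FLOAT ordering on the same term text.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration (not Lucene API): the same term text sorted two ways.
class SortTypeDemo {

    // Like SortField.STRING: compare the raw term text character by character.
    static List<String> asStrings(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    // Like SortField.FLOAT: parse each term as a float and compare numerically.
    static List<String> asFloats(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.comparingDouble((String s) -> Float.parseFloat(s)));
        return sorted;
    }
}
```

For "9.5", "10.0", "8.9" a string sort gives "10.0", "8.9", "9.5" (because '1' sorts before '8'), while a float sort gives "8.9", "9.5", "10.0". The reputation values in this article all lie below 10, so STRING would happen to work on them, but FLOAT expresses the intent and keeps working once a value reaches 10.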
Sample code
1. Sorting by a single field
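```java
// Imports used by the test methods in this article
import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.junit.Test;
import org.wltea.analyzer.lucene.IKAnalyzer;

@Test
public void sortSingleField() {
    try {
        String path = "D://LuceneEx/day01";
        String keyword = "android";
        Directory mdDirectory = FSDirectory.open(new File(path));
        // Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
        // Use the third-party IK Analyzer for Chinese text
        Analyzer mAnalyzer = new IKAnalyzer();
        IndexReader reader = IndexReader.open(mdDirectory);
        IndexSearcher searcher = new IndexSearcher(reader);
        // Search across several fields
        String[] fields = {"title", "category"};
        QueryParser parser = new MultiFieldQueryParser(Version.LUCENE_36, fields, mAnalyzer);
        // String fieldName = "source";
        // QueryParser parser = new QueryParser(Version.LUCENE_36, fieldName, mAnalyzer);
        Query query = parser.parse(keyword);
        // Sort by "reputation" parsed as a float, ascending (lowest first);
        // new SortField("reputation", SortField.FLOAT, true) sorts descending
        SortField field = new SortField("reputation", SortField.FLOAT);
        Sort sort = new Sort(field);
        // Top 50 hits in sort order. Scores are not computed when sorting by a field
        // unless searcher.setDefaultFieldSortScoring(true, false) is called first.
        TopDocs tops = searcher.search(query, 50, sort);
        int count = tops.totalHits;
        System.out.println("totalHits=" + count);
        ScoreDoc[] docs = tops.scoreDocs;
        for (int i = 0; i < docs.length; i++) {
            Document doc = searcher.doc(docs[i].doc);
            int id = Integer.parseInt(doc.get("id"));
            String title = doc.get("title");
            String author = doc.get("author");
            String publishTime = doc.get("publishTime");
            String source = doc.get("source");
            String category = doc.get("category");
            float reputation = Float.parseFloat(doc.get("reputation"));
            System.out.println(id + "\t" + title + "\t" + author + "\t" + publishTime
                    + "\t" + source + "\t" + category + "\t" + reputation);
        }
        searcher.close();
        reader.close();
    } catch (CorruptIndexException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } catch (ParseException e) {
        e.printStackTrace();
    }
}
```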
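sortSingleField sorts reputation ascending, so the lowest-rated book comes first; the three-argument constructor with reverse = true flips that. The plain-Java sketch below models only the effect of the reverse flag on a float field (not Lucene API; `ReverseFlagDemo`, `Item` and `idsSortedByReputation` are hypothetical names).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of a single float SortField with and without reverse (not Lucene API).
class ReverseFlagDemo {

    static final class Item {
        final int id;
        final float reputation;

        Item(int id, float reputation) {
            this.id = id;
            this.reputation = reputation;
        }
    }

    // reverse == false: ascending (the SortField default); reverse == true: descending.
    static List<Integer> idsSortedByReputation(List<Item> items, boolean reverse) {
        Comparator<Item> byReputation = Comparator.comparingDouble(b -> b.reputation);
        if (reverse) {
            byReputation = byReputation.reversed();
        }
        List<Item> sorted = new ArrayList<>(items);
        sorted.sort(byReputation);
        List<Integer> ids = new ArrayList<>();
        for (Item b : sorted) {
            ids.add(b.id);
        }
        return ids;
    }
}
```

With reputations 8.9 (id 1), 8.5 (id 2) and 8.2 (id 3), the default order is 3, 2, 1 and the reversed order is 1, 2, 3.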
2. Sorting by multiple fields
```java
@Test
public void sortMultiField() {
    try {
        String path = "D://LuceneEx/day01";
        String keyword = "Android";
        Directory mdDirectory = FSDirectory.open(new File(path));
        // Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
        // Use the third-party IK Analyzer for Chinese text
        Analyzer mAnalyzer = new IKAnalyzer();
        IndexReader reader = IndexReader.open(mdDirectory);
        IndexSearcher searcher = new IndexSearcher(reader);
        // Search across several fields
        String[] fields = {"title", "category"};
        QueryParser parser = new MultiFieldQueryParser(Version.LUCENE_36, fields, mAnalyzer);
        // String fieldName = "source";
        // QueryParser parser = new QueryParser(Version.LUCENE_36, fieldName, mAnalyzer);
        Query query = parser.parse(keyword);
        // Primary key: reputation as a float, ascending
        SortField sortF1 = new SortField("reputation", SortField.FLOAT);
        // Secondary key: source as a string, used only when reputations are equal
        // (a STRING sort needs the field indexed as a single, untokenized term)
        SortField sortF2 = new SortField("source", SortField.STRING);
        Sort sort = new Sort(new SortField[]{sortF1, sortF2});
        // No filter (null), top 100 hits in sort order
        TopDocs tops = searcher.search(query, null, 100, sort);
        int count = tops.totalHits;
        System.out.println("totalHits=" + count);
        ScoreDoc[] docs = tops.scoreDocs;
        for (int i = 0; i < docs.length; i++) {
            Document doc = searcher.doc(docs[i].doc);
            int id = Integer.parseInt(doc.get("id"));
            String title = doc.get("title");
            String author = doc.get("author");
            String publishTime = doc.get("publishTime");
            String source = doc.get("source");
            String category = doc.get("category");
            float reputation = Float.parseFloat(doc.get("reputation"));
            System.out.println(id + "\t" + title + "\t" + author + "\t" + publishTime
                    + "\t" + source + "\t" + category + "\t" + reputation);
        }
        searcher.close();
        reader.close();
    } catch (CorruptIndexException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } catch (ParseException e) {
        e.printStackTrace();
    }
}
```
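A Sort built from several SortFields compares hits key by key: the second key is consulted only when the first one ties. The plain-Java sketch below models `new Sort(new SortField[]{sortF1, sortF2})` as a comparator chain. It assumes that hits still tied after the last key fall back to Document ID ascending, which is my understanding of Lucene's behavior; `MultiKeySortDemo`, `Row` and `docOrder` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of Sort(reputation FLOAT, source STRING) as a comparator chain (not Lucene API).
class MultiKeySortDemo {

    static final class Row {
        final int doc;            // internal document ID
        final float reputation;
        final String source;

        Row(int doc, float reputation, String source) {
            this.doc = doc;
            this.reputation = reputation;
            this.source = source;
        }
    }

    // Primary key: reputation ascending; secondary: source ascending;
    // assumed final tie-breaker: document ID ascending.
    static final Comparator<Row> ORDER =
            Comparator.comparingDouble((Row r) -> r.reputation)
                      .thenComparing((Row r) -> r.source)
                      .thenComparingInt(r -> r.doc);

    static int[] docOrder(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(ORDER);
        return sorted.stream().mapToInt(r -> r.doc).toArray();
    }
}
```

With rows (doc 0, 8.9, "B"), (doc 1, 8.5, "B"), (doc 2, 8.5, "A"), (doc 3, 8.5, "A"), the result is 2, 3, 1, 0: the three 8.5 rows come first, "A" beats "B" among them, and docs 2 and 3 tie on both keys.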
The two helper methods used to create the indexed books:
```java
import java.util.Random;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

/**
 * Helper that builds a Lucene Document from a Book.
 * Fields that are sorted on (id, source, reputation) are indexed NOT_ANALYZED,
 * so each document has exactly one term in them; sorting on a tokenized
 * field gives wrong results or fails.
 */
public Document createDocument(Book book) {
    Document doc = new Document();
    doc.add(new Field("id", book.getId() + "", Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.add(new Field("title", book.getTitle(), Field.Store.YES, Field.Index.ANALYZED));
    doc.add(new Field("author", book.getAuthor(), Field.Store.YES, Field.Index.ANALYZED));
    doc.add(new Field("publishTime", book.getPublishTime(), Field.Store.YES, Field.Index.ANALYZED));
    doc.add(new Field("source", book.getSource(), Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.add(new Field("category", book.getCategory(), Field.Store.YES, Field.Index.ANALYZED));
    doc.add(new Field("reputation", book.getReputation() + "", Field.Store.YES, Field.Index.NOT_ANALYZED));
    return doc;
}

/**
 * Helper that creates a Book. The id is random (0 to 9999), for demonstration
 * only, and may collide; the source is always 清华大学出版社 (Tsinghua University Press).
 */
public Book createBook(String title, String author, String publishTime, String category, float reputation) {
    Random r = new Random();
    int id = r.nextInt(10000);
    Book book = new Book();
    book.setId(id);
    book.setAuthor(author);
    book.setTitle(title);
    book.setCategory(category);
    book.setPublishTime(publishTime);
    book.setReputation(reputation);
    book.setSource("清华大学出版社");
    return book;
}
```
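Why must sort fields be untokenized? When sorting, Lucene 3.x reads each document's value for the sort field from a per-field cache (FieldCache) that holds one value per Document ID, filled by walking the field's terms in order. The simplified plain-Java model below (not Lucene code; `SortCacheDemo` and `fill` are hypothetical names) shows what happens when a field is tokenized: as far as I understand the 3.x numeric caches, a later term simply overwrites the slot, so the document sorts on whichever of its tokens comes last, and a STRING sort may instead fail with an exception. The token split of 清华大学出版社 in the usage note is hypothetical; IKAnalyzer may split it differently.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified model of filling a one-value-per-document sort cache (not Lucene API).
class SortCacheDemo {

    /**
     * @param postings term -> IDs of the documents containing that term, for ONE field
     * @param maxDoc   number of documents in the index
     * @return one slot per document ID holding "its" term, or null if it has none
     */
    static String[] fill(Map<String, List<Integer>> postings, int maxDoc) {
        String[] valueByDoc = new String[maxDoc];
        // Walk the terms in sorted order, as the index's term dictionary is walked.
        for (Map.Entry<String, List<Integer>> e : new TreeMap<>(postings).entrySet()) {
            for (int doc : e.getValue()) {
                valueByDoc[doc] = e.getKey(); // a later term silently overwrites an earlier one
            }
        }
        return valueByDoc;
    }
}
```

If "source" is untokenized, every document's slot holds the full value 清华大学出版社. If it were split into the tokens 清华, 大学 and 出版社, the slot would end up holding only 清华, the last of the three in term order.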
II. Changing the boost (weighting factor)
1. Changing a Document's boost
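Changing the boost changes a Document's score, and because Lucene orders results by score by default, it moves the Document earlier or later in the results. The boost is one of the factors used to compute the score. Its default value is 1.0, which leaves the score unchanged, so by default the boost has no effect. Once the boost is set to another value it does matter: the larger the boost, the higher the Document's score. In Lucene 3.6 the Document boost is applied at index time (call setBoost before addDocument), so changing it means reindexing the Document.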
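For reference, this is where the boost enters the scoring formula documented in the javadoc of Lucene 3.x's Similarity class (check it against the 3.6 javadoc for the exact definitions of each factor):

```latex
\mathrm{score}(q,d) = \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot
  \sum_{t \in q}\Big(\mathrm{tf}(t \in d)\cdot\mathrm{idf}(t)^{2}\cdot t.\mathrm{getBoost}()\cdot\mathrm{norm}(t,d)\Big)

\mathrm{norm}(t,d) = d.\mathrm{getBoost}()\cdot\mathrm{lengthNorm}\cdot
  \prod_{f \in d,\; f \text{ named as } t} f.\mathrm{getBoost}()
```

The Document boost appears only inside norm(t,d), as a multiplicative factor in every term of the sum, while coord and queryNorm do not depend on it. Multiplying the Document boost by k therefore multiplies that Document's score by roughly k (roughly, because the norm is stored in a single byte).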
2. Changing a Field's boost
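Changing a Field's boost works the same way as changing the Document's boost. In fact, a Document's boost takes effect through the Fields added to it: at index time it is multiplied into the norm of each Field in the Document. Changing a Field's boost therefore also changes the Document's score, but only for matches in that Field.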
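Because the combined norm (Document boost × Field boost × length norm) is squeezed into one byte per field, a boost of 2.0 does not multiply the score by exactly 2. The sketch below is a plain-Java re-implementation, as I understand it, of the encoding Lucene 3.x's DefaultSimilarity uses (SmallFloat.floatToByte315 and byte315ToFloat), with a length norm of 1/sqrt(number of terms). `NormEncoding` and `storedNorm` are hypothetical names, and the 3-term field in the usage note is an assumption.

```java
// Hypothetical re-implementation of Lucene 3.x's one-byte norm encoding (illustration only).
class NormEncoding {

    // Keep the sign/exponent and the top 2 explicit mantissa bits (3 significant bits).
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        int fzero = (63 - 15) << 3;
        if (smallfloat <= fzero) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // zero/negative -> 0, underflow -> smallest
        }
        if (smallfloat >= fzero + 0x100) {
            return -1; // overflow -> largest value
        }
        return (byte) (smallfloat - fzero);
    }

    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // norm = docBoost * fieldBoost * lengthNorm, then rounded down to what one byte can hold.
    static float storedNorm(float docBoost, float fieldBoost, int numTerms) {
        float lengthNorm = (float) (1.0 / Math.sqrt(numTerms));
        return decode(encode(docBoost * fieldBoost * lengthNorm));
    }
}
```

For a 3-term field, Document boosts 2.0, 2.5 and 1.5 come out as stored norms 1.0, 1.25 and 0.75. The 2.0 boost raises the raw norm from about 0.577 to about 1.155, but the byte encoding rounds it down to 1.0; likewise 1/sqrt(2) ≈ 0.707 is stored as 0.625.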
Sample code
```java
// Additional imports for testAdd()
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.LockObtainFailedException;

@Test
public void testBoost() {
    try {
        String path = "D://LuceneEx/day02";
        String keyword = "android";
        Directory mdDirectory = FSDirectory.open(new File(path));
        // Use the third-party IK Analyzer for Chinese text
        Analyzer mAnalyzer = new IKAnalyzer();
        IndexReader reader = IndexReader.open(mdDirectory);
        IndexSearcher searcher = new IndexSearcher(reader);
        // Search across several fields
        String[] fields = {"title", "category"};
        QueryParser parser = new MultiFieldQueryParser(Version.LUCENE_36, fields, mAnalyzer);
        Query query = parser.parse(keyword);
        // No Sort: results come back in the default order (score descending)
        TopDocs tops = searcher.search(query, null, 50);
        int count = tops.totalHits;
        System.out.println("totalHits=" + count);
        ScoreDoc[] docs = tops.scoreDocs;
        for (int i = 0; i < docs.length; i++) {
            Document doc = searcher.doc(docs[i].doc);
            float score = docs[i].score;
            int id = Integer.parseInt(doc.get("id"));
            String title = doc.get("title");
            String author = doc.get("author");
            String publishTime = doc.get("publishTime");
            String source = doc.get("source");
            String category = doc.get("category");
            float reputation = Float.parseFloat(doc.get("reputation"));
            System.out.println(id + "\t" + title + "\t" + author + "\t" + publishTime + "\t"
                    + source + "\t" + category + "\t" + reputation + "\t" + score);
        }
        searcher.close();
        reader.close();
    } catch (CorruptIndexException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } catch (ParseException e) {
        e.printStackTrace();
    }
}

@Test
public void testAdd() {
    try {
        String path = "D://LuceneEx/day02";
        Directory mdDirectory = FSDirectory.open(new File(path));
        // Lucene's built-in analyzer:
        // Analyzer mAnalyzer = new StandardAnalyzer(Version.LUCENE_36);
        // Use the third-party IK Analyzer for Chinese text
        Analyzer mAnalyzer = new IKAnalyzer();
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_36, mAnalyzer);
        IndexWriter writer = new IndexWriter(mdDirectory, config);

        Book book1 = createBook("Android内核揭秘", "ABC", "2010-07", "android 移动开发", 8.9f);
        Document doc1 = createDocument(book1);
        // Boost 2.0: multiplies this document's field norms by 2, so its score roughly doubles
        doc1.setBoost(2.0F);

        Book book2 = createBook("Android多媒体开发", "BCD", "2011-07", "android 多媒体", 8.5f);
        Document doc2 = createDocument(book2);
        // Boost 2.5: score roughly x2.5
        doc2.setBoost(2.5F);

        Book book3 = createBook("Android企业应用开发", "QAB", "2012-05", "android 企业应用", 8.2f);
        Document doc3 = createDocument(book3);
        // Boost 1.5: score roughly x1.5
        doc3.setBoost(1.5F);

        writer.addDocument(doc1);
        writer.addDocument(doc2);
        writer.addDocument(doc3);
        writer.close();
    } catch (CorruptIndexException e) {
        e.printStackTrace();
    } catch (LockObtainFailedException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
```
Output (columns: id, title, author, publishTime, source, category, reputation, score):
```text
totalHits=3
3383	Android多媒体开发	BCD	2011-07	清华大学出版社	android 多媒体	8.5	1.259212
891	Android内核揭秘	ABC	2010-07	清华大学出版社	android 移动开发	8.9	1.0073696
2919	Android企业应用开发	QAB	2012-05	清华大学出版社	android 企业应用	8.2	0.75552726
```
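In this run the scores stand in the same proportion as the boosts set in testAdd(): 1.259212 / 1.0073696 = 1.25 = 2.5 / 2.0, and 0.75552726 / 1.0073696 ≈ 0.75 = 1.5 / 2.0. That suggests the boost is what separates these three otherwise similar documents (an observation about this output, not a general rule). The small check below just does that arithmetic; `BoostRatioCheck` and `ratios` are hypothetical names.

```java
// Compares the printed scores against the boost-2.0 document's score (hypothetical helper).
class BoostRatioCheck {

    // Ratio of each score to the reference score.
    static double[] ratios(double[] scores, double reference) {
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = scores[i] / reference;
        }
        return out;
    }
}
```

Calling `ratios(new double[]{1.259212, 1.0073696, 0.75552726}, 1.0073696)` yields values within 1e-4 of 1.25, 1.0 and 0.75, matching the boost ratios 2.5 : 2.0 : 1.5.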
III. Custom sorting
To be completed...