join

了解sql的朋友都知道，我们在查询的时候可以采用join查询，即对有一定关联关系的对象进行联合查询来对多维的数据进行整理。

这个联合查询的方式挺方便的，跟我们现实生活中的托人找关系类似，我们想要完成一件事，先找自己的熟人，然后通过熟人在一次找到其他，最终通过这种手段找到想要联系到的人。

有点类似于”世间万物皆有联系“的感觉。

lucene的join包提供了索引时join和查询时join的功能；

Index-time join

大意是索引时join提供了查询时join的支持，且 IndexWriter.addDocuments() 方法调用时被join的documents以单个document块存储索引。

索引时join对普通文本内容（如xml文档或数据库表）是方便可用的。特别是对类似于数据库的那种多表关联的情况，我们需要对提供关联关系的列提供join支持；

在索引时join的时候，索引中的documents被分割成parent documents（每个索引块的最后一个document）和child documents (除了parent documents外的所有documents).

由于lucene并不记录doc块的信息，我们需要提供一个Filter来标示parent documents。

在搜索结果的时候，我们利用ToParentBlockJoinQuery来从child query到parent document space来remap/join对应的结果。

如果我们只关注匹配查询条件的parent documents，我们可以用任意的collector来采集匹配到的parent documents；如果我们还想采集匹配parent document查询条件的child documents，我们就需要利用ToParentBlockJoinCollector来进行查询；一旦查询完成，我们可以利用ToParentBlockJoinCollector.getTopGroups()来获取匹配条件的TopGroups.

Query-time joins

查询时join是基于索引词，其实现有两步：

1) 第一步先从匹配fromQuery的fromField中采集所有的数据；

2) 从第一步得到的数据中筛选出所有符合条件的documents

查询时join接收一下输入参数：

fromField：fromField的名称，即要join的documents中的字段；
formQuery: 用户的查询条件
multipleValuesPerDocument： fromField在document是否是多个值
scoreMode：定义other join side中score是如何被使用的。如果不关注scoring，我们只需要设置成ScoreMode.None，此种方式会忽略评分因此会更高效和节约内存；
toField：toField的名称，即要join的toField的在对应的document中的字段

通常查询时join的实现类似于如下：

String fromField = "from"; // Name of the from field
boolean multipleValuesPerDocument = false; // Set only yo true in the case when your fromField has multiple values per document in your index
String toField = "to"; // Name of the to field
ScoreMode scoreMode = ScoreMode.Max // Defines how the scores are translated into the other side of the join.
Query fromQuery = new TermQuery(new Term("content", searchTerm)); // Query executed to collect from values to join to the to values

Query joinQuery = JoinUtil.createJoinQuery(fromField, multipleValuesPerDocument, toField, fromQuery, fromSearcher, scoreMode);
TopDocs topDocs = toSearcher.search(joinQuery, 10); // Note: toSearcher can be the same as the fromSearcher
// Render topDocs...

测试代码

代码

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.*;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.join.JoinUtil;
import org.apache.lucene.search.join.ScoreMode;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;
import org.junit.Test;

import java.nio.file.Paths;

/**
 * @author binbin.hou
 * @since 1.0.0
 */
public class TestJoin {

    @Test
    public void testSimple() throws Exception {
        final String idField = "id";
        final String toField = "productId";

        Directory dir = FSDirectory.open(Paths.get("data/join"));
        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        config.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
        IndexWriter w = new IndexWriter(dir, config);

        // 0
        Document doc = new Document();
        doc.add(new TextField("description", "random text", Field.Store.YES));
        doc.add(new TextField("name", "name1", Field.Store.YES));
        doc.add(new TextField(idField, "1", Field.Store.YES));
        doc.add(new SortedDocValuesField(idField, new BytesRef("1")));

        w.addDocument(doc);

        // 1
        Document doc1 = new Document();
        doc1.add(new TextField("price", "10.0", Field.Store.YES));
        doc1.add(new TextField(idField, "2", Field.Store.YES));
        doc1.add(new SortedDocValuesField(idField, new BytesRef("2")));
        doc1.add(new TextField(toField, "1", Field.Store.YES));
        doc1.add(new SortedDocValuesField(toField, new BytesRef("1")));

        w.addDocument(doc1);

        // 2
        Document doc2 = new Document();
        doc2.add(new TextField("price", "20.0", Field.Store.YES));
        doc2.add(new TextField(idField, "3", Field.Store.YES));
        doc2.add(new SortedDocValuesField(idField, new BytesRef("3")));
        doc2.add(new TextField(toField, "1", Field.Store.YES));
        doc2.add(new SortedDocValuesField(toField, new BytesRef("1")));

        w.addDocument(doc2);

        // 3
        Document doc3 = new Document();
        doc3.add(new TextField("description", "more random text", Field.Store.YES));
        doc3.add(new TextField("name", "name2", Field.Store.YES));
        doc3.add(new TextField(idField, "4", Field.Store.YES));
        doc3.add(new SortedDocValuesField(idField, new BytesRef("4")));

        w.addDocument(doc3);


        // 4
        Document doc4 = new Document();
        doc4.add(new TextField("price", "10.0", Field.Store.YES));
        doc4.add(new TextField(idField, "5", Field.Store.YES));
        doc4.add(new SortedDocValuesField(idField, new BytesRef("5")));
        doc4.add(new TextField(toField, "4", Field.Store.YES));
        doc4.add(new SortedDocValuesField(toField, new BytesRef("4")));
        w.addDocument(doc4);

        // 5
        Document doc5 = new Document();
        doc5.add(new TextField("price", "20.0", Field.Store.YES));
        doc5.add(new TextField(idField, "6", Field.Store.YES));
        doc5.add(new SortedDocValuesField(idField, new BytesRef("6")));
        doc5.add(new TextField(toField, "4", Field.Store.YES));
        doc5.add(new SortedDocValuesField(toField, new BytesRef("4")));
        w.addDocument(doc5);

        //6
        Document doc6 = new Document();
        doc6.add(new TextField(toField, "4", Field.Store.YES));
        doc6.add(new SortedDocValuesField(toField, new BytesRef("4")));
        w.addDocument(doc6);
        w.commit();
        w.close();
        IndexReader reader = DirectoryReader.open(dir);
        IndexSearcher indexSearcher = new IndexSearcher(reader);


        // Search for product
        Query joinQuery = JoinUtil.createJoinQuery(idField, false, toField, new TermQuery(new Term("name", "name2")), indexSearcher, ScoreMode.None);
        System.out.println(joinQuery);
        TopDocs result = indexSearcher.search(joinQuery, 10);
        System.out.println("查询到的匹配数据："+result.totalHits);


        joinQuery = JoinUtil.createJoinQuery(idField, false, toField, new TermQuery(new Term("name", "name1")), indexSearcher, ScoreMode.None);
        result = indexSearcher.search(joinQuery, 10);
        System.out.println("查询到的匹配数据："+result.totalHits);
        // Search for offer
        joinQuery = JoinUtil.createJoinQuery(toField, false, idField, new TermQuery(new Term("id", "5")), indexSearcher, ScoreMode.None);
        result = indexSearcher.search(joinQuery, 10);
        System.out.println("查询到的匹配数据："+result.totalHits);

        indexSearcher.getIndexReader().close();
        dir.close();
    }

}

日志如下：

TermsQuery{field=productIdfromQuery=name:name2}
查询到的匹配数据：3
查询到的匹配数据：2
查询到的匹配数据：1

以第一个查询为例：

我们在查询的时候先根据name=name2这个查询条件找到记录为doc3的document,由于查询的是toField匹配的，我们在根据doc3找到其toField的值为4，然后查询条件变为productId:4，找出除本条记录外的其他数据，结果正好为3，符合条件。

参考资料

一步一步跟我学习lucene（18）—lucene索引时join和查询时join使用示例

join
Index-time join
Query-time joins
测试代码
- 代码
参考资料

Lucene-21-lucene索引时join和查询时join使用示例

join

Index-time join

Query-time joins

测试代码

代码

参考资料

更多学习