Lucene的Field说明

Lucene存储对象是以document为存储单元,对象中相关的属性值则存放到Field中;

lucene中所有Field都是IndexableField接口的实现

接口

表示用于索引的单个字段。 IndexWriter 使用 Iterable<IndexableField> 作为文档。

  [java]
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
public interface IndexableField { // 字段名称 public String name(); // 描述字段的属性 public IndexableFieldType fieldType(); /** * 创建用于索引此字段的 TokenStream。 * 如果合适,实现应该使用给定的分析器来创建 TokenStreams。 * * @param analyzer Analyzer that should be used to create the TokenStreams from * @param reuse TokenStream for a previous instance of this field <b>name</b>. This allows * custom field types (like StringField and NumericField) that do not use * the analyzer to still have good performance. Note: the passed-in type * may be inappropriate, for example if you mix up different types of Fields * for the same field name. So it's the responsibility of the implementation to * check. * @return TokenStream value for indexing the document. Should always return * a non-null value if the field is to be indexed */ public TokenStream tokenStream(Analyzer analyzer, TokenStream reuse); /** Non-null if this field has a binary value */ public BytesRef binaryValue(); /** Non-null if this field has a string value */ public String stringValue(); /** Non-null if this field has a Reader value */ public Reader readerValue(); /** Non-null if this field has a numeric value */ public Number numericValue(); }

Field 父类

所有的Field均是 org.apache.lucene.document.Field 的子类;

  [java]
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
public class Field implements IndexableField { /** * Field's type */ protected final IndexableFieldType type; /** * Field's name */ protected final String name; /** Field's value */ protected Object fieldsData; /** Pre-analyzed tokenStream for indexed fields; this is * separate from fieldsData because you are allowed to * have both; eg maybe field has a String value but you * customize how it's tokenized */ protected TokenStream tokenStream; //构造器和其他方法 }

常见 Field

项目中我们常用的Field类型主要有

IntField, LongField, FloatField, DoubleField, BinaryDocValuesField, NumericDocValuesField, SortedDocValuesField, StringField, TextField, StoredField.

IntField 主要对int类型的字段进行存储,需要注意的是如果需要对InfField进行排序使用SortField.Type.INT来比较,如果进范围查询或过滤,需要采用NumericRangeQuery.newIntRange()

LongField 主要处理Long类型的字段的存储,排序使用SortField.Type.Long,如果进行范围查询或过滤利用NumericRangeQuery.newLongRange(),LongField常用来进行时间戳的排序,保存System.currentTimeMillions()

FloatField 对Float类型的字段进行存储,排序采用SortField.Type.Float,范围查询采用NumericRangeQuery.newFloatRange()

BinaryDocVluesField 只存储不共享值,如果需要共享值可以用SortedDocValuesField

NumericDocValuesField 用于数值类型的Field的排序(预排序),需要在要排序的field后添加一个同名的NumericDocValuesField

SortedDocValuesField 用于String类型的Field的排序,需要在StringField后添加同名的SortedDocValuesField

StringField 用户String类型的字段的存储,StringField是只索引不分词

TextField 对String类型的字段进行存储,TextField和StringField的不同是TextField既索引又分词

StoredField 存储Field的值,可以用IndexSearcher.doc和IndexReader.document来获取此Field和存储的值

NumericDocValuesField 例子

  [java]
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
package com.github.houbb.lucene.learn.chap02; import com.github.houbb.lucene.learn.utils.IndexerUtil; import org.apache.lucene.document.Document; import org.apache.lucene.document.NumericDocValuesField; import org.apache.lucene.index.IndexWriter; import java.io.IOException; /** * @author binbin.hou * @since 1.0.0 */ public class IndexIntFieldDemo { public static void main(String[] args) throws IOException { Document document = new Document(); //要排序必须加同名的field,且类型为NumericDocValuesField document.add(new NumericDocValuesField("intValue", 30)); Document document1 = new Document(); document1.add(new NumericDocValuesField("intValue2", 40)); IndexWriter writer = IndexerUtil.createIndexWriter(); writer.addDocument(document); writer.addDocument(document1); writer.commit(); writer.close(); } }
  • IndexerUtil.java

工具类实现如下:

  [java]
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
package com.github.houbb.lucene.learn.utils; import org.apache.lucene.analysis.Analyzer; import org.apache.lucene.analysis.standard.StandardAnalyzer; import org.apache.lucene.index.IndexWriter; import org.apache.lucene.index.IndexWriterConfig; import org.apache.lucene.store.Directory; import org.apache.lucene.store.FSDirectory; import java.io.IOException; import java.nio.file.Paths; /** * @author binbin.hou * @since 1.0.0 */ public final class IndexerUtil { private IndexerUtil(){} public static IndexWriter createIndexWriter() throws IOException { String path = "D:\\lucene\\dataindex"; return createIndexWriter(path); } public static IndexWriter createIndexWriter(String indexDir) throws IOException { //得到索引所在目录的路径 Directory directory = FSDirectory.open(Paths.get(indexDir)); // 标准分词器 Analyzer analyzer = new StandardAnalyzer(); //保存用于创建IndexWriter的所有配置。 IndexWriterConfig iwConfig = new IndexWriterConfig(analyzer); //实例化IndexWriter return new IndexWriter(directory, iwConfig); } }

参考资料

一步一步跟我学习lucene(2)—lucene的各种Field及其排序