Lucene的Field说明
Lucene存储对象是以document为存储单元,对象中相关的属性值则存放到Field中;
lucene中所有Field都是IndexableField接口的实现
接口
表示用于索引的单个字段。 IndexWriter 使用 Iterable<IndexableField>
作为文档。
public interface IndexableField {
// 字段名称
public String name();
// 描述字段的属性
public IndexableFieldType fieldType();
/**
* 创建用于索引此字段的 TokenStream。
* 如果合适,实现应该使用给定的分析器来创建 TokenStreams。
*
* @param analyzer Analyzer that should be used to create the TokenStreams from
* @param reuse TokenStream for a previous instance of this field <b>name</b>. This allows
* custom field types (like StringField and NumericField) that do not use
* the analyzer to still have good performance. Note: the passed-in type
* may be inappropriate, for example if you mix up different types of Fields
* for the same field name. So it's the responsibility of the implementation to
* check.
* @return TokenStream value for indexing the document. Should always return
* a non-null value if the field is to be indexed
*/
public TokenStream tokenStream(Analyzer analyzer, TokenStream reuse);
/** Non-null if this field has a binary value */
public BytesRef binaryValue();
/** Non-null if this field has a string value */
public String stringValue();
/** Non-null if this field has a Reader value */
public Reader readerValue();
/** Non-null if this field has a numeric value */
public Number numericValue();
}
Field 父类
所有的Field均是 org.apache.lucene.document.Field 的子类;
public class Field implements IndexableField {
/**
* Field's type
*/
protected final IndexableFieldType type;
/**
* Field's name
*/
protected final String name;
/** Field's value */
protected Object fieldsData;
/** Pre-analyzed tokenStream for indexed fields; this is
* separate from fieldsData because you are allowed to
* have both; eg maybe field has a String value but you
* customize how it's tokenized */
protected TokenStream tokenStream;
//构造器和其他方法
}
常见 Field
项目中我们常用的Field类型主要有
IntField, LongField, FloatField, DoubleField, BinaryDocValuesField, NumericDocValuesField, SortedDocValuesField, StringField, TextField, StoredField.
IntField 主要对int类型的字段进行存储,需要注意的是如果需要对InfField进行排序使用SortField.Type.INT来比较,如果进范围查询或过滤,需要采用NumericRangeQuery.newIntRange()
LongField 主要处理Long类型的字段的存储,排序使用SortField.Type.Long,如果进行范围查询或过滤利用NumericRangeQuery.newLongRange(),LongField常用来进行时间戳的排序,保存System.currentTimeMillions()
FloatField 对Float类型的字段进行存储,排序采用SortField.Type.Float,范围查询采用NumericRangeQuery.newFloatRange()
BinaryDocVluesField 只存储不共享值,如果需要共享值可以用SortedDocValuesField
NumericDocValuesField 用于数值类型的Field的排序(预排序),需要在要排序的field后添加一个同名的NumericDocValuesField
SortedDocValuesField 用于String类型的Field的排序,需要在StringField后添加同名的SortedDocValuesField
StringField 用户String类型的字段的存储,StringField是只索引不分词
TextField 对String类型的字段进行存储,TextField和StringField的不同是TextField既索引又分词
StoredField 存储Field的值,可以用IndexSearcher.doc和IndexReader.document来获取此Field和存储的值
NumericDocValuesField 例子
package com.github.houbb.lucene.learn.chap02;
import com.github.houbb.lucene.learn.utils.IndexerUtil;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.index.IndexWriter;
import java.io.IOException;
/**
* @author binbin.hou
* @since 1.0.0
*/
public class IndexIntFieldDemo {
public static void main(String[] args) throws IOException {
Document document = new Document();
//要排序必须加同名的field,且类型为NumericDocValuesField
document.add(new NumericDocValuesField("intValue", 30));
Document document1 = new Document();
document1.add(new NumericDocValuesField("intValue2", 40));
IndexWriter writer = IndexerUtil.createIndexWriter();
writer.addDocument(document);
writer.addDocument(document1);
writer.commit();
writer.close();
}
}
- IndexerUtil.java
工具类实现如下:
package com.github.houbb.lucene.learn.utils;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import java.io.IOException;
import java.nio.file.Paths;
/**
* @author binbin.hou
* @since 1.0.0
*/
public final class IndexerUtil {
private IndexerUtil(){}
public static IndexWriter createIndexWriter() throws IOException {
String path = "D:\\lucene\\dataindex";
return createIndexWriter(path);
}
public static IndexWriter createIndexWriter(String indexDir) throws IOException {
//得到索引所在目录的路径
Directory directory = FSDirectory.open(Paths.get(indexDir));
// 标准分词器
Analyzer analyzer = new StandardAnalyzer();
//保存用于创建IndexWriter的所有配置。
IndexWriterConfig iwConfig = new IndexWriterConfig(analyzer);
//实例化IndexWriter
return new IndexWriter(directory, iwConfig);
}
}