Doc2Vec

DL4J 中用于语言处理的 Doc2Vec 和任意文档。

Doc2Vec 的主要目的是将任意文档与标签相关联，因此需要标签。

Doc2vec 是 word2vec 的扩展，它学习关联标签和单词，而不是单词与其他单词。

Deeplearning4j 的实现旨在服务于 Java、Scala 和 Clojure 社区。

第一步是提出一个表示文档“含义”的向量，然后可以将其用作监督机器学习算法的输入，以将文档与标签相关联。

在 ParagraphVectors 构建器模式中，labels() 方法指向要训练的标签。

在下面的示例中，您可以看到与情绪分析相关的标签：

  [java]
1
.labels(Arrays.asList("negative", "neutral","positive"))

这是使用段落向量进行分类的完整工作示例：

  [java]
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
public void testDifferentLabels() throws Exception {
    ClassPathResource resource = new ClassPathResource("/labeled");
    File file = resource.getFile();
    LabelAwareSentenceIterator iter = LabelAwareUimaSentenceIterator.createWithPath(file.getAbsolutePath());

    TokenizerFactory t = new UimaTokenizerFactory();

    ParagraphVectors vec = new ParagraphVectors.Builder()
            .minWordFrequency(1).labels(Arrays.asList("negative", "neutral","positive"))
            .layerSize(100)
            .stopWords(new ArrayList<String>())
            .windowSize(5).iterate(iter).tokenizerFactory(t).build();

    vec.fit();

    assertNotEquals(vec.lookupTable().vector("UNK"), vec.lookupTable().vector("negative"));
    assertNotEquals(vec.lookupTable().vector("UNK"),vec.lookupTable().vector("positive"));
    assertNotEquals(vec.lookupTable().vector("UNK"),vec.lookupTable().vector("neutral"));}

Futher Reading

句子和文档的分布式表示 Distributed Representations of Sentences and Documents

参考资料

https://deeplearning4j.konduit.ai/deeplearning4j/tutorials/language-processing/doc2vec

Doc2Vec
Futher Reading
参考资料

DeepLearning4j-09-DOC2VEC

Doc2Vec

Futher Reading

参考资料

更多学习