From Word Embeddings To Document Distances

Paper link: http://proceedings.mlr.press/v37/kusnerb15.pdf

Key points

The motivation of the paper: when measuring document similarity, methods such as BoW or TF-IDF only compare documents through their surface word counts and ignore the meaning of individual words. For example, after removing stop words, the two sentences below share no words at all, yet their meanings are very similar. (Intuitively, the word frequencies differ, so TF-IDF does not help; given the similarity between the individual words, a BoW-style composition of word vectors might still work, or might well not, depending on the composition method and the quality of the word vectors.)

  1. Obama speaks to the media in Illinois.

  2. The president greets the press in Chicago.

As the representations learned by word embeddings keep improving, and inspired by the Earth Mover's Distance, the paper proposes the Word Mover's Distance (WMD) to measure document similarity. In short, the idea is that the distance (dissimilarity) between two documents can be expressed as the minimum cumulative cost of moving all the (embedded) words of one document onto the words of the other. Conceptually this resembles methods such as the Levenshtein distance, which measures the distance between two words by the minimum number of letter insertions, deletions, and substitutions; WMD simply accumulates travel cost in embedding space instead of edit counts.

Concretely, WMD works as follows. Given an n×d embedding matrix (d is the dimensionality of the word vectors, n is the vocabulary size), a document is represented as an n-dimensional vector whose entries are the (normalized) frequencies of the words it contains; the paper calls this representation nBOW. To bring word-level similarity into play, the paper defines c(i, j) as the travel cost between words i and j, e.g. the Euclidean distance (used in the paper) or a cosine distance between their embeddings. Naturally, the more semantically related two words are, the smaller their travel cost (e.g. Illinois and Chicago in the example sentences). Given the nBOW vectors of the two documents, denoted d and d', an n×n matrix T is constructed in which each entry T_ij denotes how much of word i travels to word j; a row (or column) of T therefore describes how a word of one document is spread over all the words of the other document (zero entries can be ignored). WMD is then defined as $\min_{T \ge 0} \sum_{i,j=1}^{n} T_{ij}\, c(i, j)$, i.e. the minimum total cost (amount moved × unit cost) of transporting all the words of one document onto all the words of the other.

The transport amounts are constrained so that the total amount shipped out of each word equals that word's frequency in its own document, and the total amount received by each word equals its frequency in the other document: $\sum_j T_{ij} = d_i$ for all i, and $\sum_i T_{ij} = d'_j$ for all j.
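To make the optimization concrete, here is a minimal sketch that solves the WMD transportation problem with SciPy's generic LP solver rather than the specialized EMD solvers used in the paper; the toy two-dimensional embeddings and the helper names (`nbow`, `wmd`) are invented purely for illustration.

```python
# Minimal WMD sketch: solve the transportation LP with SciPy's generic LP solver.
import numpy as np
from scipy.optimize import linprog

def nbow(tokens, vocab):
    """Normalized bag-of-words weights over `vocab` (the paper's nBOW)."""
    v = np.array([tokens.count(w) for w in vocab], dtype=float)
    return v / v.sum()

def wmd(doc1, doc2, emb):
    """Word Mover's Distance between two token lists, given an embedding dict."""
    v1, v2 = sorted(set(doc1)), sorted(set(doc2))
    d, dp = nbow(doc1, v1), nbow(doc2, v2)          # nBOW weights d_i and d'_j
    # Travel cost c(i, j): Euclidean distance between the word embeddings.
    C = np.array([[np.linalg.norm(emb[wi] - emb[wj]) for wj in v2] for wi in v1])

    n1, n2 = len(v1), len(v2)
    # Flatten T into n1*n2 variables and encode the two flow constraints:
    #   sum_j T_ij = d_i   (each word of doc1 ships out exactly its weight)
    #   sum_i T_ij = d'_j  (each word of doc2 receives exactly its weight)
    A_eq = np.zeros((n1 + n2, n1 * n2))
    for i in range(n1):
        A_eq[i, i * n2:(i + 1) * n2] = 1.0
    for j in range(n2):
        A_eq[n1 + j, j::n2] = 1.0
    b_eq = np.concatenate([d, dp])

    # linprog keeps variables non-negative by default, matching T >= 0.
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, method="highs")
    return res.fun                                   # minimal total travel cost

# Toy 2-d embeddings, invented for illustration only.
emb = {w: np.array(x) for w, x in {
    "obama": [1.0, 0.0], "president": [0.9, 0.1],
    "speaks": [0.0, 1.0], "greets": [0.1, 0.9],
    "media": [0.5, 0.5], "press": [0.55, 0.45],
    "illinois": [0.2, 0.8], "chicago": [0.25, 0.75],
}.items()}
print(wmd(["obama", "speaks", "media", "illinois"],
          ["president", "greets", "press", "chicago"], emb))
```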

The paper illustrates WMD with a figure (the numbers shown in it are already travel costs). The top half depicts the idealized case in which, after stop-word removal, the three sentences contain the same number of words, and only the arrow from each word to its nearest word in the other document is drawn. The bottom half shows the more general case in which the documents contain different numbers of words: in D0's nBOW representation every word has weight 0.25, while in D1 every word has weight 0.33.

Computing WMD costs $O(p^3 \log p)$, where p is the number of unique words in the two documents, which is prohibitively expensive. The paper therefore proposes two faster approximations (a sketch of both follows the list):

  1. Word Centroid Distance (WCD): represent each document as the (frequency-)weighted average of its word vectors and take the distance between the two centroids; the cost is $O(dp)$.

  2. Relaxed WMD (RWMD): relax one of the two constraints above. To obtain a tighter lower bound on WMD, the paper drops each of the two constraints in turn, yielding two relaxed distances, and takes the larger one. The cost is $O(p^2)$.
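Under the same toy setup as before, both approximations might be sketched as follows; here `d` and `dp` are the two nBOW weight vectors, `E1` and `E2` stack the corresponding word vectors row by row, and `C` is the pairwise cost matrix (the variable names are illustrative, not the paper's).

```python
import numpy as np

def wcd(d, dp, E1, E2):
    """Word Centroid Distance: distance between the nBOW-weighted centroids.
    E1 and E2 stack the word vectors of the two documents row by row."""
    return np.linalg.norm(d @ E1 - dp @ E2)

def rwmd(d, dp, C):
    """Relaxed WMD: drop one flow constraint at a time, keep the larger bound.
    C[i, j] is the travel cost between word i of doc 1 and word j of doc 2."""
    # With only the outgoing constraint, each word i ships its whole weight to
    # its cheapest target; symmetrically for the incoming constraint.
    lower_out = np.sum(d * C.min(axis=1))
    lower_in = np.sum(dp * C.min(axis=0))
    return max(lower_out, lower_in)
```

Taking the larger of the two relaxed solutions is what makes RWMD a much tighter lower bound than WCD while remaining quadratic rather than cubic in p.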

To cut the cost further, the paper uses a prefetch-and-prune algorithm: sort the candidate documents in ascending order of their WCD to the query document and compute the exact WMD of the first k. Then traverse the remaining documents, starting from the one with the smallest WCD, and check whether each one's RWMD lower bound exceeds the WMD of the current k-th most similar document; if it does, prune the document, otherwise compute its exact WMD and update the top-k list. Because RWMD yields a very tight lower bound, on some datasets this prunes up to 95% of the dissimilar documents.
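A rough sketch of that prefetch-and-prune loop, assuming `wcd_fn`, `rwmd_fn`, and `wmd_fn` follow the interfaces sketched above (all function and parameter names here are hypothetical):

```python
def top_k_wmd(query, docs, k, wcd_fn, rwmd_fn, wmd_fn):
    """Return the k documents nearest to `query` under WMD, pruning with WCD/RWMD."""
    # 1. Prefetch: sort candidates by the cheap WCD and take exact WMD of the first k.
    order = sorted(range(len(docs)), key=lambda i: wcd_fn(query, docs[i]))
    top_k = sorted((wmd_fn(query, docs[i]), i) for i in order[:k])
    # 2. Prune: a remaining document whose RWMD lower bound already exceeds the
    #    current k-th best WMD cannot enter the top k, so it is skipped.
    for i in order[k:]:
        kth_best = top_k[-1][0]
        if rwmd_fn(query, docs[i]) > kth_best:
            continue
        dist = wmd_fn(query, docs[i])
        if dist < kth_best:
            top_k[-1] = (dist, i)
            top_k.sort()
    return top_k
```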

The paper also plots the pairwise WCD, RWMD, and WMD values computed on one dataset; WCD turns out to be a very, very loose lower bound, while RWMD is a very tight one.
