Linear Ensembles of Word Embedding Models

Paper: http://www.aclweb.org/anthology/W17-0212

Key Points

I would classify this paper as word meta-embedding work, since it likewise builds a better word embedding from multiple pre-trained word embeddings, but the authors insist theirs is different. They give two (somewhat strained) distinctions:

  1. The source word embeddings here are trained with the same algorithm on the same corpus;

  2. A major contribution of the paper is the orthogonality constraint.

Judging from the first point, the intent is essentially denoising: averaging out the random error between different word embeddings so that the useful information the algorithm captured from the corpus is what remains.

The overall idea of the paper can be summarized by three formulas (a reconstruction is sketched after the list):

  1. Minimize the Euclidean distance between the target word embedding Y and the linear transformations WP of the pre-trained word embeddings;

  2. Update Y with the mean of the WP;

  3. Repeat. Termination is controlled by a stopping quantity: once it falls below a threshold, iteration stops.
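A plausible reconstruction of the three formulas, based on the descriptions above, with $W_1,\dots,W_r$ the $r$ pre-trained embedding matrices and $P_1,\dots,P_r$ their transformation matrices; treating the stopping quantity as the change in the total loss $J$ is my assumption:

```latex
% objective: minimize the squared Euclidean distance between Y and each W_i P_i
\min_{P_1,\dots,P_r}\; J = \sum_{i=1}^{r} \lVert W_i P_i - Y \rVert^{2}

% update: Y is the mean of the transformed source embeddings
Y = \frac{1}{r} \sum_{i=1}^{r} W_i P_i

% stopping criterion (assumed form): iterate until the change in J is below a threshold
\lvert J^{(t)} - J^{(t-1)} \rvert < \epsilon
```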

The crux of the paper is how to solve for the linear transformation matrix P, and two approaches are offered:

  1. Since Y = WP admits an analytical solution via Ordinary Least Squares (OLS), the authors try that first: $P=(W^{T} W)^{-1} W^{T} Y$. The problem with this method is that Y and WP may drift towards 0, yielding a degenerate solution. The authors' fix is to rescale Y at the start of every iteration so that each column has unit variance;

  2. The second approach treats the task as an Orthogonal Procrustes problem, imposing the orthogonality constraint $PP^{T}=P^{T}P=I$. This also has an analytical solution, though the procedure is more involved than OLS:

    • First compute $S=W^{T}Y$;

    • Diagonalize via SVD: $S^{T}S=VD_{S}V^{T}$ and $SS^{T}=UD_{S}U^{T}$;

    • Finally, $P=UV^{T}$.

The advantage of the second method is that the orthogonality constraint preserves the norms of the vectors in WP and the angles between them, and there is no worry about collapsing to 0.
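A minimal numpy sketch of the whole procedure, covering both ways of solving for P; the function names, the random initialization of Y, and the exact stopping test are illustrative choices rather than details taken from the paper:

```python
import numpy as np


def solve_ols(W, Y):
    """OLS solution P = (W^T W)^{-1} W^T Y for Y ~ W P."""
    return np.linalg.solve(W.T @ W, W.T @ Y)


def solve_procrustes(W, Y):
    """Orthogonal Procrustes solution: S = W^T Y, SVD of S, then P = U V^T."""
    U, _, Vt = np.linalg.svd(W.T @ Y)
    return U @ Vt


def linear_ensemble(Ws, method="procrustes", tol=1e-6, max_iter=100, seed=0):
    """Combine source embedding matrices Ws (same vocabulary, each n x d)."""
    solve = solve_procrustes if method == "procrustes" else solve_ols
    n, d = Ws[0].shape
    Y = np.random.default_rng(seed).normal(size=(n, d))  # random initial target Y (assumption)
    prev_loss = np.inf
    for _ in range(max_iter):
        if method == "ols":
            # rescale Y so each column has unit variance, to avoid collapse towards 0
            Y = Y / Y.std(axis=0, keepdims=True)
        Ps = [solve(W, Y) for W in Ws]                          # fit each P_i with Y fixed
        Y = np.mean([W @ P for W, P in zip(Ws, Ps)], axis=0)    # Y <- mean of W_i P_i
        loss = sum(np.linalg.norm(W @ P - Y) ** 2 for W, P in zip(Ws, Ps))
        if abs(prev_loss - loss) < tol:                         # stop once the loss change is tiny
            break
        prev_loss = loss
    return Y, Ps
```

For example, `Y, Ps = linear_ensemble([W1, W2, W3], method="procrustes")` would combine three runs of the same embedding algorithm; the Procrustes update keeps each $P_i$ orthogonal, so no rescaling of Y is needed, whereas the OLS variant relies on the per-iteration rescaling to stay away from the all-zero solution.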

Experiments show that the word embeddings produced by method 2 are better across the board, so I won't go through the results one by one.

Remarks

We already know that with ensemble methods, the more the component models differ, the more complementary they are. What this paper highlights is the importance of the orthogonality constraint for word embeddings (and similar problems).
