Improving Word Embedding Compositionality using Lexicographic Definitions


Paper link:

Key points

As the title suggests, the goal of this paper is to use lexicographic word definitions to improve compositionality.

So what is compositionality? Word representations are learned from a word's context, so a word vector captures both semantics and syntax. The syntactic information is useful for composition, but of little use when a word stands alone. Composition means processing the vectors of several words to produce a single representation that captures the meaning of the phrase; Bag-of-Words is one of the most commonly used models for this. Compositionality is then the ability of the phrase to still be represented accurately after being "compressed" in this way.
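As a toy illustration (not from the paper), additive Bag-of-Words composition simply averages the word vectors of the phrase; the vectors below are made up for the example:

```python
import numpy as np

# Made-up 4-dimensional embeddings standing in for real pre-trained vectors.
embeddings = {
    "hot": np.array([0.8, 0.1, 0.3, 0.5]),
    "dog": np.array([0.2, 0.9, 0.4, 0.1]),
}

def compose_bow(words, embeddings):
    """Additive Bag-of-Words composition: average the word vectors of a phrase."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    return np.mean(vectors, axis=0)

phrase_vec = compose_bow(["hot", "dog"], embeddings)
print(phrase_vec)  # a single vector meant to represent the whole phrase
```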

The paper proposes and compares several methods:

  1. Algebraic composition: compute the final representation with algebraic operations on the word vectors, including vector addition, element-wise multiplication, and the dimension-wise maximum and average (normalization causes information loss);

  2. Adjusting the embeddings while composing: the idea is simple. Use a triplet loss as the objective (see the figure improving_word_embedding_compositionality_triplet_loss.png), shrinking the distance between the words in a definition and the word being defined while enlarging the distance to randomly sampled words; the Euclidean distance can also be replaced with cosine similarity. A sketch of this objective follows the list;

  3. Learning to compose: train a model to do the composition, e.g. a single-layer neural network, an RNN, or a CNN. Even the simple single-layer network escapes the constraints of the native vector space and lets the word vectors' compositionality be exploited more fully. Note that the single-layer network in the paper operates on the linearly combined vectors from method 1, followed by a tanh layer; the RNN and CNN are standard, with the RNN using GRUs and the CNN using several window sizes.
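A minimal sketch of the triplet-style objective from method 2, assuming Euclidean distance and a hypothetical margin hyperparameter (the margin value is not taken from the paper):

```python
import numpy as np

def euclidean(a, b):
    return np.linalg.norm(a - b)

def triplet_loss(definition_vec, defined_vec, random_vec, margin=1.0):
    """Hinge-style triplet objective: pull the composed definition towards the
    defined word and push it away from a randomly sampled word."""
    return max(0.0, margin
               + euclidean(definition_vec, defined_vec)
               - euclidean(definition_vec, random_vec))

# Toy usage with random vectors standing in for real embeddings.
rng = np.random.default_rng(0)
definition_vec = rng.normal(size=50)   # e.g. the composed definition words
defined_vec = rng.normal(size=50)      # the word being defined
random_vec = rng.normal(size=50)       # a negative sample
print(triplet_loss(definition_vec, defined_vec, random_vec))
```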

To test compositionality, the paper constructs a dedicated test set. Roughly, for each composed representation the nearest-neighbour words are retrieved and ranked, and the rank of the correct word is inspected; a ball tree algorithm is used for the nearest-neighbour search. According to the authors, rank-based evaluation is preferable because it does not depend on the vector space, and different metrics can be computed over the ranking, each offering a different view. Indeed, the paper reports several: Mean Reciprocal Rank, Mean Average Precision, Mean Precision@10, and Mean Normalized Rank.
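As a rough illustration of such a rank-based check (not the paper's evaluation code), here is a sketch using scikit-learn's BallTree, random stand-in vectors, and Mean Reciprocal Rank:

```python
import numpy as np
from sklearn.neighbors import BallTree

# Hypothetical vocabulary embeddings (rows) and composed phrase vectors;
# a real experiment would use the trained embedding matrix instead.
rng = np.random.default_rng(0)
vocab_vectors = rng.normal(size=(1000, 50))
composed = rng.normal(size=(10, 50))          # composed definition vectors
target_ids = rng.integers(0, 1000, size=10)   # index of each defined word

tree = BallTree(vocab_vectors)                 # nearest-neighbour search structure
_, neighbours = tree.query(composed, k=1000)   # full ranking of the vocabulary

# Mean Reciprocal Rank of the defined word in each ranking.
ranks = [np.where(neighbours[i] == target_ids[i])[0][0] + 1
         for i in range(len(target_ids))]
mrr = np.mean([1.0 / r for r in ranks])
print(f"MRR: {mrr:.4f}")
```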

Conclusions to note:

  1. fastText is itself trained with vector addition (its n-gram vectors are summed), so it naturally benefits from additive composition;

  2. element-wise multiplication performs terribly;
