Natural Language Processing
This page includes an image of the most important transformer models introduced since the original Transformer, along with links to the benchmark scores for the tasks cited in the papers.
References
mT5: A massively multilingual pre-trained text-to-text transformer
Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
mBART: Multilingual Denoising Pre-training for Neural Machine Translation
ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training
Pegasus: Pre-training with Extracted Gap-sentences for Abstractive Summarization
XLM-RoBERTa: Unsupervised Cross-lingual Representation Learning at Scale
T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
DistilBERT: A distilled version of BERT: smaller, faster, cheaper and lighter
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
CTRL: A Conditional Transformer Language Model for Controllable Generation
XLNet: Generalized Autoregressive Pretraining for Language Understanding
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Original GPT: Improving Language Understanding by Generative Pre-Training