Natural Language Processing
Here we include an image of the most important transformer models introduced since the original Transformer, as well as links to the benchmark scores for the tasks cited in the papers.
References
- mT5: A massively multilingual pre-trained text-to-text transformer 
- Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing 
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators 
- MBART: Multilingual Denoising Pre-training for Neural Machine Translation 
- ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training 
- Pegasus: Pre-training with Extracted Gap-sentences for Abstractive Summarization
- XLM-RoBERTa: Unsupervised Cross-lingual Representation Learning at Scale 
- T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer 
- DistilBERT: a distilled version of BERT: smaller, faster, cheaper and lighter 
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations 
- CTRL: A Conditional Transformer Language Model for Controllable Generation 
- XLNet: Generalized Autoregressive Pretraining for Language Understanding 
- Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context 
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding 
- Original GPT: Improving Language Understanding by Generative Pre-Training
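
As a minimal illustration of how the models listed above are commonly loaded in practice, the sketch below uses the Hugging Face `transformers` library and the `bert-base-uncased` checkpoint name; both the library and the checkpoint identifier are assumptions, not something referenced in the list itself. The same `AutoTokenizer`/`AutoModel` pattern applies to most of the other architectures.

```python
# Minimal sketch: loading one of the listed models (BERT) with the
# Hugging Face transformers library. The library and the checkpoint name
# "bert-base-uncased" are assumptions, not part of the reference list above.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Transformers changed NLP.", return_tensors="pt")
outputs = model(**inputs)

# Contextual token embeddings: (batch size, sequence length, hidden size)
print(outputs.last_hidden_state.shape)
```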