HuggingFace Models
Author: qiqiqi

Introduction

If you are unfamiliar with the original Transformer model, please refer to the paper: [1706.03762] Attention Is All You Need (arxiv.org)

Additionally, for a code-annotated guide to the original Transformer model, you can consult The Annotated Transformer (harvard.edu)

All models available on Hugging Face can be found here: Hugging Face – On a mission to solve NLP, one commit at a time.

The configuration details for these models are available in the documentation: Pretrained models — transformers 4.0.0 documentation (huggingface.co)
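As a quick illustration of how any of these pretrained models is loaded, here is a minimal sketch (assuming transformers 4.x and PyTorch are installed; "bert-base-uncased" is just one example identifier from the model hub):

from transformers import AutoTokenizer, AutoModel

# Download (or load from cache) the tokenizer and weights for a model hub identifier
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Tokenize a sentence and run a forward pass
inputs = tokenizer("Hello, Hugging Face!", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)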


Category

All models on Hugging Face fall into one of the following categories:

An overview:

  • Autoregressive models are pre-trained on the standard language modeling task: predicting the next token given all previously read tokens. In other words, the sequence is read from left to right. They correspond to the decoder of the original transformer model. While these models can be fine-tuned to achieve excellent results on many tasks, their primary application is text generation, since both training and generation proceed from left to right. A typical example of this type is GPT.

  • Autoencoding models are pre-trained by corrupting the input tokens in some way and trying to reconstruct the original sequence. In a sense, they correspond to the encoder of the original transformer model, because they get to see the entire input sequence at once. Although they can be fine-tuned to achieve excellent results on many tasks (such as text generation), their most natural application is sequence classification or token classification. A typical example of this type is BERT.

  • Sequence-to-sequence models aim to frame all NLP tasks as sequence-to-sequence problems. They can be fine-tuned for many tasks, but their best applications are translation, summarization, and reading comprehension. The original transformer model is an example of this type (specifically for translation). A typical example of this type is T5.

  • Multimodal models combine text input with other types of input, such as images, and are often tailored to specific tasks.

  • Retrieval-based models combine document retrieval with a language model during (pre-)training or inference; the author has not yet studied these in detail.


Typical Representatives

Here is a brief introduction to the typical examples of each type of model.



Autoregressive models

Original GPT:

GPT-2:

CTRL:

Transformer-XL:
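A minimal sketch of left-to-right generation with the small GPT-2 checkpoint (assuming transformers 4.x and PyTorch are installed):

from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Each new token is predicted from the tokens already generated (left to right)
input_ids = tokenizer.encode("The Transformer architecture", return_tensors="pt")
output_ids = model.generate(input_ids, max_length=30, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))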

======================================================

Autoencoding models

BERT:

RoBERTa:

DistilBERT:
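A minimal sketch of the masked-language-modeling objective these models are pre-trained with, using the fill-mask pipeline and the bert-base-uncased checkpoint (assuming transformers 4.x is installed):

from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# The model sees the whole corrupted sequence at once and reconstructs the masked token
for candidate in fill_mask("Paris is the [MASK] of France."):
    print(candidate["token_str"], candidate["score"])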

======================================================

Sequence-to-sequence models

BART:

T5:

  • T5 (Text-To-Text Transfer Transformer) unifies all natural language processing (NLP) tasks into a single sequence-to-sequence framework. Instead of designing different input formats for various tasks, T5 treats every task as a text transformation problem. For instance, in a Chinese-to-English translation task, before T5, inputs might have been formatted differently depending on the system. With T5, the input is explicitly structured as an instruction combined with the source text, like “translate Chinese to English: 我喜欢跑步”, and the model outputs the translated text, “I like running” (a runnable sketch follows this list).

  • Paper: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (arxiv.org)

  • Model source code analysis: T5 - transformers 4.0.0 documentation
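A minimal sketch of T5's text-to-text interface (assuming transformers 4.x, PyTorch, and sentencepiece are installed). The public t5-small checkpoint was pre-trained on English, German, French, and Romanian, so an English-to-German prompt stands in for the Chinese example above:

from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is phrased as "instruction + source text" in, plain text out
input_ids = tokenizer("translate English to German: I like running.", return_tensors="pt").input_ids
output_ids = model.generate(input_ids, max_length=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))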

======================================================

Multimodal models

MMBT:

