ChatGLM-6B: Training custom models
Author: qiqiqi

Introduction
ChatGLM-6B is an open-source dialogue language model based on the General Language Model (GLM) framework, with 6.2 billion parameters. Combined with model quantization, it has been measured to use approximately 6 GB of VRAM during training on an RTX 2080 Ti graphics card.

Advantages: 

1. Lower deployment threshold: Under FP16 half-precision, ChatGLM-6B requires at least 13GB of VRAM for inference. By combining model quantization technology, this requirement can be further lowered to 10GB (INT8) and 6GB (INT4), enabling deployment of ChatGLM-6B on consumer-grade graphics cards.

2. Longer sequence length: Compared to GLM-10B (sequence length 1024), ChatGLM-6B has a sequence length of 2048, supporting longer conversations and applications.

3. Human intent alignment training: supervised fine-tuning, feedback bootstrapping, and Reinforcement Learning from Human Feedback (RLHF) are used to give the model a preliminary ability to understand human instruction intent. The output format is Markdown, for easy display. The supervised fine-tuning method is now open source.

Disadvantages:

1. Smaller model capacity: with only 6B parameters, the model's memory and language ability are relatively weak. As the amount of custom training data and the number of training rounds increase, it may gradually lose its original conversational ability.

2. Weaker multi-turn dialogue capability: ChatGLM-6B’s context understanding ability is not yet fully developed. When faced with scenarios involving long answer generation and multi-turn dialogue, it may experience context loss and misunderstanding.

1. Installation

The fine-tuning currently open-sourced for ChatGLM-6B is based on P-Tuning v2.


Download ChatGLM-6B


git clone https://github.com/THUDM/ChatGLM-6B
cd ChatGLM-6B
pip install -r requirements.txt
cd ptuning/
pip install rouge_chinese nltk jieba datasets
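
Before going further, it helps to confirm that PyTorch can actually see the GPU. A minimal check (assuming the requirements installed a CUDA build of PyTorch):

import torch

print(torch.__version__)
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA GeForce RTX 2080 Ti"
else:
    print("CUDA is not available - check the driver and the PyTorch build")
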
1.1 Using your own dataset

Modify train_file, validation_file, and test_file in train.sh and evaluate.sh to point at your own JSON-format dataset, and change prompt_column and response_column to the keys that hold the input text and output text in the JSON file.

A sample dataset (the ADGEN advertisement-generation dataset used in the official P-Tuning example) can be downloaded via the link in the ChatGLM-6B repository.

Prepare your own dataset in the following format:

{
    "content": "Type#top silhouette#loose silhouette#slimming pattern#lines top style#shirt sleeve type#puff sleeve top style#drawstring",
    "summary": "This shirt style is very loose, and the neat lines can well hide small flaws on the figure, making it have a good slimming effect when worn. The neckline is decorated with a cute drawstring, and the beautiful knot shows a full personality, paired with a fashionable puff sleeve style, showing a sweet and cute feminine charm."
}
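
The training script loads the data with the datasets library, which reads one JSON object per line (JSON Lines). As a minimal sketch, assuming your raw samples are already (input text, output text) string pairs, such a file could be produced like this; the di/ path and the samples list are placeholders to replace with your own data:

import json

# Placeholder samples: replace with your own (input text, output text) pairs
samples = [
    ("Type#top silhouette#loose silhouette#drawstring",
     "This shirt style is very loose, with a cute drawstring at the neckline."),
]

with open("di/train.json", "w", encoding="utf-8") as f:
    for content, summary in samples:
        # Key names must match --prompt_column (content) and --response_column (summary)
        f.write(json.dumps({"content": content, "summary": summary}, ensure_ascii=False) + "\n")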


2. Start training

bash train.sh

PRE_SEQ_LEN and LR in train.sh are the soft prompt length and the training learning rate, respectively; they can be adjusted for best results. The P-Tuning v2 method freezes all parameters of the original model and trains only the prefix encoder. The quantization level of the original model can be set with quantization_bit; without this option, the model is loaded in FP16 precision.


Under the default configuration of quantization_bit=4, per_device_train_batch_size=1, and gradient_accumulation_steps=16, the INT4 model parameters are frozen, and one training step performs 16 accumulated forward and backward passes with a batch size of 1, equivalent to a total batch size of 16; at this point the minimum VRAM required is only 6.7 GB. To improve training efficiency at the same total batch size, you can increase per_device_train_batch_size while keeping the product of the two unchanged, but this also consumes more VRAM; adjust according to your hardware.
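
In other words, the effective batch size is the product of the per-device batch size, the number of gradient accumulation steps, and the number of GPUs. A quick sanity check of the default configuration:

per_device_train_batch_size = 1
gradient_accumulation_steps = 16
num_gpus = 1  # CUDA_VISIBLE_DEVICES=0

# Total batch size: 1 * 16 * 1 = 16; e.g. batch size 4 with 4 accumulation steps
# keeps the same total but uses more VRAM
print(per_device_train_batch_size * gradient_accumulation_steps * num_gpus)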


PRE_SEQ_LEN=128
LR=2e-2

CUDA_VISIBLE_DEVICES=0 python3 main.py \
    --do_train \
    --train_file di/train.json \
    --validation_file di/fval.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path THUDM/chatglm-6b \
    --output_dir output/adgen-chatglm-6b-pt-$PRE_SEQ_LEN-$LR \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --predict_with_generate \
    --max_steps 2000 \
    --logging_steps 10 \
    --save_steps 500 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4

Argument notes:

--train_file / --validation_file: paths to the training and validation JSON files.
--prompt_column / --response_column: keys of the input text and output text in the JSON files.
--overwrite_cache: re-process the cached dataset; can be removed when re-running on the same training set.
--model_name_or_path: model to load; can be changed to a local path.
--output_dir: directory where the fine-tuned checkpoints are saved.
--max_source_length / --max_target_length: maximum lengths of the input and output text.
--per_device_train_batch_size: batch size; adjust according to VRAM.
--max_steps: total number of training steps.
--logging_steps: logging interval in steps.
--save_steps: how many steps between saved checkpoints.
--quantization_bit: quantization level; can be changed to 8 for INT8, or removed to load in FP16.



3. Validate the model

Change CHECKPOINT in evaluate.sh to the checkpoint name saved during training, and run the following command for model inference and evaluation:

bash evaluate.sh
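
For reference, a rough sketch of what evaluate.sh looks like in the official P-Tuning example; the exact argument list may differ in your copy of the repository, and the di/ paths below are the same placeholders used in train.sh. CHECKPOINT and STEP are the values to change:

PRE_SEQ_LEN=128
CHECKPOINT=adgen-chatglm-6b-pt-128-2e-2   # directory name under ./output/
STEP=3000                                 # which checkpoint-<step> to evaluate

CUDA_VISIBLE_DEVICES=0 python3 main.py \
    --do_predict \
    --validation_file di/fval.json \
    --test_file di/fval.json \
    --overwrite_cache \
    --prompt_column content \
    --response_column summary \
    --model_name_or_path THUDM/chatglm-6b \
    --ptuning_checkpoint ./output/$CHECKPOINT/checkpoint-$STEP \
    --output_dir ./output/$CHECKPOINT \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_eval_batch_size 1 \
    --predict_with_generate \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4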


4. Model deployment

4.1 Self-validation: change the model path

Change THUDM/chatglm-6b in the corresponding demo or code to the path of the checkpoint saved after P-Tuning fine-tuning (in this example, ./output/adgen-chatglm-6b-pt-8-1e-2/checkpoint-3000). Note that the current fine-tuning does not support multi-turn data, so only the response of the first turn of the dialogue is fine-tuned.


During P-Tuning v2 training, only the parameters of the PrefixEncoder are saved. During inference, therefore, both the original ChatGLM-6B model and the PrefixEncoder weights need to be loaded, which is why evaluate.sh specifies both the original model path and the P-Tuning checkpoint. To load the weights manually in your own code:


Loading remains compatible with old checkpoints that were saved with all parameters; for those, simply set model_name_or_path as before.

import os
import torch
from transformers import AutoConfig, AutoModel, AutoTokenizer

# Path to the P-Tuning checkpoint saved during training (adjust to your own output)
CHECKPOINT_PATH = "./output/adgen-chatglm-6b-pt-8-1e-2/checkpoint-3000"

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
config = AutoConfig.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True, pre_seq_len=128)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", config=config, trust_remote_code=True)

# Keep only the PrefixEncoder weights and load them into the model
prefix_state_dict = torch.load(os.path.join(CHECKPOINT_PATH, "pytorch_model.bin"))
new_prefix_state_dict = {}
for k, v in prefix_state_dict.items():
    if k.startswith("transformer.prefix_encoder."):
        new_prefix_state_dict[k[len("transformer.prefix_encoder."):]] = v
model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)


Comment out the following line if you don't use quantization


model = model.quantize(4)
model = model.half().cuda()
model.transformer.prefix_encoder.float()
model = model.eval()


response, history = model.chat(tokenizer, "hello", history=[])

5. Note: the pre-trained model downloaded from Hugging Face is typically cached locally in the following directory:

~/.cache/huggingface/hub/models--THUDM--chatglm-6b/snapshots/aa51e62ddc9c9f334858b0af44cf59b05c70148a/
Ensure this directory contains the essential files:

config.json, configuration_chatglm.py, modeling_chatglm.py, pytorch_model.bin, quantization.py

In the demo.py file, replace the default path THUDM/chatglm-6b with the path to your specific model location.
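
A minimal sketch of that change, assuming the demo loads the model with transformers' AutoTokenizer/AutoModel as in the snippet above (the snapshot path is the cache directory shown in the note):

import os
from transformers import AutoModel, AutoTokenizer

# Point at the local snapshot directory instead of the "THUDM/chatglm-6b" hub ID
MODEL_PATH = os.path.expanduser(
    "~/.cache/huggingface/hub/models--THUDM--chatglm-6b/snapshots/"
    "aa51e62ddc9c9f334858b0af44cf59b05c70148a/"
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).half().cuda()
model = model.eval()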

