Introduction
ChatGLM-6B is an open-source dialogue language model based on the General Language Model (GLM) framework, with 6.2 billion parameters. Combined with model quantization technology, it has been measured to occupy approximately 6 GB of VRAM when fine-tuned on an RTX 2080 Ti graphics card.
1. Lower deployment threshold: at FP16 half precision, ChatGLM-6B requires at least 13 GB of VRAM for inference. Combined with model quantization technology, this requirement can be further lowered to 10 GB (INT8) and 6 GB (INT4), enabling deployment of ChatGLM-6B on consumer-grade graphics cards (see the loading sketch at the end of this introduction).
2. Longer sequence length: compared to GLM-10B (sequence length 1024), ChatGLM-6B has a sequence length of 2048, supporting longer conversations and applications.
3. Human intent alignment training: Supervised Fine-Tuning (SFT), Feedback Bootstrap, and Reinforcement Learning from Human Feedback (RLHF) are used to give the model a preliminary ability to understand human instruction intent. The output format is Markdown, for easy display. The supervised fine-tuning method is now open source.
Disadvantages:
1. Smaller model capacity: the relatively small 6B parameter count means weaker model memory and language ability. As the amount of your own training data and the number of training epochs increase, the model will gradually lose its original conversational ability.
2. Weaker multi-turn dialogue capability: ChatGLM-6B’s context understanding ability is not yet fully developed. When faced with scenarios involving long answer generation and multi-turn dialogue, it may experience context loss and misunderstanding.
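For reference, a minimal sketch of what deployment at these quantization levels looks like in code; the quantize() call mirrors the one used in the inference snippet later in this post, and the bit width is chosen according to the available VRAM:
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)

# FP16 (roughly 13 GB of VRAM for inference)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()

# INT8 (~10 GB) or INT4 (~6 GB): quantize before moving the model to the GPU, e.g.
# model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).quantize(4).half().cuda()

model = model.eval()
response, history = model.chat(tokenizer, "Hello", history=[])
print(response)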
1.Installation
The fine-tuning currently open-sourced for ChatGLM-6B is based on P-Tuning v2.
Download ChatGLM-6B:
git clone https://github.com/THUDM/ChatGLM-6B
cd ChatGLM-6B
pip install -r requirements.txt
cd ptuning/
pip install rouge_chinese nltk jieba datasets
1.1 Using your own dataset
Modify train_file, validation_file, and test_file in train.sh and evaluate.sh to point to your own JSON-format dataset, and change prompt_column and response_column to the keys of the input text and output text in the JSON file.
Sample data download link: Dataset
Format your own dataset as follows (a conversion sketch is given after the example):
{
    "content": "Type#top*Silhouette#loose*Silhouette#slimming*Pattern#lines*Top style#shirt*Sleeve type#puff sleeve*Top style#drawstring",
    "summary": "This shirt style is very loose, and the neat lines can well hide small flaws on the figure, making it have a good slimming effect when worn. The neckline is decorated with a cute drawstring, and the beautiful knot shows a full personality, paired with a fashionable puff sleeve style, showing a sweet and cute feminine charm."
}
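If your raw data lives in another format (for example, a CSV file), a small conversion script along the following lines can produce the training file. This is a minimal sketch assuming one JSON object per line, as in the sample dataset; the source file name and the input_text / output_text column names are hypothetical, so adjust them to your data.
import csv
import json

# Hypothetical source file and column names; replace with your own.
with open("my_data.csv", newline="", encoding="utf-8") as fin, \
        open("di/train.json", "w", encoding="utf-8") as fout:
    for row in csv.DictReader(fin):
        record = {"content": row["input_text"], "summary": row["output_text"]}
        # One JSON object per line, keeping non-ASCII text readable.
        fout.write(json.dumps(record, ensure_ascii=False) + "\n")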
2.Start training
bash train.sh
Under the default configuration quantization_bit=4, per_device_train_batch_size=1, gradient_accumulation_steps=16, the INT4-quantized model parameters are frozen, and one training iteration performs 16 forward and backward passes with a batch size of 1, accumulating the gradients, which is equivalent to a total batch size of 16. Under this setting the minimum VRAM required is only 6.7 GB. If you want to improve training efficiency at the same total batch size, you can increase per_device_train_batch_size while keeping the product of the two unchanged (reducing gradient_accumulation_steps accordingly), but this also consumes more VRAM. Please adjust according to your actual situation.
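As a quick illustration of that trade-off (plain arithmetic, single GPU assumed):
# Effective batch size = per_device_train_batch_size x gradient_accumulation_steps.
for per_device, accum in [(1, 16), (2, 8), (4, 4)]:
    # All three settings give the same effective batch size of 16; larger per-device
    # batches use more VRAM but need fewer accumulation passes per optimizer step.
    print(per_device, "x", accum, "=", per_device * accum)
The annotated train.sh used in this example is listed below. Note that the inline comments are explanatory only: in bash, a line-continuation backslash must be the last character on the line, so strip the comments before actually running the script.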
PRE_SEQ_LEN=128
LR=2e-2

CUDA_VISIBLE_DEVICES=0 python3 main.py \
    --do_train \
    --train_file di/train.json \              # Path to the training file
    --validation_file di/fval.json \          # Path to the validation file
    --prompt_column content \                 # Key of the input text in the training set
    --response_column summary \               # Key of the output (answer) text in the training set
    --overwrite_cache \                       # Can be removed when retraining on the same dataset
    --model_name_or_path THUDM/chatglm-6b \   # Model to load; can be changed to a local path
    --output_dir output/adgen-chatglm-6b-pt-LR \  # Directory where the fine-tuned checkpoints are saved
    --overwrite_output_dir \
    --max_source_length 64 \                  # Maximum length of the input text
    --max_target_length 64 \                  # Maximum length of the output text
    --per_device_train_batch_size 4 \         # Batch size; adjust according to VRAM
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --predict_with_generate \
    --max_steps 2000 \                        # Maximum number of training steps
    --logging_steps 10 \                      # Logging interval (steps)
    --save_steps 500 \                        # Save a checkpoint every this many steps
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4                      # Can be changed to 8 for INT8
3.Validate the model
Change CHECKPOINT in evaluate.sh to the checkpoint name saved during training, and run the following command for model inference and evaluation:
bash evaluate.sh
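The evaluation scores the generated answers against the reference summaries. As an aside, here is a standalone sketch of how Chinese ROUGE and BLEU-4 scores of this kind can be computed with the packages installed earlier (rouge_chinese, nltk, jieba); the example strings are illustrative, and this is not necessarily identical to the metric code in main.py.
import jieba
from rouge_chinese import Rouge
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

prediction = "这件衬衫版型宽松,线条流畅,上身显瘦。"   # illustrative model output
reference = "这件衬衫样式宽松,利落的线条上身显瘦。"    # illustrative ground-truth summary

# Tokenize with jieba, then compute word-level ROUGE and smoothed BLEU-4.
hyp_tokens = list(jieba.cut(prediction))
ref_tokens = list(jieba.cut(reference))
rouge_scores = Rouge().get_scores(" ".join(hyp_tokens), " ".join(ref_tokens))
bleu4 = sentence_bleu([ref_tokens], hyp_tokens, smoothing_function=SmoothingFunction().method3)
print(rouge_scores[0]["rouge-l"], bleu4)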
4.Model deployment
4.1 Self-validation: change the model path
Change THUDM/chatglm-6b in the corresponding demo or code to the address of the checkpoint saved after P-Tuning fine-tuning (in the example, ./output/adgen-chatglm-6b-pt-8-1e-2/checkpoint-3000). Note that the current fine-tuning does not support multi-turn data, so only the response of the first turn of the dialogue is used for fine-tuning.
During P-Tuning v2 training, only the parameters of the PrefixEncoder are saved, so at inference time both the original ChatGLM-6B model and the PrefixEncoder weights need to be loaded. The corresponding parameters therefore need to be specified in evaluate.sh, and in code the model can be loaded as follows:
import os
import torch
from transformers import AutoConfig, AutoModel, AutoTokenizer

# P-Tuning checkpoint saved during training (the example path from above)
CHECKPOINT_PATH = "./output/adgen-chatglm-6b-pt-8-1e-2/checkpoint-3000"

# Load the original ChatGLM-6B model; pre_seq_len must match the value used during training
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
config = AutoConfig.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True, pre_seq_len=128)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", config=config, trust_remote_code=True)

# Load only the PrefixEncoder weights from the P-Tuning checkpoint
prefix_state_dict = torch.load(os.path.join(CHECKPOINT_PATH, "pytorch_model.bin"))
new_prefix_state_dict = {}
for k, v in prefix_state_dict.items():
    if k.startswith("transformer.prefix_encoder."):
        new_prefix_state_dict[k[len("transformer.prefix_encoder."):]] = v
model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)

model = model.quantize(4)                   # Optional INT4 quantization, matching quantization_bit=4 in training
model = model.half().cuda()
model.transformer.prefix_encoder.float()    # Keep the prefix encoder in FP32
model = model.eval()

response, history = model.chat(tokenizer, "hello", history=[])
5.Note: The pre-trained model is typically stored locally in the following directory:
~/.cache/huggingface/hub/models--THUDM--chatglm-6b/snapshots/aa51e62ddc9c9f334858b0af44cf59b05c70148a/
Ensure this directory contains the essential files:
config.json configuration_chatglm.py modeling_chatglm.py pytorch_model.bin quantization.py
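An optional quick check that these files are present (a minimal sketch; the snapshot hash is the one from the path above, so substitute your own):
import os

snapshot_dir = os.path.expanduser(
    "~/.cache/huggingface/hub/models--THUDM--chatglm-6b/snapshots/"
    "aa51e62ddc9c9f334858b0af44cf59b05c70148a"
)
required = ["config.json", "configuration_chatglm.py", "modeling_chatglm.py",
            "pytorch_model.bin", "quantization.py"]
missing = [name for name in required if not os.path.isfile(os.path.join(snapshot_dir, name))]
print("Missing files:", missing if missing else "none")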
In the demo.py file, replace the default path THUDM/chatglm-6b with the path to your local model directory.
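After the replacement, the loading lines would look roughly like this (a sketch, not the exact contents of the demo script; the local path is the snapshot directory from the note above):
import os
from transformers import AutoModel, AutoTokenizer

# Local snapshot directory used in place of the "THUDM/chatglm-6b" identifier.
MODEL_PATH = os.path.expanduser(
    "~/.cache/huggingface/hub/models--THUDM--chatglm-6b/snapshots/"
    "aa51e62ddc9c9f334858b0af44cf59b05c70148a"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).half().cuda()
model = model.eval()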