Ziya Series Models

Brief Introduction
Ziya-Writing-LLaMa-13B-v1 is a 13-billion-parameter instruction-tuned model based on LLaMA and further optimized for writing. It can handle many kinds of writing tasks, including official reports, speeches, creative copywriting, and more.
For more details, please refer to our official account article:
"Ziya Large Model Series | Writing model ziya-writing open-sourced! Ready to use out of the box: come claim your own personal writing assistant"
Software dependencies

pip install torch==1.12.1 tokenizers==0.13.3 git+

Model Information

Supervised finetuning

We collected and cleaned a large amount of real human-written text from the internet, used GPT-3.5 to generate a corresponding writing instruction for each piece, and put these instructions through very strict manual verification.

On top of this data, we used a reward model and additional cleaning rules to select more challenging writing instructions, discarding simple examples while preserving the diversity of the instructions.

We also used the Evol-Instruct method to generate about 300,000 high-quality general-purpose instructions and mixed them with the writing instructions. This mixture gives Ziya-Writing good intent understanding while still letting it produce high-quality responses.
Human-Feedback training
In our experiments, we found that reinforcement learning with a small amount of high-quality, human-annotated writing ranking data effectively improves the model's writing performance.
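The text does not say exactly how the ranking data becomes a training signal. A common choice in RLHF pipelines (InstructGPT, for example) is to train a reward model with a pairwise loss over every pair drawn from a ranked list of responses. The sketch below assumes that approach; `softplus` and `pairwise_ranking_loss` are hypothetical names, and the scores are plain floats standing in for a neural reward model's outputs.

```python
import itertools
import math
from typing import Sequence


def softplus(x: float) -> float:
    """log(1 + exp(x)), written so that large |x| does not overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_ranking_loss(rewards_best_first: Sequence[float]) -> float:
    """Mean pairwise ranking loss for the K ranked responses to one prompt.

    `rewards_best_first` holds the reward model's scalar score for each
    response, ordered from the human's top-ranked response to the lowest.
    Each pair (better, worse) contributes -log(sigmoid(r_better - r_worse)),
    which equals softplus(r_worse - r_better).
    """
    # combinations() keeps list order, so the first item of each pair is the better one
    pairs = list(itertools.combinations(rewards_best_first, 2))
    if not pairs:
        raise ValueError("need at least two ranked responses")
    return sum(softplus(worse - better) for better, worse in pairs) / len(pairs)


# A reward model that agrees with the human ranking gets a lower loss
print(pairwise_ranking_loss([2.0, 1.0, 0.0]))
print(pairwise_ranking_loss([0.0, 1.0, 2.0]))
```

Averaging over all K·(K−1)/2 pairs lets one ranked list of K responses supply many comparisons, which is one reason a small amount of ranking data can go a long way.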

To further improve the model, so that it understands human intent more fully and produces fewer "hallucinations" and unsafe outputs, we performed Human-Feedback Training (HFT) on top of the instruction-tuned model. HFT uses reinforcement learning from human feedback, with a reward model (RM) and PPO.
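The section names PPO but not its exact formulation. For orientation, here is a minimal sketch of the standard clipped surrogate policy loss from PPO, computed on plain floats. The function name and the default `clip_eps=0.2` are assumptions (0.2 is a common default). The value-function loss and the KL penalty against a reference model that RLHF setups usually add are not shown.

```python
import math
from typing import Sequence


def ppo_clipped_loss(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Mean clipped PPO policy loss over a batch of sampled actions (tokens).

    logp_new:   log-probabilities under the policy being updated
    logp_old:   log-probabilities under the policy that generated the samples
    advantages: advantage estimate for each sample
    """
    n = len(advantages)
    if n == 0 or len(logp_new) != n or len(logp_old) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for new, old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(new - old)  # pi_new(a|s) / pi_old(a|s)
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # PPO maximizes the smaller (pessimistic) of the two surrogate terms;
        # negating it turns the objective into a loss to minimize.
        total += -min(ratio * adv, clipped_ratio * adv)
    return total / n


# Doubling an action's probability earns at most a (1 + clip_eps) x advantage
print(ppo_clipped_loss([math.log(2.0)], [0.0], [1.0]))
```

Clipping removes the incentive to move the policy far from the one that collected the experience in a single update.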

We implemented HFT on an in-house framework that can complete full-parameter training of Ziya-Writing-LLaMA-13B-v1 on as few as eight 40 GB A100 GPUs. During PPO training we did not cap the length of generated samples, so that rewards for long-text tasks remain accurate. The total experience pool for each training run exceeded 100k samples, which ensures sufficient training.

Writing quality is highly subjective and hard to measure with a precise accuracy or satisfaction score. We therefore used an anonymous, multi-annotator side-by-side evaluation on 100 writing instructions of varying difficulty. We plan to release this evaluation set in the future.

We use the win rate as an indicator of the quality of a model. The formula to calculate a model’s win rate is as follows:

Win Rate = (Number of wins for the model + Number of draws / 2) / Total number of annotations
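The formula above translates directly into code. The sketch below is illustrative: the function name is hypothetical and the counts are made up, not taken from the evaluation.

```python
def win_rate(wins: int, draws: int, total: int) -> float:
    """Win rate = (wins + draws / 2) / total annotations; a draw counts as half a win."""
    if total <= 0:
        raise ValueError("total must be positive")
    if wins < 0 or draws < 0 or wins + draws > total:
        raise ValueError("expected non-negative counts with wins + draws <= total")
    return (wins + draws / 2) / total


# Illustrative counts: 60 wins, 20 draws, 20 losses out of 100 annotations
print(win_rate(60, 20, 100))  # 0.7
```

Counting a draw as half a win means two evenly matched models both land at 50%.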

Because most language models generate responses by sampling, win rates fluctuate somewhat from run to run. We therefore treat a win rate above 55% as a clear advantage over the other model, a win rate below 45% as clearly lagging behind, and a win rate between 45% and 55% as the two models being essentially on par.
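These thresholds can be written as a small lookup. The sketch assumes the 45% and 55% boundaries themselves count as "on par", a detail the text does not pin down. The function name is hypothetical, and the sample values are average win rates reported in the results table.

```python
def verdict(win_rate_pct: float) -> str:
    """Map a win rate (in percent) to the interpretation used in this evaluation."""
    if win_rate_pct > 55:
        return "significantly better"
    if win_rate_pct < 45:
        return "clearly behind"
    return "roughly on par"  # 45 <= win rate <= 55 (boundary handling assumed)


# Average win rates of Ziya-Writing-LLaMa-13B-v1 against three of the compared models
for opponent, rate in [("Ziya-LLaMa-13B-v1.1", 70.7), ("Minimax-abab5", 52.3), ("GPT-3.5-turbo", 44.7)]:
    print(f"vs {opponent}: {rate}% -> {verdict(rate)}")
```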

Ziya-Writing-LLaMa-13B-v1     Average win rate (%)   Max win rate (%)   Min win rate (%)
vs Ziya-LLaMa-13B-v1.1        70.7                   73.5               69
vs baichuan-vicuna-7b         69.6                   73.5               68
vs Moss-16B                   65.1                   69                 62
vs ChatGLM2-6B                58.3                   61.5               56
vs Minimax-abab5              52.3                   53                 50.5
vs GPT-3.5-turbo              44.7                   49.5               38
(Note: the maximum and minimum win rates are computed from each annotator's labels separately, taking the highest and lowest per-annotator result; the average win rate is computed from the pooled labels of all annotators.)
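The note's distinction between per-annotator extremes and a pooled average can be made concrete. This is a minimal sketch: the function names, the label format ('win' / 'draw' / 'loss'), and the toy data are all hypothetical and do not come from the evaluation.

```python
from collections import Counter
from typing import Dict, List


def win_rate_pct(labels: List[str]) -> float:
    """Win rate in percent for 'win' / 'draw' / 'loss' labels (a draw counts as half)."""
    if not labels:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    unknown = set(counts) - {"win", "draw", "loss"}
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    return 100 * (counts["win"] + counts["draw"] / 2) / len(labels)


def summarize(labels_by_annotator: Dict[str, List[str]]) -> Dict[str, float]:
    """Max/min over per-annotator win rates; average over all labels pooled together."""
    per_annotator = [win_rate_pct(labels) for labels in labels_by_annotator.values()]
    pooled = [label for labels in labels_by_annotator.values() for label in labels]
    return {
        "average": win_rate_pct(pooled),
        "max": max(per_annotator),
        "min": min(per_annotator),
    }


# Toy labels for illustration only
toy = {
    "annotator_a": ["win", "win", "draw", "loss"],
    "annotator_b": ["win", "draw", "loss", "loss"],
}
print(summarize(toy))  # {'average': 50.0, 'max': 62.5, 'min': 37.5}
```

When every annotator labels the same number of comparisons, the pooled average equals the mean of the per-annotator win rates, so it always falls between the minimum and maximum.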
Due to the licensing restrictions on the LLaMA weights, this model may not be used for commercial purposes. Please strictly adhere to LLaMA's usage policy.
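from transformers import AutoTokenizer
from transformers import LlamaForCausalLM
import torch

device = torch.device("cuda")

# Example query; any writing instruction can be used here
query = "Write a travel plan for a trip to Xi'an."

model = LlamaForCausalLM.from_pretrained("IDEA-CCNL/Ziya-Writing-LLaMa-13B-v1", torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("IDEA-CCNL/Ziya-Writing-LLaMa-13B-v1", use_fast=False)

# Single-turn prompt template expected by the model
inputs = '<human>:' + query.strip() + '\n<bot>:'
input_ids = tokenizer(inputs, return_tensors="pt").input_ids.to(device)

generate_ids = model.generate(
    input_ids,
    max_new_tokens=1024,  # upper bound on reply length (illustrative value; adjust as needed)
    do_sample=True,
    top_p=0.85,
    temperature=0.85,
)
# The decoded text contains the prompt followed by the model's reply
output = tokenizer.batch_decode(generate_ids)[0]
print(output)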
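Because the decoded output still contains the prompt, it can help to separate the reply from it. The helpers below are a hypothetical sketch built on the `<human>:` / `<bot>:` template shown above. They assume the tokenizer renders its end-of-sequence token as `</s>`, as LLaMA tokenizers normally do when special tokens are not skipped, and that the model may start a new `<human>:` turn on its own.

```python
def build_prompt(query: str) -> str:
    """Single-turn prompt in the <human>/<bot> template used above."""
    return "<human>:" + query.strip() + "\n<bot>:"


def extract_reply(decoded: str) -> str:
    """Return the model's reply from decoded text that still contains the prompt."""
    _, marker, reply = decoded.partition("<bot>:")
    if not marker:
        raise ValueError("no '<bot>:' marker found in decoded text")
    # Cut at the end-of-sequence token or at a new human turn, if either appears
    for stop in ("</s>", "<human>:"):
        reply = reply.split(stop, 1)[0]
    return reply.strip()


print(extract_reply("<s> <human>:Write a poem.\n<bot>: Roses are red.</s>"))  # Roses are red.
```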

Finetune Example

Refer to ziya_finetune

Inference & Quantization Example

Refer to ziya_inference

If you use this resource in your work, please cite our paper:

  author    = {Jiaxing Zhang and Ruyi Gan and Junjie Wang and Yuxiang Zhang and Lin Zhang and Ping Yang and Xinyu Gao and Ziwei Wu and Xiaoqun Dong and Junqing He and Jianheng Zhuo and Qi Yang and Yongfeng Huang and Xiayu Li and Yanghan Wu and Junyu Lu and Xinyu Zhu and Weifeng Chen and Ting Han and Kunhao Pan and Rui Wang and Hao Wang and Xiaojun Wu and Zhongshen Zeng and Chongpei Chen},
  title     = {Fengshenbang 1.0: Being the Foundation of Chinese Cognitive Intelligence},
  journal   = {CoRR},
  volume    = {abs/2209.02970},
  year      = {2022}

You can also cite our website: