baichuan-inc/Baichuan-13B-Chat

Introduction
Baichuan-13B-Chat is the aligned (chat) version in the Baichuan-13B series of models; the pre-trained model is available at Baichuan-13B-Base.

Baichuan-13B is an open-source, commercially usable large-scale language model with 13 billion parameters, developed by Baichuan Intelligent Technology as the successor to Baichuan-7B. It achieves the best results among models of its size on authoritative Chinese and English benchmarks. This release includes two versions: the pre-trained model (Baichuan-13B-Base) and the aligned model (Baichuan-13B-Chat). Baichuan-13B has the following features:

Larger size, more data: Baichuan-13B expands the parameter count to 13 billion on the basis of Baichuan-7B and was trained on 1.4 trillion tokens of high-quality corpora, 40% more than LLaMA-13B, making it the open-source 13B model trained on the most data to date. It supports both Chinese and English, uses ALiBi position encoding, and has a context window of 4,096 tokens.
Pre-trained and aligned models released together: The pre-trained model is a "base" intended for developers, while most users need an aligned model with dialogue capabilities. This release therefore also includes the aligned model (Baichuan-13B-Chat), which has strong conversational ability, works out of the box, and can be deployed with just a few lines of code.
More efficient inference: To support a wider range of users, we have also open-sourced INT8 and INT4 quantized versions, which greatly lower the hardware requirements for deployment with almost no loss in quality, allowing the model to run on consumer GPUs such as the NVIDIA 3090.
Open-source, free, and commercially usable: Baichuan-13B is fully open for academic research, and developers can also use it commercially for free after applying via email and obtaining official commercial permission.
Usage
Below is an example of a Chinese conversation with Baichuan-13B-Chat. The correct output is "喬戈裏峰。世界第二高峰———喬戈裏峰西方登山者稱其為k2峰,海拔高度是8611米,位於喀喇昆侖山脈的中巴邊境上" (K2, the world's second highest peak, known to Western climbers as K2, 8,611 meters, on the China-Pakistan border in the Karakoram Range).

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig

# Load the tokenizer and the model in half precision across the available GPUs
tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B-Chat", use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", device_map="auto", torch_dtype=torch.float16, trust_remote_code=True)
model.generation_config = GenerationConfig.from_pretrained("baichuan-inc/Baichuan-13B-Chat")

# Build the conversation history and ask a question (in Chinese)
messages = []
messages.append({"role": "user", "content": "世界上第二高的山峰是哪座"})
response = model.chat(tokenizer, messages)
print(response)

Here is the same example in English; the correct output is "K2. The world's second highest peak - K2, also known as Mount Godwin-Austen or Chhogori, with an altitude of 8611 meters, is located on the China-Pakistan border in the Karakoram Range."

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig

tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B-Chat", use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", device_map="auto", torch_dtype=torch.float16, trust_remote_code=True)
model.generation_config = GenerationConfig.from_pretrained("baichuan-inc/Baichuan-13B-Chat")

# Build the conversation history and ask the same question in English
messages = []
messages.append({"role": "user", "content": "Which mountain is the second highest one in the world?"})
response = model.chat(tokenizer, messages)
print(response)
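
The chat interface is stateless, so a multi-turn conversation is held by appending each reply back into messages before the next call. A minimal sketch, assuming the same model, tokenizer, and model.chat interface as above (the follow-up question is purely illustrative):

# Keep the history in `messages` to continue the conversation
messages.append({"role": "assistant", "content": response})
messages.append({"role": "user", "content": "And how high is the third highest one?"})
follow_up = model.chat(tokenizer, messages)
print(follow_up)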

Quantized Deployment
Baichuan-13B supports INT8 and INT4 quantization, which only requires changing two lines of the inference code. Note that if you quantize in order to save GPU memory, you should load the original-precision model onto the CPU before quantizing: avoid passing device_map='auto' (or any other argument that would load the original-precision model directly onto the GPU) to from_pretrained.

To use INT8 quantization:

model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", torch_dtype=torch.float16, trust_remote_code=True)
model = model.quantize(8).cuda()

Similarly, to use INT4 quantization:

model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", torch_dtype=torch.float16, trust_remote_code=True)
model = model.quantize(4).cuda()
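
Putting the pieces together, a complete INT4 chat deployment might look like the following sketch. It follows the note above: the full-precision weights are loaded on the CPU (no device_map), quantized, and only then moved to the GPU. The prompt is the same illustrative question as in the usage example.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig

tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B-Chat", use_fast=False, trust_remote_code=True)

# Load at original precision on the CPU, quantize to INT4, then move to the GPU
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", torch_dtype=torch.float16, trust_remote_code=True)
model = model.quantize(4).cuda()
model.generation_config = GenerationConfig.from_pretrained("baichuan-inc/Baichuan-13B-Chat")

messages = [{"role": "user", "content": "世界上第二高的山峰是哪座"}]
print(model.chat(tokenizer, messages))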

Model Details
Model Description
Developed by: Baichuan Intelligent Technology (百川智能)

Email: [email protected]

Language(s) (NLP): Chinese/English

License: Community License for Baichuan-13B Model (ZH | EN)

For commercial use: Contact us via the email above to apply for written authorization.

Model Architecture
The overall architecture is based on Baichuan-7B. To achieve better inference performance, Baichuan-13B uses ALiBi linear biases, which require less computation than rotary embeddings (RoPE) and significantly improve inference speed. Compared with the standard LLaMA-13B, the measured average inference speed (tokens/s) when generating 2,000 tokens is 31.6% higher:

Model tokens/s
LLaMA-13B 19.4
Baichuan-13B 25.4

The specific parameters are as follows:

Model Name Hidden Size Num Layers Num Attention Heads Vocab Size Total Params Training Data (tokens) Position Embedding Max Length
Baichuan-7B 4,096 32 32 64,000 7,000,559,616 1.2 trillion RoPE 4,096
Baichuan-13B 5,120 40 40 64,000 13,264,901,120 1.4 trillion ALiBi 4,096
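
For reference, ALiBi works by adding a fixed, head-specific linear penalty to the attention logits instead of rotating the query/key vectors. Below is a minimal sketch of the bias computation, not Baichuan's actual implementation; the slope schedule shown is the simple geometric sequence from the ALiBi paper, which real implementations may adjust when the number of heads is not a power of two.

import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    # Head-specific slopes: geometric sequence 2^(-8/n), 2^(-16/n), ...
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])
    # distance[i, j] = j - i, clamped so future positions (j > i) get no bias
    positions = torch.arange(seq_len)
    distance = (positions[None, :] - positions[:, None]).clamp(max=0).float()
    # Bias added to the attention logits before softmax: slope * -(i - j), per head
    return slopes[:, None, None] * distance[None, :, :]   # shape (num_heads, seq_len, seq_len)

# Small example (4 heads, 6 positions); Baichuan-13B uses 40 heads and a 4,096-token window
print(alibi_bias(4, 6)[0])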
Usage Notice
Disclaimer
We hereby declare that our development team has not developed any applications based on the Baichuan-13B model, whether on iOS, Android, the web, or any other platform. We strongly urge all users not to use the Baichuan-13B model for any activities that endanger national or social security or are illegal. In addition, we ask users not to use the Baichuan-13B model for internet services that have not undergone appropriate security review and filing. We hope that all users will adhere to these principles to ensure that technological development takes place in a regulated and legal environment.

We have done our utmost to ensure the compliance of the data used in the model training process. However, despite our great efforts, due to the complexity of the model and data, some unforeseen issues may still exist. Therefore, we will not take any responsibility for any issues arising from the use of the Baichuan-13B open-source model, including but not limited to data security issues, public opinion risks, or any risks and problems arising from the model being misled, misused, disseminated, or improperly exploited.

Training Details
For specific training settings, please refer to Baichuan-13B.

Evaluation Results
C-Eval
Model 5-shot STEM Social Sciences Humanities Others Average
Baichuan-7B 38.2 52.0 46.2 39.3 42.8
Chinese-Alpaca-Plus-13B 35.2 45.6 40.0 38.2 38.8
Vicuna-13B 30.5 38.2 32.5 32.5 32.8
Chinese-LLaMA-Plus-13B 30.3 38.0 32.9 29.1 32.1
Ziya-LLaMA-13B-Pretrain 27.6 34.4 32.0 28.6 30.0
LLaMA-13B 27.0 33.6 27.7 27.6 28.5
moss-moon-003-base (16B) 27.0 29.1 27.2 26.9 27.4
Baichuan-13B-Base 45.9 63.5 57.2 49.3 52.4
Baichuan-13B-Chat 43.7 64.6 56.2 49.2 51.5
MMLU
Model 5-shot STEM Social Sciences Humanities Others Average
Vicuna-13B 40.4 60.5 49.5 58.4 52.0
LLaMA-13B 36.1 53.0 44.0 52.8 46.3
Chinese-Alpaca-Plus-13B 36.9 48.9 40.5 50.5 43.9
Ziya-LLaMA-13B-Pretrain 35.6 47.6 40.1 49.4 42.9
Baichuan-7B 35.6 48.9 38.4 48.1 42.3
Chinese-LLaMA-Plus-13B 33.1 42.8 37.0 44.6 39.2
moss-moon-003-base (16B) 22.4 22.8 24.2 24.4 23.6
Baichuan-13B-Base 41.6 60.9 47.4 58.5 51.6
Baichuan-13B-Chat 40.9 60.9 48.8 59.0 52.1
Note: We used the official MMLU evaluation protocol.

CMMLU
Model 5-shot STEM Humanities Social Sciences Others China Specific Average
Baichuan-7B 34.4 47.5 47.6 46.6 44.3 44.0
Vicuna-13B 31.8 36.2 37.6 39.5 34.3 36.3
Chinese-Alpaca-Plus-13B 29.8 33.4 33.2 37.9 32.1 33.4
Chinese-LLaMA-Plus-13B 28.1 33.1 35.4 35.1 33.5 33.0
Ziya-LLaMA-13B-Pretrain 29.0 30.7 33.8 34.4 31.9 32.1
LLaMA-13B 29.2 30.8 31.6 33.0 30.5 31.2
moss-moon-003-base (16B) 27.2 30.4 28.8 32.6 28.7 29.6
Baichuan-13B-Base 41.7 61.1 59.8 59.0 56.4 55.3
Baichuan-13B-Chat 42.8 62.6 59.7 59.0 56.1 55.8
Note: CMMLU is a comprehensive Chinese evaluation benchmark designed to assess a language model's knowledge and reasoning abilities in a Chinese context. We used its official evaluation protocol.
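
For context, the 5-shot numbers above mean that each test question is preceded by five solved examples from the same subject before the model is asked to answer. A minimal sketch of how such a prompt can be assembled for a multiple-choice item; the helper and its data format are illustrative, not the official evaluation code.

def build_five_shot_prompt(shots, question, choices):
    # shots: list of dicts like {"question": ..., "choices": [...], "answer": "A"}
    blocks = []
    for shot in shots:
        lines = [shot["question"]]
        lines += [f"{label}. {text}" for label, text in zip("ABCD", shot["choices"])]
        lines.append(f"Answer: {shot['answer']}")
        blocks.append("\n".join(lines))
    # The unanswered question goes last; the model's next token is scored against the gold label
    lines = [question]
    lines += [f"{label}. {text}" for label, text in zip("ABCD", choices)]
    lines.append("Answer:")
    blocks.append("\n".join(lines))
    return "\n\n".join(blocks)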

WeChat Group

[WeChat group QR code]