
RWKV-4 PilePlus

Model Description

RWKV-4-Pile models finetuned on [RedPajama + some of Pile v2 = 1.7T tokens]. The training data is updated with 2020+2021+2022 data, and the models are better at all European languages.

Although some of these are intermediate checkpoints (XXXGtokens means finetuned for XXXG tokens), you can already use them, because I am finetuning from the Pile models (instead of retraining from scratch).

Note: these models are not instruction-tuned yet. They are recommended as replacements for the vanilla Pile models.

7B and 14B coming soon.

See https://github.com/BlinkDL/RWKV-LM for details. RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable), so it combines the best of RNN and transformer: great performance, fast inference, low VRAM usage, fast training, "infinite" ctx_len, and free sentence embedding.

Use https://github.com/BlinkDL/ChatRWKV to run it. ChatRWKV is like ChatGPT but powered by the RWKV (100% RNN) language model, and open source.
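
For quick testing, the checkpoints can also be loaded with the `rwkv` pip package that ChatRWKV is built on. Below is a minimal sketch, not an official recipe: the checkpoint filename is a placeholder for whichever PilePlus .pth file you downloaded, and `20B_tokenizer.json` is the tokenizer file shipped with the ChatRWKV repo.

```python
# pip install rwkv
# Minimal sketch using the `rwkv` pip package (the library behind ChatRWKV).
from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS

model = RWKV(
    model="RWKV-4-PilePlus-169M-xxxGtokens.pth",  # placeholder: use your downloaded checkpoint
    strategy="cpu fp32",                          # e.g. "cuda fp16" if you have a GPU
)
pipeline = PIPELINE(model, "20B_tokenizer.json")  # tokenizer file from the ChatRWKV repo

# Plain continuation sampling; the model is not instruction-tuned,
# so give it a text prefix to continue rather than a chat-style prompt.
args = PIPELINE_ARGS(temperature=1.0, top_p=0.85)
print(pipeline.generate("The history of the Rhine valley", token_count=100, args=args))
```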