Nampdn-ai / tiny-lessons

Tiny Lessons

This dataset is designed to help causal language models learn more effectively from raw web text. It is built by augmenting public web text and contains two key components: theoretical concepts and practical examples.

The theoretical concepts provide a foundation for understanding the underlying principles and ideas behind the information contained in the raw web text. The practical examples demonstrate how these theoretical concepts can be applied in real-world situations.

This dataset is intended as a resource for ML researchers working with causal language models. I hope you find it useful, and I welcome any feedback or suggestions.
