Hugging Face Dataset Card: Amoeba Mixed AI-Human Generated Samples
Overview
Amoeba Mixed AI-Human Generated Samples is a large dataset (over 13 GB) containing a diverse collection of text samples, some generated by AI models and some written by human authors. It is designed to support research and development in natural language generation and understanding.
Intended Use
The Amoeba Mixed AI-Human Generated Samples dataset is intended for natural language processing (NLP) tasks including, but not limited to:
- Text generation
- Language modeling
- Text classification
- Sentiment analysis
- Language translation
- Text summarization
Data Sources
The dataset combines AI-generated samples from Amoeba, a state-of-the-art language model, with manually curated human-written samples drawn from diverse sources. Mixing AI and human contributions is intended to give the dataset a broad and varied range of language patterns and styles.
Data Format
The data is provided in plain text format, with one sample per line. Each sample represents a unique text snippet that can range from a few words to full sentences.
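Because each line is one sample, a file can be streamed line by line instead of being loaded whole, which matters at over 13 GB. The sketch below assumes UTF-8 encoding and treats blank lines as carrying no sample. The card does not say whether official train/validation splits exist, so `assign_split` is a hypothetical helper that hashes each sample's text into a deterministic split; since samples are described as unique, each one lands in exactly one split across runs. All function names and the 1% validation fraction are illustrative choices, not part of the dataset.

```python
import hashlib
from typing import Dict, Iterator, Tuple


def iter_samples(path: str, encoding: str = "utf-8") -> Iterator[str]:
    """Yield one sample per non-empty line without reading the whole file into memory."""
    # Assumption: files are UTF-8 plain text with one sample per line.
    with open(path, "r", encoding=encoding) as handle:
        for line in handle:
            sample = line.rstrip("\n")
            if sample.strip():  # skip blank lines
                yield sample


def assign_split(sample: str, val_fraction: float = 0.01) -> str:
    """Hypothetical helper: map a sample to 'train' or 'validation' by hashing its text."""
    digest = hashlib.md5(sample.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # integer in [0, 2**64)
    return "validation" if bucket < val_fraction * 2**64 else "train"


def summarize(path: str, val_fraction: float = 0.01) -> Tuple[Dict[str, int], int]:
    """Count samples per split and report the longest sample in whitespace-separated words."""
    counts = {"train": 0, "validation": 0}
    longest = 0
    for sample in iter_samples(path):
        counts[assign_split(sample, val_fraction)] += 1
        longest = max(longest, len(sample.split()))
    return counts, longest
```

For example, `counts, longest = summarize("samples.txt")` walks a local copy of one data file (`samples.txt` is a placeholder name). Hashing the text rather than drawing random numbers keeps the split reproducible without storing any index. Alternatively, the Hugging Face `datasets` library's generic `"text"` loader reads plain-text files as one example per line and supports `streaming=True`.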