Baebee / guanaco-extended

isic · August 30, 2023, 7:57am

Hugging Face Dataset Card: Amoeba Mixed AI-Human Generated Samples

Overview

Amoeba Mixed AI-Human Generated Samples is a large dataset (over 13 GB) containing a diverse collection of text samples produced by both AI models and human authors. It is designed to support research and development in natural language generation and understanding.

Intended Use

The Amoeba Mixed AI-Human Generated Samples dataset is intended for various natural language processing (NLP) tasks, including but not limited to:

Text generation
Language modeling
Text classification
Sentiment analysis
Language translation
Text summarization

Data Sources

The dataset blends AI-generated samples from the state-of-the-art language model “Amoeba” with manually curated human-written samples from diverse sources. By combining AI and human contributions, the dataset aims to provide a rich and varied distribution of language patterns and styles.

Data Format

The data is provided in plain text format, with one sample per line. Each sample represents a unique text snippet that can range from a few words to full sentences.
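The card specifies only a plain-text layout with one sample per line, so the corpus can be streamed line by line instead of being loaded whole, which matters at 13 GB. The Python sketch below is a minimal illustration under stated assumptions: it assumes UTF-8 encoding and treats blank lines as separators rather than samples (neither is stated in the card). The helpers `iter_samples` and `summarize` are hypothetical names for illustration, not part of any published loader for this dataset.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path
from typing import Iterator


def iter_samples(path: str | Path, encoding: str = "utf-8") -> Iterator[str]:
    """Lazily yield one sample per line so a multi-GB file never sits in memory.

    Assumptions (not from the dataset card): the file is UTF-8, and blank or
    whitespace-only lines are separators rather than samples, so they are skipped.
    """
    with open(path, encoding=encoding) as handle:
        for line in handle:
            sample = line.rstrip("\n")  # drop only the line terminator
            if sample.strip():
                yield sample


def summarize(path: str | Path) -> dict[str, int]:
    """Count samples and report the shortest/longest sample in whitespace-split words."""
    count, min_words, max_words = 0, None, 0
    for sample in iter_samples(path):
        n_words = len(sample.split())
        count += 1
        min_words = n_words if min_words is None else min(min_words, n_words)
        max_words = max(max_words, n_words)
    # An empty file reports zeros rather than None.
    return {"samples": count, "min_words": min_words or 0, "max_words": max_words}


if __name__ == "__main__":
    # Tiny made-up stand-in file; replace with the path to a real dataset shard.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write("Hello world\n\nA slightly longer sample sentence.\nOk\n")
    try:
        print(summarize(tmp.name))  # {'samples': 3, 'min_words': 1, 'max_words': 5}
    finally:
        os.remove(tmp.name)
```

Because the file is read through a generator, memory use stays roughly constant regardless of file size; the same `iter_samples` generator could feed a tokenizer or a training loop directly. Word counts here are a crude whitespace split, used only to illustrate the "few words to full sentences" range the card describes.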