CodeLlama-2-20k: A Llama 2 Version of CodeAlpaca

This dataset is pchanumolu/huge-context-size-test, reformatted into the Llama 2 prompt format described here.

Here is the code I used to format it:

from datasets import load_dataset

# Load the dataset
dataset = load_dataset('pchanumolu/huge-context-size-test')

# Define a function to merge the three columns into one
def merge_columns(example):
    if example['input']:
        merged = f"<s>[INST] <<SYS>>\nBelow is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n<</SYS>>\n\n{example['instruction']} Input: {example['input']} [/INST] {example['output']} </s>"
    else:
        merged = f"<s>[INST] <<SYS>>\nBelow is an instruction that describes a task. Write a response that appropriately completes the request.\n<</SYS>>\n\n{example['instruction']} [/INST] {example['output']} </s>"
    return {"text": merged}

# Apply the function to all elements in the dataset
dataset = dataset.map(merge_columns, remove_columns=['instruction', 'input', 'output'])
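
To see what a single formatted row looks like without downloading anything, the same merge logic can be run on a plain dictionary. This is only a sketch: `format_record` and the two sample rows are hypothetical names and made-up data for illustration, not part of the dataset or of the script above.

```python
# System prompts copied from the formatting script, split across lines for readability.
SYS_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request."
)
SYS_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request."
)


def format_record(example):
    """Mirror merge_columns on a plain dict and return the merged text."""
    if example["input"]:
        system = SYS_WITH_INPUT
        user = f"{example['instruction']} Input: {example['input']}"
    else:
        system = SYS_NO_INPUT
        user = example["instruction"]
    return (
        f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n"
        f"{user} [/INST] {example['output']} </s>"
    )


# Made-up sample rows (assumption: same three columns as the source dataset).
with_input = {"instruction": "Sum the numbers.", "input": "1, 2, 3", "output": "6"}
without_input = {"instruction": "Say hello.", "input": "", "output": "Hello!"}

print(format_record(with_input))
print(format_record(without_input))
```

Rows with an empty `input` get the shorter system prompt and no `Input:` segment, which matches the `else` branch of `merge_columns`.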