Different training methods require specific dataset formats. For the complete reference, see the TRL Dataset Formats documentation.

Dataset Sources & File Types

Hugging Face Datasets is a great place to find pre-built datasets for fine-tuning. Most dataset loaders also support local files in these formats:
  • JSONL - One JSON object per line, easiest to create
  • CSV - Tabular format, good for simple datasets
  • Parquet/Arrow - More efficient for larger datasets
We’ve curated a number of high-quality, popular instruction and preference datasets here.
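
As a sketch, loading a dataset from the Hub or from local files with the datasets library typically looks like this (the repository id and file names below are placeholders):

from datasets import load_dataset

# Load a dataset hosted on the Hugging Face Hub (placeholder repository id)
dataset = load_dataset("your-org/your-dataset", split="train")

# Or load local files in any of the supported formats
dataset = load_dataset("json", data_files="train.jsonl", split="train")
dataset = load_dataset("csv", data_files="train.csv", split="train")
dataset = load_dataset("parquet", data_files="train.parquet", split="train")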

Text Datasets

Instruction Datasets (SFT)

Conversational format with a messages array:
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."}
  ]
}
Roles: system (optional), user, assistant. Multi-turn conversations are supported by alternating user and assistant messages. Example: HuggingFaceTB/smoltalk
You may encounter datasets in other formats (standard prompt-completion or conversational prompt-completion). Convert these to the messages format with role and content fields to ensure reliable generations.
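For example, a prompt-completion dataset can be converted with a small map function; the prompt and completion column names below are assumptions about your data:

def to_messages(sample):
    # Wrap a prompt-completion pair in the conversational messages format
    sample["messages"] = [
        {"role": "user", "content": sample["prompt"]},
        {"role": "assistant", "content": sample["completion"]},
    ]
    return sample

dataset = dataset.map(to_messages, remove_columns=["prompt", "completion"])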

Preference Datasets (DPO)

Chosen and rejected completions for the same prompt. Use the explicit format which separates the prompt from each answer:
{
  "prompt": [{"role": "user", "content": "What is 2+2?"}],
  "chosen": [{"role": "assistant", "content": "2+2 equals 4."}],
  "rejected": [{"role": "assistant", "content": "2+2 equals 5."}]
}
Example: mlabonne/orpo-dpo-mix-40k
Preference datasets also exist in an implicit format where the prompt is embedded in both chosen and rejected. The explicit format is recommended, so convert implicit datasets before training; the DPOTrainer will automatically convert implicit to explicit if needed.
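As a rough sketch, you can build explicit preference records from your own columns like this (the question, good_answer, and bad_answer column names are hypothetical):

def to_preference(sample):
    # Separate the shared prompt from the chosen and rejected answers
    return {
        "prompt": [{"role": "user", "content": sample["question"]}],
        "chosen": [{"role": "assistant", "content": sample["good_answer"]}],
        "rejected": [{"role": "assistant", "content": sample["bad_answer"]}],
    }

dataset = dataset.map(to_preference, remove_columns=dataset.column_names)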

Prompt-Only Datasets (GRPO)

For reinforcement learning methods like GRPO, only prompts are provided. Completions are generated during training and evaluated by reward functions:
{
  "prompt": [
    {"role": "system", "content": "Solve the math problem step by step."},
    {"role": "user", "content": "What is 15 * 23?"}
  ]
}
Example: AI-MO/NuminaMath-TIR
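
To illustrate, a reward function for GRPO might compare each generated completion against a ground-truth column. This is a minimal sketch assuming TRL-style reward functions, which receive the generated completions plus dataset columns and return one score per completion; the answer column name is an assumption:

def exact_answer_reward(completions, answer, **kwargs):
    # completions: generated assistant turns; answer: ground-truth column (assumed name)
    rewards = []
    for completion, gold in zip(completions, answer):
        text = completion[0]["content"]  # assistant reply in the conversational format
        rewards.append(1.0 if str(gold) in text else 0.0)
    return rewards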

Vision Datasets

Vision Datasets (VLM-SFT)

For vision-language models, content uses typed arrays with a separate images column:
{
  "messages": [
    {"role": "user", "content": [
      {"type": "image"},
      {"type": "text", "text": "What is in this image?"}
    ]},
    {"role": "assistant", "content": [{"type": "text", "text": "A cat sitting on a couch."}]}
  ],
  "images": ["<PIL.Image in RGB>"]
}
Images must be in RGB format. The {"type": "image"} placeholder indicates where the image appears in the conversation. Example: HuggingFaceH4/llava-instruct-mix-vsft
You can map a preprocessing function like this to your dataset to load and prepare images for training:
from PIL import Image
import requests
from io import BytesIO

def load_image(sample):
    # Load the image from a local file and convert it to RGB
    image = Image.open(sample["image_path"]).convert("RGB")
    # Or load it from a URL instead:
    # response = requests.get(sample["image_url"])
    # image = Image.open(BytesIO(response.content)).convert("RGB")

    # Store it as a one-element list under "images" to match the format above
    sample["images"] = [image]
    return sample

dataset = dataset.map(load_image)