---
title: CoDA Fine-tuning
emoji: π
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: apache-2.0
hf_oauth: true
hf_oauth_scopes:
  - read-repos
  - write-repos
---
# CoDA Model Fine-tuning Space
This Space lets you fine-tune the `Salesforce/CoDA-v0-Instruct` diffusion-based text generation model on the `baseten-admin/gpt-oss120b-generated-perfectblend` dataset.
## Features
- **Full Fine-tuning**: Complete parameter fine-tuning (not LoRA)
- **ChatML Format**: Processes conversation data with question-answer pairs (see the sketch after this list)
- **Auto Upload**: Automatically uploads the trained model to your Hugging Face account
- **Progress Tracking**: Real-time training progress updates
- **OAuth Integration**: Secure authentication via Hugging Face login
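The snippet below is a minimal sketch of how a ChatML record could be rendered into training text. The `build_chatml` helper and the record layout are illustrative assumptions, not the Space's actual preprocessing code.

```python
# Minimal sketch (assumption, not the Space's actual code): render one
# `conversations` record into ChatML text for training.
def build_chatml(conversations):
    """conversations: list of {"role": ..., "content": ...} dicts."""
    parts = []
    for turn in conversations:
        parts.append(f"<|im_start|>{turn['role']}\n{turn['content']}<|im_end|>")
    return "\n".join(parts)

example = [
    {"role": "user", "content": "What does gradient checkpointing do?"},
    {"role": "assistant", "content": "It trades compute for memory by recomputing activations."},
]
print(build_chatml(example))
```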
## How to Use
1. **Login**: Click the "Sign in with Hugging Face" button
2. **Configure**: Adjust training parameters (epochs, batch size, learning rate)
3. **Train**: Click "Start Training" (requires a GPU; upgrade the Space to a GPU tier)
4. **Resume**: If training is interrupted, check "Resume from last checkpoint" and restart
5. **Upload**: After training completes, click "Upload to Hugging Face Hub" (a sketch of the underlying Hub call follows this list)
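Conceptually, the upload step boils down to a `huggingface_hub` call like the following; the repository name, output directory, and token handling here are placeholders rather than the Space's exact implementation.

```python
# Rough sketch of the upload step (repo id, folder path, and token are placeholders).
from huggingface_hub import HfApi

api = HfApi(token="hf_...")  # in the Space, the token comes from the OAuth login
repo = api.create_repo("your-username/coda-finetuned", exist_ok=True)
api.upload_folder(folder_path="./output", repo_id=repo.repo_id, repo_type="model")
```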
## Persistence
This Space supports checkpoint persistence:
- Training checkpoints are saved every 500 steps
- If interrupted, you can resume from the last checkpoint (see the resume sketch after this list)
- For Docker deployment: mount a `/data` volume for full persistence
- On Spaces: checkpoints persist within the same session, and across rebuilds if you use the persistent storage tier
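The following is a minimal sketch of how checkpoint resumption typically works with `transformers`; the output directory is an assumption, and the `Trainer` construction is omitted.

```python
# Locate the most recent checkpoint (e.g. output/checkpoint-500) and resume from it.
import os
from transformers.trainer_utils import get_last_checkpoint

output_dir = "./output"  # assumed checkpoint directory
last_ckpt = get_last_checkpoint(output_dir) if os.path.isdir(output_dir) else None
if last_ckpt is not None:
    print(f"Resuming from {last_ckpt}")
    # trainer.train(resume_from_checkpoint=last_ckpt)
else:
    print("No checkpoint found; starting a fresh run")
    # trainer.train()
```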
## Requirements
- **Hardware**: GPU (T4, A10G, or better) strongly recommended
- **Account**: Hugging Face account with write permissions
- **Time**: Training takes several hours depending on configuration
## About the Model
CoDA is a 1.7B-parameter bidirectional diffusion language model developed by Salesforce AI Research. Unlike traditional autoregressive models, CoDA generates text through discrete denoising. The Instruct version is instruction-tuned, making it well suited for further fine-tuning on conversational data.
## Model Configuration
```json
{
  "architectures": ["CoDALanguageModel"],
  "hidden_size": 2048,
  "num_hidden_layers": 28,
  "num_attention_heads": 16,
  "vocab_size": 151936,
  "max_position_embeddings": 40960
}
```
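Assuming the model ships custom modeling code (hence `trust_remote_code=True`), the configuration and weights can be inspected and loaded roughly as below; which Auto class is the right entry point depends on the model's own code and is an assumption here.

```python
# Sketch: inspect the config and load tokenizer/model (Auto class choice is an assumption).
from transformers import AutoConfig, AutoModel, AutoTokenizer

model_id = "Salesforce/CoDA-v0-Instruct"
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
print(config.hidden_size, config.num_hidden_layers, config.vocab_size)

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
```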
## Dataset
The training uses the `baseten-admin/gpt-oss120b-generated-perfectblend` dataset:
- **Format**: Conversational data in ChatML format
- **Column**: `conversations` (list of role-content pairs)
- **Split**: Uses the `train` split with a 90/10 train/eval split
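A short sketch of loading the dataset and producing the 90/10 split with the `datasets` library; the seed is arbitrary and not taken from the Space's code.

```python
# Sketch: load the dataset's train split and carve out a 90/10 train/eval split.
from datasets import load_dataset

ds = load_dataset("baseten-admin/gpt-oss120b-generated-perfectblend", split="train")
split = ds.train_test_split(test_size=0.1, seed=42)  # seed is arbitrary
train_ds, eval_ds = split["train"], split["test"]
print(train_ds[0]["conversations"][:2])  # list of {"role": ..., "content": ...} turns
```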
## Training Details
- **Optimizer**: AdamW
- **Precision**: FP16 (on GPU)
- **Gradient Accumulation**: 4 steps
- **Gradient Checkpointing**: Enabled for memory efficiency
- **Max Sequence Length**: 2048 tokens
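A hedged sketch of `TrainingArguments` matching the settings above; epochs, batch size, learning rate, and output directory are placeholders that the UI exposes, and the 2048-token limit is applied at tokenization time rather than here.

```python
# Sketch of training arguments matching the details above (placeholder values).
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./output",             # placeholder
    num_train_epochs=1,                # set via the UI
    per_device_train_batch_size=1,     # set via the UI
    learning_rate=5e-5,                # set via the UI
    gradient_accumulation_steps=4,     # as listed above
    gradient_checkpointing=True,       # trade compute for memory
    optim="adamw_torch",               # AdamW
    fp16=True,                         # mixed precision; requires a CUDA device
    save_steps=500,                    # checkpoint cadence used by this Space
    logging_steps=50,
)
```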
## Citation
If you use this Space or the CoDA model, please cite:
```bibtex
@article{coda2023,
  title={CoDA: Bidirectional Code Diffusion},
  author={Salesforce AI Research},
  journal={arXiv preprint},
  year={2023}
}
```
## License
Apache 2.0