This checkpoint is the primary CodeT5-based solver we used for the MindsAI @ Tufa Labs entry in the ARC Prize 2025 competition. It shares the same architecture as mindware/arc-codet5-660m-scr (a 16-layer decoder variant of Salesforce/codet5-large), but does not include the Span-Corruption Refinement (SCR) auxiliary training stage. Instead, it represents the best non-refinement checkpoint obtained during long-horizon pretraining on TPU-v4 systems.

  • No SCR stage: this model was trained purely with the original span-corruption objective, the instruction fine-tuning curriculum, and ARC fine-tuning.
  • Decoder-only pruning: the original decoder depth (24 layers) was reduced to 16 after experiments showed that encoder pruning harmed sample efficiency, while the capability lost to decoder pruning could be recovered through extended training.
  • Long-run TPU training: training spanned roughly two years on a TPU v4-64 slice, made possible by Google’s TPU Research Cloud program.
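
A minimal loading sketch, assuming the checkpoint exposes the standard Hugging Face seq2seq interface (the prompt string and generation settings below are illustrative, not the competition configuration):

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    # The checkpoint follows the T5/CodeT5 encoder-decoder layout, so the
    # generic Auto classes should resolve it.
    tokenizer = AutoTokenizer.from_pretrained("mindware/arc-codet5-660m")
    model = AutoModelForSeq2SeqLM.from_pretrained("mindware/arc-codet5-660m")

    # Illustrative prompt in the serialized ARC format described below.
    prompt = ("solve: train input1 000 010 000 output1 11 3 3 10 111 101 111. "
              "test tinput1 0000 0300 0000 0000 toutput1")
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=256)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))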

📚 ARC-Related Datasets & Frameworks

  • RE-ARC — procedurally generates examples for the 400 ARC training tasks (we also include RE-ARC eval + ARC 1.5).
  • ConceptARC
  • 1D-ARC
  • ARC_gym, Sort-of-ARC
  • Andreas Koepf’s generator suites (includes RE-ARC-style grids, code generation targets, and solution graphs).
  • Jack Cole’s custom generators covering ~70 tasks plus larger concept sets (cellular automata, math-derived boards, etc.).

Several auxiliary datasets predict task metadata (graphs, heuristics, explanations) rather than final boards; they are part of the broader instruction mixture this model saw during pretraining.

ARC Data Formatting

  • ARC tasks ship as JSON where each task_id maps to its train pairs and test inputs; every grid is a rectangular list of lists of integers 0-9. Dimensions follow the original 1×1–30×30 spec, though the evaluator accepts grids up to 50×50.
  • Example task payload:
    {
      "task_id": {
        "train": [
          {"input": [[0,0],[1,1]], "output": [[1,1],[1,1]]}
        ],
        "test": [
          {"input": [[0,0,0],[0,1,0],[0,0,0]]}
        ]
      }
    }
    
  • Model prompts (the prompt column during training, test-time training, and inference) are serialized text strings of the form solve: train input1 <train_input> output1 <prefix><train_output>. … test tinput1 <test_input> toutput1. Each grid placeholder <train_input> / <train_output> / <test_input> is filled by grid_to_string, which renders each row as a run of concatenated digits and separates rows with spaces. Additional train examples increment the index (input2, output2, etc.); see the serialization sketch after this list.
  • Prompt example:
    solve: train input1 000 010 000 output1 11 3 3 10 111 101 111. input2 00 02 output2 5 2 2 20 22 20. test tinput1 0000 0300 0000 0000 toutput1 
    
  • Model targets (the correct_answer column, i.e. the expected decoder output before post-processing) follow output_prefix semantics: {total_chars} {height} {width} {symbols} {row_strings}, where total_chars = height*width + (height - 1) and symbols is the deduplicated sequence of colors in the order they are first encountered scanning the board row-major. The same rule applies to every output grid we emit, both the training outputs inside the prompt and the predicted test toutput (see the encoding sketch after this list). Example target string for a 3×3 donut:
     11 3 3 10 111 101 111.
    
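A minimal sketch of the serialization described above. grid_to_string is the only name taken from the original; build_prompt and encode_target are hypothetical helpers added for illustration:

    def grid_to_string(grid):
        """Render each row as concatenated digits; rows are separated by spaces."""
        return " ".join("".join(str(cell) for cell in row) for row in grid)

    def encode_target(grid):
        """output_prefix semantics: {total_chars} {height} {width} {symbols} {rows}."""
        height, width = len(grid), len(grid[0])
        total_chars = height * width + (height - 1)  # digits plus row separators
        symbols = ""
        for row in grid:  # dedupe colors in row-major, first-seen order
            for cell in row:
                if str(cell) not in symbols:
                    symbols += str(cell)
        return f"{total_chars} {height} {width} {symbols} {grid_to_string(grid)}"

    def build_prompt(task):
        """Serialize one task dict (see the JSON payload above) into a prompt.

        A trailing period is appended after each encoded output as a separator,
        matching the prompt example above.
        """
        parts = ["solve: train"]
        for i, pair in enumerate(task["train"], start=1):
            parts.append(f"input{i} {grid_to_string(pair['input'])}")
            parts.append(f"output{i} {encode_target(pair['output'])}.")
        parts.append("test")
        for i, pair in enumerate(task["test"], start=1):
            parts.append(f"tinput{i} {grid_to_string(pair['input'])}")
            parts.append(f"toutput{i}")
        return " ".join(parts)

    # The 3x3 donut from the example: 9 digits + 2 row separators = 11 chars,
    # and the colors first seen in row-major order are 1 then 0.
    donut = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
    assert encode_target(donut) == "11 3 3 10 111 101 111"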