Financial-Advice-LLM: A Fine-Tuned Llama 3.1 Model for Structured Budgeting and Financial Advice

Introduction

This project fine-tunes Llama-3.1-8B-Instruct to generate structured, personalized financial advice based on a user's financial profile. While general-purpose LLMs often produce inconsistent or unstructured financial recommendations, this model addresses that gap by training on a large synthetic dataset designed to enforce format, clarity, and completeness. The goal is to help users better interpret their financial situations and receive actionable, template-aligned guidance. After fine-tuning, the model demonstrated improved structure and completeness in budgeting tasks while retaining competitive general reasoning performance.

Data

Because no public dataset exists for personalized budgeting guidance, I created a synthetic dataset of over 300 examples using few-shot prompting. Each example pairs a structured instruction (income, debt, family size, goals, etc.) with a detailed budgeting response covering:

  1. Budget Overview
  2. Savings Recommendations
  3. Debt Strategy
  4. Additional Tips

The dataset follows a consistent Instruction / Response format, and a 90/10 train/validation split was used to prepare the data.
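For illustration, a minimal sketch of what one record and the split could look like; the JSONL file name and field names here are assumptions, not taken from the released dataset.

from datasets import load_dataset

# One synthetic record (illustrative; field names are assumed, not confirmed).
example = {
    "instruction": "I earn $85k, have $10k in student loans, and want to save for a home.",
    "response": (
        "1. Budget Overview:\n...\n\n"
        "2. Savings Recommendations:\n...\n\n"
        "3. Debt Strategy:\n...\n\n"
        "4. Additional Tips:\n..."
    ),
}

# Load the JSONL file and carve out the 90/10 train/validation split.
dataset = load_dataset("json", data_files="budgeting_synthetic.jsonl", split="train")
splits = dataset.train_test_split(test_size=0.1, seed=42)
train_ds, val_ds = splits["train"], splits["test"]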

For evaluation, I used:

  • GSM8K-COT for math and numerical reasoning
  • MMLU High School Math for general reasoning
  • TruthfulQA (MC1) for factual robustness
  • A custom LLM-as-a-judge dataset to evaluate structure, completeness, and accuracy of budgeting outputs

These benchmarks collectively measure both general reasoning and task-specific financial performance.
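The first three benchmarks can be run with EleutherAI's lm-evaluation-harness; a hedged sketch follows. The task names match the harness's conventions but are assumptions, not confirmed by this card, and the LLM-as-a-judge evaluation ran separately on the custom data.

import lm_eval

# Evaluate the fine-tuned model on the general-purpose benchmarks.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=db5kb/financial-advice-llm-Llama-3.1-8B-Instruct,dtype=bfloat16",
    tasks=["gsm8k_cot", "mmlu_high_school_mathematics", "truthfulqa_mc1"],
)
print(results["results"])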

Methodology

Based on experimentation in earlier homework assignments, I selected full fine-tuning: lightweight approaches proved insufficient for enforcing consistent multi-section templates, whereas full fine-tuning allows deeper alignment to the structural patterns in the synthetic dataset.

Three hyperparameter configurations were tested:

Config | Learning Rate | Epochs | Notes
1      | 5e-6          | 1      | Underfit
2      | 1e-5          | 2      | Best validation performance
3      | 2e-5          | 2      | Slight overfitting

The final model was selected from Configuration 2, which showed the lowest validation loss and most stable learning.
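A minimal sketch of the winning configuration using the Hugging Face Trainer; batch size, gradient accumulation, and similar values are illustrative assumptions not reported in this card, and tokenization/collation of the Instruction / Response pairs is omitted for brevity.

from transformers import Trainer, TrainingArguments

# Configuration 2: learning rate 1e-5, 2 epochs, checkpoint chosen by validation loss.
training_args = TrainingArguments(
    output_dir="financial-advice-llm",
    learning_rate=1e-5,
    num_train_epochs=2,
    per_device_train_batch_size=4,   # assumption; not reported in this card
    gradient_accumulation_steps=4,   # assumption
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,     # keep the checkpoint with the lowest validation loss
    bf16=True,
)

trainer = Trainer(
    model=model,                     # the base Llama-3.1-8B-Instruct weights
    args=training_args,
    train_dataset=train_ds,          # tokenized 90% split (tokenization omitted here)
    eval_dataset=val_ds,             # tokenized 10% validation split
)
trainer.train()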

Evaluation

Benchmark Results

Model                                       | MMLU Math | GSM8K Strict | GSM8K Flexible | TruthfulQA MC1 | Structure | Completeness | Accuracy
Base Llama-3.1-8B                           | 42.2%     | 76.4%        | 77.6%          | 36.96%         | 4.5       | 3.9          | 4.5
Financial-Advice-LLM (our fine-tuned model) | 40.7%     | 73.0%        | 75.1%          | 37.70%         | 3.67      | 4.0          | 4.56
Mistral-7B                                  | ~37%      | ~66%         | ~68%           | N/A            | N/A       | N/A          | N/A
Gemma-7B                                    | ~39%      | ~70%         | ~72%           | N/A            | N/A       | N/A          | N/A

Structure, Completeness, and Accuracy are LLM-as-a-judge scores on the custom budgeting dataset.

Benchmark Task Descriptions

I selected these benchmarks to test both general-purpose and domain-specific reasoning:

  • GSM8K: Ensures the model still performs well on multi-step numerical reasoning, which is relevant to budgeting math
  • MMLU High School Math: Checks general reasoning retention after specializing
  • TruthfulQA (MC1): Assesses factual robustness and resistance to misleading questions
  • LLM-as-a-Judge: Measures template adherence, accuracy, and completeness
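To make the judging rubric concrete, here is a hedged sketch of a judge prompt; the exact wording and the 1-5 scale are assumptions (the scores in the table above are consistent with a five-point scale).

# Illustrative judge prompt; the wording and the 1-5 scale are assumptions.
JUDGE_PROMPT = """You are grading a budgeting response against its instruction.
Score each criterion from 1 (poor) to 5 (excellent):
- Structure: does the response follow the four-section template?
- Completeness: does it address income, debt, and the user's stated goals?
- Accuracy: are the figures and recommendations internally consistent?

Instruction:
{instruction}

Response:
{response}

Return JSON: {{"structure": ..., "completeness": ..., "accuracy": ...}}"""

def build_judge_input(instruction: str, response: str) -> str:
    # Fill the rubric with one instruction/response pair for the judge model.
    return JUDGE_PROMPT.format(instruction=instruction, response=response)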

Summary of Performance

As expected, the model shows a slight decrease on the general reasoning benchmarks, a consequence of specializing it for budgeting-style responses. TruthfulQA improved modestly (36.96% to 37.70%), suggesting that fine-tuning did not harm factual robustness. The LLM-as-a-judge scores improved in completeness and accuracy, though the structure score dipped relative to the base model; on balance, the fine-tuned model is better aligned with its intended task.

Usage and Intended Uses

Loading the Model

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_name = "db5kb/financial-advice-llm-Llama-3.1-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# Load in bfloat16 (the released weights are BF16) and place on available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Prompts follow the Instruction / Response format described below.
prompt = (
    "Instruction: I earn $85k, have $10k in student loans, and want to save for a home.\n\n"
    "Response:"
)

pipe(prompt, max_new_tokens=350)

Intended Use Cases

  • Personalized budgeting and financial advice summaries
  • Beginner-friendly explanations of financial principles
  • Consistent template-based financial guidance

Not meant for investment decisions, tax planning, or other high-stakes financial advice.

Prompt Format

The model expects inputs formatted as:

Instruction: <user financial profile>

Response:

Example:

Instruction: I'm a single professional earning $75k with $20k in student loans and want to save for a home.

Response:
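A tiny illustrative helper (not part of the release) that renders a free-text profile into this format:

def format_prompt(profile: str) -> str:
    # Wrap a financial profile in the expected Instruction / Response frame.
    return f"Instruction: {profile}\n\nResponse:"

prompt = format_prompt(
    "I'm a single professional earning $75k with $20k in student loans "
    "and want to save for a home."
)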

Expected Output Format

The model outputs a structured, multi-section budgeting summary:

1. Budget Overview:
<text>

2. Savings Recommendations:
<text>

3. Debt Strategy:
<text>

4. Additional Tips:
<text>
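Because the section headers are fixed, downstream code can split a generation back into its four fields. A minimal sketch, assuming the headers render exactly as numbered above:

import re

SECTIONS = ["Budget Overview", "Savings Recommendations", "Debt Strategy", "Additional Tips"]

def parse_sections(text: str) -> dict:
    # Capture each section's body up to the next numbered header (or end of text).
    parsed = {}
    for i, name in enumerate(SECTIONS, start=1):
        pattern = rf"{i}\.\s*{re.escape(name)}:\s*(.*?)(?=\n\s*\d\.\s|\Z)"
        match = re.search(pattern, text, re.DOTALL)
        parsed[name] = match.group(1).strip() if match else ""
    return parsed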

Limitations

  • The dataset is fully synthetic and may miss some real-world scenarios and variability
  • The model may oversimplify complex financial situations
  • There are slight performance decreases on general reasoning benchmarks
  • Evaluation through an LLM judge may introduce bias
  • It should not be used for financial decision making