Financial-Advice-LLM: A Fine-Tuned Llama 3.1 Model for Structured Budgeting and Financial Advice
Introduction
This project fine-tunes Llama-3.1-8B-Instruct to generate structured, personalized financial advice based on a user's financial profile. While general-purpose LLMs often produce inconsistent or unstructured financial recommendations, this model addresses that gap by training on a synthetic dataset designed to enforce format, clarity, and completeness. The goal is to help users better interpret their financial situations and receive actionable, template-aligned guidance. After fine-tuning, the model demonstrated improved completeness and accuracy on budgeting tasks while retaining competitive general reasoning performance.
Data
Because no public dataset exists for personalized budgeting guidance, I created a synthetic dataset of over 300 examples using few-shot prompting. Each example pairs a structured instruction (income, debt, family size, goals, etc.) with a detailed budgeting response covering:
- Budget Overview
- Savings Recommendations
- Debt Strategy
- Additional Tips
The dataset follows a consistent Instruction / Response format, and a 90/10 train/validation split was used to prepare the data.
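As a rough illustration, the split can be produced with the Hugging Face `datasets` library. The file name and record fields below are assumptions, not the project's actual artifacts:

```python
# Minimal sketch, assuming the synthetic examples are stored as JSONL records
# with "instruction" and "response" fields (field names are an assumption).
from datasets import load_dataset

dataset = load_dataset("json", data_files="budgeting_examples.jsonl", split="train")

# 90/10 train/validation split, as described above.
splits = dataset.train_test_split(test_size=0.1, seed=42)
train_ds, val_ds = splits["train"], splits["test"]
print(f"train: {len(train_ds)}, validation: {len(val_ds)}")
```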
For evaluation, I used:
- GSM8K-CoT for math and numerical reasoning
- MMLU High School Math for general reasoning
- TruthfulQA (MC1) for factual robustness
- A custom LLM-as-a-judge dataset to evaluate the structure, completeness, and accuracy of budgeting outputs
These benchmarks collectively measure both general reasoning and task-specific financial performance.
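For reference, the three public benchmarks can be run with EleutherAI's lm-evaluation-harness. The sketch below uses its Python API; the task names follow current harness conventions and may differ across versions, and the batch size is an assumption:

```python
# Hedged sketch using lm-evaluation-harness; not the project's actual eval script.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args="pretrained=db5kb/financial-advice-llm-Llama-3.1-8B-Instruct",
    tasks=["gsm8k_cot", "mmlu_high_school_mathematics", "truthfulqa_mc1"],
    batch_size=8,  # assumption; tune to available memory
)
print(results["results"])
```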
Methodology
Based on experimentation in earlier course assignments, I selected full fine-tuning: lightweight approaches were insufficient for enforcing consistent multi-section templates, while full fine-tuning allows deeper alignment to the structural patterns in the synthetic dataset.
Three hyperparameter configurations were tested:
| Config | Learning Rate | Epochs | Notes |
|---|---|---|---|
| 1 | 5e-6 | 1 | Underfit |
| 2 | 1e-5 | 2 | Best validation performance |
| 3 | 2e-5 | 2 | Slight overfitting |
The final model was selected from Configuration 2, which showed the lowest validation loss and most stable learning.
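The snippet below is a minimal sketch of what the winning run might look like with TRL's SFTTrainer. Only the learning rate and epoch count come from the table above; the batch size, precision, and other settings are assumptions, as is the use of TRL itself.

```python
# Minimal sketch of Config 2 (lr=1e-5, 2 epochs) using TRL's SFTTrainer.
# train_ds / val_ds are the splits from the Data section sketch.
from trl import SFTConfig, SFTTrainer

def to_text(example):
    # Assemble the Instruction / Response training format (field names assumed).
    return {"text": f"Instruction: {example['instruction']}\n\nResponse: {example['response']}"}

train_ds = train_ds.map(to_text)
val_ds = val_ds.map(to_text)

args = SFTConfig(
    output_dir="financial-advice-llm",
    learning_rate=1e-5,               # Config 2
    num_train_epochs=2,               # Config 2
    per_device_train_batch_size=1,    # assumption
    gradient_accumulation_steps=8,    # assumption
    bf16=True,                        # assumption
    eval_strategy="epoch",
    logging_steps=10,
)

trainer = SFTTrainer(
    model="meta-llama/Llama-3.1-8B-Instruct",  # base model named in the intro
    args=args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
)
trainer.train()
```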
Evaluation
Benchmark Results
| Model | MMLU HS Math | GSM8K (strict) | GSM8K (flexible) | TruthfulQA MC1 | Judge: Structure | Judge: Completeness | Judge: Accuracy |
|---|---|---|---|---|---|---|---|
| Base Llama-3.1-8B | 42.2% | 76.4% | 77.6% | 36.96% | 4.5 | 3.9 | 4.5 |
| Financial-Advice-LLM (Our Fine-tuned Model) | 40.7% | 73.0% | 75.1% | 37.70% | 3.67 | 4.0 | 4.56 |
| Mistral-7B | ~37% | ~66% | ~68% | N/A | N/A | N/A | N/A |
| Gemma-7B | ~39% | ~70% | ~72% | N/A | N/A | N/A | N/A |
Benchmark Task Descriptions
I selected these benchmarks to test both general-purpose and domain-specific reasoning:
- GSM8K: Ensures the model still performs well on multi-step numerical reasoning, which is relevant to budgeting math
- MMLU High School Math: Checks general reasoning retention after specialization
- TruthfulQA (MC1): Assesses factual robustness and resistance to misleading questions
- LLM-as-a-Judge: Measures template adherence, accuracy, and completeness (a sketch of the judging setup follows this list)
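As a rough illustration of the judging setup, here is a hypothetical rubric prompt. The three criteria come from the evaluation description above, but the exact wording, scale, and judge model are not documented here, so treat this as an assumption-laden sketch:

```python
# Hypothetical LLM-as-a-judge rubric; the 1-5 scale and wording are assumptions.
JUDGE_PROMPT = """Rate the budgeting response below on a 1-5 scale for each criterion.

1. Structure: does it follow the four-section template
   (Budget Overview, Savings Recommendations, Debt Strategy, Additional Tips)?
2. Completeness: does it address the user's income, debts, and stated goals?
3. Accuracy: are the arithmetic and financial reasoning sound?

Instruction:
{instruction}

Response to grade:
{response}

Answer with JSON only: {{"structure": <1-5>, "completeness": <1-5>, "accuracy": <1-5>}}"""

def build_judge_prompt(instruction: str, response: str) -> str:
    # Fill the rubric template with one (instruction, response) pair to grade.
    return JUDGE_PROMPT.format(instruction=instruction, response=response)
```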
Summary of Performance
As expected, the model shows a slight decrease on general reasoning benchmarks due to its specialization in budgeting-style responses. TruthfulQA improved modestly (36.96% to 37.70%), showing that fine-tuning did not harm general factual robustness. The LLM-as-a-judge scores improved in completeness and accuracy, though the structure rating dipped relative to the base model; overall, the fine-tuned model is better aligned with its intended task.
Usage and Intended Uses
Loading the Model
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_name = "db5kb/financial-advice-llm-Llama-3.1-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# torch_dtype="auto" and device_map="auto" (requires `accelerate`) are optional
# but recommended for GPU inference.
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Use the Instruction / Response format the model was fine-tuned on.
prompt = (
    "Instruction: I earn $85k, have $10k in student loans, and want to save for a home.\n\n"
    "Response:"
)
print(pipe(prompt, max_new_tokens=350)[0]["generated_text"])
```
Intended Use Cases
- Personalized budgeting and financial advice summaries
- Beginner-friendly explanations of financial principles
- Consistent template-based financial guidance
This model is not intended for investment decisions, tax planning, or other high-stakes financial advice.
Prompt Format
The model expects inputs formatted as:

```
Instruction: <user financial profile>

Response:
```

Example:

```
Instruction: I'm a single professional earning $75k with $20k in student loans and want to save for a home.

Response:
```
Expected Output Format
The model outputs a structured, multi-section budgeting summary:
```
1. Budget Overview:
<text>
2. Savings Recommendations:
<text>
3. Debt Strategy:
<text>
4. Additional Tips:
<text>
```
Limitations
- The dataset is fully synthetic and may not reflect the full range and variability of real-world financial situations
- The model may oversimplify complex financial situations
- There are slight performance decreases on general reasoning benchmarks
- Evaluation through an LLM judge may introduce bias
- It should not be used for financial decision making