sethmcknight committed
Commit: 2d9ce15
Parent(s): 92c00a3

Add initial project files including README, .gitignore, and project documentation
Files changed:
- .gitignore +11 -0
- README.md +41 -1
- copilot-instructions.md +60 -0
- deployed.md +3 -0
- design-and-evaluation.md +3 -0
- project-plan.md +83 -0
- project-prompt-and-rubric.md +228 -0
- requirements.txt +2 -0
.gitignore
ADDED
@@ -0,0 +1,11 @@
# Python
__pycache__/
*.pyc
*.pyo
*.pyd
.Python
env/
venv/
ENV/
env.bak/
venv.bak/
README.md
CHANGED
@@ -1,2 +1,42 @@
# MSSE AI Engineering Project

This project is a Retrieval-Augmented Generation (RAG) application that answers questions about a corpus of company policies.

## Setup

1. Clone the repository:

```bash
git clone https://github.com/sethmcknight/msse-ai-engineering.git
cd msse-ai-engineering
```

2. Create and activate a virtual environment:

```bash
python3 -m venv venv
source venv/bin/activate
```

3. Install the dependencies:

```bash
pip install -r requirements.txt
```

## Running the Application

To run the Flask application:

```bash
flask run
```

## Running Tests

To run the test suite:

```bash
pytest
```

Repo for the Quantic MSSE AI Engineering project code
copilot-instructions.md
ADDED
@@ -0,0 +1,60 @@
# Copilot Instructions

This document outlines the guiding principles and directives for the GitHub Copilot assistant for the duration of this project. The primary objective is to successfully build, evaluate, and deploy a Retrieval-Augmented Generation (RAG) application in accordance with the `project-prompt-and-rubric.md` and the `project-plan.md`.

## Core Mission

Your primary goal is to assist in developing a RAG application that meets all requirements for a grade of 5. You must adhere to the development plan, follow best practices, and proactively contribute to the project's success.

## Guiding Principles

1. **Plan-Driven Development:** Always refer to `project-plan.md` as the source of truth for the current task and overall workflow. Do not deviate from the plan without explicit instruction.
2. **Test-Driven Development (TDD):** This is a strict requirement. For every new feature or piece of logic, you must first write the failing tests using `pytest` and then implement the code to make the tests pass.
3. **Continuous Integration/Continuous Deployment (CI/CD):** The project prioritizes early and continuous deployment. All changes must pass the CI/CD pipeline (install, test, build) before being merged into the `main` branch.
4. **Rubric-Focused:** All development choices should be justifiable against the `project-prompt-and-rubric.md`. This includes technology choices, implementation details, and evaluation metrics.
5. **Reproducibility:** Ensure the application is reproducible by managing dependencies in `requirements.txt` and setting fixed seeds where applicable (e.g., chunking, evaluation).

## Technical Stack & Constraints

- **Language:** Python
- **Web Framework:** Flask
- **Testing:** `pytest`
- **Vector Database:** ChromaDB (local)
- **Embedding & LLM APIs:** Use free-tier services (e.g., OpenRouter, Groq, HuggingFace).
- **Deployment:** Render
- **CI/CD:** GitHub Actions

## Step-by-Step Workflow

You must follow the sequence laid out in `project-plan.md`. The key phases are:

1. **Project Setup:** Initialize the repository, virtual environment, and placeholder files.
2. **"Hello World" Deployment:** Create a minimal Flask app with a `/health` endpoint and deploy it to Render via the initial CI/CD pipeline. This is a critical first milestone.
3. **TDD Cycles:** For all subsequent features (data ingestion, embedding, RAG, web UI):
   - Write tests.
   - Implement the feature.
   - Run tests locally.
   - Commit and push to trigger the CI/CD pipeline.
   - Verify deployment.

## Key Application Requirements

- **Endpoints:**
  - `/`: Web chat interface.
  - `/chat`: API for questions (POST) and answers (JSON with citations).
  - `/health`: Simple JSON status.
- **Guardrails (must be tested):**
  - Refuse to answer questions outside the provided corpus.
  - Limit output length.
  - Always cite sources for every answer.
- **Documentation:**
  - Keep `README.md` updated with setup and run instructions.
  - Incrementally populate `design-and-evaluation.md` as decisions are made and results are gathered.
  - Ensure `deployed.md` always contains the correct public URL.

## Your Role

- **Implementer:** Write code, create files, and configure services based on my requests.
- **Tester:** Write `pytest` tests for all functionality.
- **Reviewer:** Proactively identify potential issues, suggest improvements, and ensure code quality.
- **Navigator:** Keep track of the current step in the `project-plan.md` and be ready to proceed to the next one.
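To make the endpoint contract above concrete, here is a minimal sketch of what such a Flask app could look like. It is illustrative only and not code from this commit: the `answer_question` helper, its return shape, and the placeholder response for `/` are assumptions standing in for the RAG pipeline and chat UI described in `project-plan.md`.

```python
# Illustrative sketch only -- not part of this commit. Helper names are hypothetical.
from flask import Flask, jsonify, request

app = Flask(__name__)


def answer_question(question: str) -> dict:
    # Placeholder for the RAG pipeline: retrieve chunks, call the LLM, collect citations.
    # The returned shape is an assumption, not a defined API.
    return {"answer": "...", "citations": [{"source": "policy.md", "snippet": "..."}]}


@app.route("/")
def index():
    # The real app would serve the web chat interface here.
    return "Chat UI placeholder"


@app.route("/chat", methods=["POST"])
def chat():
    # Accepts a JSON body such as {"question": "..."} and returns the answer with citations.
    payload = request.get_json(silent=True) or {}
    return jsonify(answer_question(payload.get("question", "")))


@app.route("/health")
def health():
    # Simple JSON status for deployment health checks.
    return jsonify({"status": "ok"})
```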
deployed.md
ADDED
@@ -0,0 +1,3 @@
# Deployed Application

The application is not yet deployed.
design-and-evaluation.md
ADDED
@@ -0,0 +1,3 @@
# Design and Evaluation

This document will be updated with design choices and evaluation results as the project progresses.
project-plan.md
ADDED
@@ -0,0 +1,83 @@
# RAG Application Project Plan

This plan outlines the steps to design, build, and deploy a Retrieval-Augmented Generation (RAG) application as per the project requirements, with a focus on achieving a grade of 5. The approach prioritizes early deployment and continuous integration, following Test-Driven Development (TDD) principles.

## 1. Foundational Setup

- [x] **Repository:** Create a new GitHub repository.
- [x] **Virtual Environment:** Set up a local Python virtual environment (`venv`).
- [x] **Initial Files:**
  - Create `requirements.txt` with initial dependencies (`Flask`, `pytest`).
  - Create a `.gitignore` file for Python.
  - Create a `README.md` with initial setup instructions.
  - Create placeholder files: `deployed.md` and `design-and-evaluation.md`.
- [x] **Testing Framework:** Establish a `tests/` directory and configure `pytest`.

## 2. "Hello World" Deployment

- [ ] **Minimal App:** Develop a minimal Flask application (`app.py`) with a `/health` endpoint that returns a JSON status object.
- [ ] **Unit Test:** Write a test for the `/health` endpoint to ensure it returns a `200 OK` status and the correct JSON payload.
- [ ] **Local Validation:** Run the app and tests locally to confirm everything works.
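A sketch of how the `/health` unit test described in step 2 might look under the TDD requirement, assuming the Flask app object is exposed as `app` in `app.py` (an assumption; this commit contains no application code yet). Written first, it fails until the endpoint exists.

```python
# tests/test_health.py -- illustrative sketch; assumes app.py exposes a Flask `app` object.
from app import app


def test_health_returns_ok():
    client = app.test_client()
    response = client.get("/health")

    # Expect 200 OK and a small JSON status payload.
    assert response.status_code == 200
    assert response.get_json() == {"status": "ok"}
```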
## 3. CI/CD and Initial Deployment

- [ ] **Render Setup:** Create a new Web Service on Render and link it to the GitHub repository.
- [ ] **Environment Configuration:** Configure necessary environment variables on Render (e.g., `PYTHON_VERSION`).
- [ ] **GitHub Actions:** Create a CI/CD workflow (`.github/workflows/main.yml`) that:
  - Triggers on push/PR to the `main` branch.
  - Installs dependencies from `requirements.txt`.
  - Runs the `pytest` test suite.
  - On success, triggers a deployment to Render.
- [ ] **Deployment Validation:** Push a change and verify that the workflow runs successfully and the application is deployed.
- [ ] **Documentation:** Update `deployed.md` with the live URL of the deployed application.

## 4. Data Ingestion and Processing

- [ ] **Corpus Assembly:** Collect or generate 5-20 policy documents (PDF, TXT, MD) and place them in a `corpus/` directory.
- [ ] **Parsing Logic:** Implement and test functions to parse different document formats.
- [ ] **Chunking Strategy:** Implement and test a document chunking strategy (e.g., recursive character splitting with overlap).
- [ ] **Reproducibility:** Set fixed seeds for any processes involving randomness (e.g., chunking, sampling) to ensure deterministic outcomes.
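One possible shape for the chunking step above is a fixed-size character window with overlap, a simpler variant of recursive splitting. The sizes below are arbitrary placeholders, and the approach is deterministic, which also serves the reproducibility item.

```python
# Illustrative chunker sketch; chunk_size and overlap values are placeholders.
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping windows so context is not lost at chunk boundaries."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")

    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```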
## 5. Embedding and Vector Storage

- [ ] **Vector DB Setup:** Integrate a vector database (e.g., ChromaDB) into the project.
- [ ] **Embedding Model:** Select and integrate a free embedding model (e.g., from HuggingFace).
- [ ] **Ingestion Pipeline:** Create a script (`ingest.py`) that:
  - Loads documents from the corpus.
  - Chunks the documents.
  - Embeds the chunks.
  - Stores the embeddings in the vector database.
- [ ] **Testing:** Write tests to verify each step of the ingestion pipeline.
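A sketch of what `ingest.py` could look like with ChromaDB and a free HuggingFace embedding model. Everything here is an assumption: the model name, the `corpus/` layout, the collection name, and the `chunk_text` helper from the previous sketch. Note that `chromadb` and `sentence-transformers` are not yet in `requirements.txt` and would need to be added.

```python
# ingest.py -- illustrative sketch; model, paths, and collection name are assumptions.
from pathlib import Path

import chromadb
from sentence_transformers import SentenceTransformer

from chunking import chunk_text  # the chunker sketched above (hypothetical module)

model = SentenceTransformer("all-MiniLM-L6-v2")        # free local embedding model
client = chromadb.PersistentClient(path="chroma_db")   # local, persisted vector store
collection = client.get_or_create_collection("policies")

for doc_path in Path("corpus").glob("*.md"):           # markdown only, for brevity
    chunks = chunk_text(doc_path.read_text(encoding="utf-8"))
    embeddings = model.encode(chunks).tolist()
    collection.add(
        ids=[f"{doc_path.stem}-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=embeddings,
        metadatas=[{"source": doc_path.name}] * len(chunks),
    )
```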
## 6. RAG Core Implementation

- [ ] **Retrieval Logic:** Implement a function to retrieve the top-k relevant document chunks from the vector store based on a user query.
- [ ] **Prompt Engineering:** Design a prompt template that injects the retrieved context into the query for the LLM.
- [ ] **LLM Integration:** Connect to a free-tier LLM (e.g., via OpenRouter or Groq) to generate answers.
- [ ] **Guardrails:** Implement and test guardrails:
  - Refuse to answer questions outside the corpus.
  - Limit the length of the generated output.
  - Ensure all answers cite the source document IDs/titles.
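A sketch of the retrieval and prompting steps, reusing the collection from the ingestion sketch. The prompt wording and the refusal guardrail phrasing are assumptions, and the call to a free-tier LLM is left out.

```python
# rag.py -- illustrative sketch of top-k retrieval and prompt assembly; names are assumptions.
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="chroma_db")
collection = client.get_or_create_collection("policies")

PROMPT_TEMPLATE = (
    "Answer the question using only the context below. If the answer is not in the "
    "context, reply: \"I can only answer questions about our policies.\"\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer (cite your sources):"
)


def build_prompt(question: str, k: int = 4) -> tuple[str, list[str]]:
    """Retrieve the top-k chunks and inject them, with their sources, into the prompt."""
    query_embedding = model.encode([question]).tolist()
    results = collection.query(query_embeddings=query_embedding, n_results=k)
    chunks = results["documents"][0]
    sources = [meta["source"] for meta in results["metadatas"][0]]
    context = "\n\n".join(f"[{source}] {chunk}" for source, chunk in zip(sources, chunks))
    return PROMPT_TEMPLATE.format(context=context, question=question), sources
```

The returned prompt would then be sent to whichever LLM is chosen, the `sources` list reused to populate citations in the `/chat` response, and output length limited via a token cap on that call.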
## 7. Web Application Completion

- [ ] **Chat Interface:** Implement a simple web chat interface for the `/` endpoint.
- [ ] **API Endpoint:** Create the `/chat` API endpoint that receives user questions (POST) and returns model-generated answers with citations and snippets.
- [ ] **UI/UX:** Ensure the web interface is clean, user-friendly, and handles loading/error states gracefully.
- [ ] **Testing:** Write end-to-end tests for the chat functionality.

## 8. Evaluation

- [ ] **Evaluation Set:** Create an evaluation set of 15-30 questions and corresponding "gold" answers covering various policy topics.
- [ ] **Metric Implementation:** Develop scripts to calculate:
  - **Answer Quality:** Groundedness and Citation Accuracy.
  - **System Metrics:** Latency (p50/p95).
- [ ] **Execution:** Run the evaluation and record the results.
- [ ] **Documentation:** Summarize the evaluation results in `design-and-evaluation.md`.
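A sketch of how the latency metric could be collected against a running instance. The URL, the sample questions, and the use of the `requests` library (another extra dependency) are assumptions, and the plan calls for 10-20 queries rather than the two placeholders shown.

```python
# evaluate_latency.py -- illustrative sketch; URL and questions are placeholders.
import statistics
import time

import requests

QUESTIONS = [
    "How many PTO days do employees get?",
    "What is the meal expense limit for travel?",
]

latencies = []
for question in QUESTIONS:
    start = time.perf_counter()
    requests.post("http://localhost:5000/chat", json={"question": question}, timeout=60)
    latencies.append(time.perf_counter() - start)

# p50 is the median; p95 is the 95th of the 99 percentile cut points.
percentiles = statistics.quantiles(latencies, n=100)
print(f"p50: {statistics.median(latencies):.2f}s  p95: {percentiles[94]:.2f}s")
```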
## 9. Final Documentation and Submission

- [ ] **Design Document:** Complete `design-and-evaluation.md`, justifying all major design choices (embedding model, chunking strategy, vector store, LLM, etc.).
- [ ] **README:** Finalize the `README.md` with comprehensive setup, run, and testing instructions.
- [ ] **Demonstration Video:** Record a 5-10 minute screen-share video demonstrating the deployed application, walking through the code architecture, explaining the evaluation results, and showing a successful CI/CD run.
- [ ] **Submission:** Share the GitHub repository with the grader and submit the repository and video links.
project-prompt-and-rubric.md
ADDED
@@ -0,0 +1,228 @@
AI Engineering Project

Project Overview

For this project, you will be designing, building, and evaluating a Retrieval-Augmented Generation (RAG) LLM-based application that answers user questions about a corpus of company policies & procedures. You will then deploy the application to a free-tier host (e.g., Render, Railway) with a basic CI/CD pipeline (e.g., GitHub Actions) that triggers deployment on push/PR when the app builds successfully. Finally, you will demonstrate the system via a screen-share video showing key features of your deployed application, and a quick walkthrough of your design, evaluation and CI/CD run. You can complete this project either individually or as a group of no more than three people.

While you can fully hand code this project if you wish, you are highly encouraged to utilize leading AI code generation models/AI IDEs/async agents to assist in rapidly producing your solution, being sure to describe in broad terms how you made use of them. Here are some examples of very useful AI tools you may wish to consider. You will be graded on the quality and functionality of the application and how well it meets the project requirements—no given proportion of the code is required to be hand coded.

Learning Outcomes

When completed successfully, this project will enable you to:
● Demonstrate excellent AI engineering skills
● Demonstrate the ability to select appropriate AI application design and architecture
● Implement a working LLM-based application including RAG
● Evaluate the performance of an LLM-based application
● Utilize AI tooling as appropriate

Project Description

First, assemble a small but coherent corpus of documents outlining company policies & procedures—about 5–20 short markdown/HTML/PDF/TXT files totaling 30–120 pages. You may author them yourself (with AI assistance) or use policies that you are aware of from your own organization that can be used for this assignment. Students must use a corpus they can legally include in the repo or load at runtime (e.g., your own synthetic policies, your organization's employee policy documents, etc.)—no private/paid data is required. Additionally, you should define success metrics for your application (see the "Evaluation" step below), including at least one information-quality metric (e.g., groundedness or citation accuracy) and one system metric (e.g., latency).

Use free or zero-cost options when possible, e.g., OpenRouter's free tier (https://openrouter.ai/docs/api-reference/limits), Groq (https://console.groq.com/docs/rate-limits), or your own paid API keys if you have them. For embedding models, free-tier options are available from Cohere, Voyage, HuggingFace and others.

Complete the following steps to fully develop, deploy, and evaluate your application:

Environment and Reproducibility
  ○ Create a virtual environment (e.g., venv, conda).
  ○ List dependencies in requirements.txt (or environment.yml).
  ○ Provide a README.md with setup + run instructions.
  ○ Set fixed seeds where/if applicable (for deterministic chunking or evaluation sampling).

Ingestion and Indexing
  ○ Parse & clean documents (handle PDFs/HTML/md/txt).
  ○ Chunk documents (e.g., by headings or token windows with overlap).
  ○ Embed chunks with a free embedding model or a free-tier API.
  ○ Store the embedded document chunks in a local or lightweight vector database (e.g., Chroma or an optionally cloud-hosted vector store like Pinecone, etc.).
  ○ Store vectors in a local/vector DB or cloud DB (e.g., Chroma, Pinecone, etc.).

Retrieval and Generation (RAG)
  ○ To build your RAG pipeline you may use frameworks such as LangChain to handle retrieval, prompt chaining, and API calls, or implement these manually.
  ○ Implement top-k retrieval with optional re-ranking.
  ○ Build a prompting strategy that injects retrieved chunks (and citations/sources) into the LLM context.
  ○ Add basic guardrails:
    ■ Refuse to answer outside the corpus ("I can only answer about our policies"),
    ■ Limit output length,
    ■ Always cite source doc IDs/titles for answers.

Web Application
  ○ Students can use Flask, Streamlit, or an alternative for the web app. LangChain is recommended for orchestration, but is optional.
  ○ Endpoints/UI:
    ■ / - Web chat interface (text box for user input)
    ■ /chat - API endpoint that receives user questions (POST) and returns model-generated answers with citations and snippets (link to source and show snippet).
    ■ /health - returns simple status via JSON.

Deployment
  ○ For production hosting use Render or Railway free tiers; students may alternatively use any other free-tier providers of their choice.
  ○ Configure environment variables (e.g., API keys, model endpoints, DB related, etc.).
  ○ Ensure the app is publicly accessible at a shareable URL.

CI/CD
  ○ Minimal automated testing is sufficient for this assignment (a build/run check, optional smoke test).
  ○ Create a GitHub Actions workflow that on push/PR:
    ■ Installs dependencies,
    ■ Runs a build/start check (e.g., python -m pip install -r requirements.txt and python -c "import app", or pytest -q if you add tests),
    ■ On success in main, deploys to your host (Render/Railway action or via webhook/API).

Evaluation of the LLM Application
  ○ Provide a small evaluation set of 15–30 questions covering various policy topics (PTO, security, expense, remote work, holidays, etc.). Report:
    ■ Answer Quality (required):
      1. Groundedness: % of answers whose content is factually consistent with and fully supported by the retrieved evidence—i.e., the answer contains no information that is absent or contradicted in the context.
      2. Citation Accuracy: % of answers whose listed citations correctly point to the specific passage(s) that support the information stated—i.e., the attribution is correct and not misleading.
      3. Exact/Partial Match (optional): % of answers that exactly or partially match a short gold answer you provide.
    ■ System Metrics (required):
      Latency (p50/p95) from request to answer for 10–20 queries.
    ■ Ablations (optional): compare retrieval k, chunk size, or prompt variants.

Design Documentation
  ○ Briefly justify design choices (embedding model, chunking, k, prompt format, vector store).
Submission Guidelines

Your final submission should consist of two links:

● A link to an accessible software repository (a GitHub repo) containing all your developed code. You must share your repository with the GitHub account quantic-grader.
  o The GitHub repository should include a link to the deployed version of your RAG LLM-based application (in file deployed.md).
  o The GitHub repository must include a README.md file indicating setup and run instructions.
  o The GitHub repository must also include a brief design and evaluation document (design-and-evaluation.md) listing and explaining:
    i) design and architecture decisions made, and why they were made, including technology choices
    ii) a summary of your evaluation of your RAG system
● A link to a recorded screen-share demonstration video of the working RAG LLM-based application, involving screen capture of it being used with voiceover.
  o All group members must speak and be present on camera.
  o All group members must show their government ID.
  o The demonstration/presentation should be between 5 and 10 minutes long.

To submit your project, please click on the "Submit Project" button on your dashboard and follow the steps provided. If you are submitting your project as a group, please ensure only ONE member submits on behalf of the group. Please reach out to [email protected] if you have any questions. Project grading typically takes about 3-4 weeks to complete after the submission due date. There is no score penalty for projects submitted after the due date; however, grading may be delayed.

Plagiarism Policy

Here at Quantic, we believe that learning is best accomplished by "doing"—this ethos underpinned the design of our active learning platform, and it likewise informs our approach to the completion of projects and presentations for our degree programs. We expect that all of our graduates will be able to deploy the concepts and skills they've learned over the course of their degree, whether in the workplace or in pursuit of personal goals, and so it is in our students' best interest that these assignments be completed solely through their own efforts with academic integrity.

Quantic takes academic integrity very seriously—we define plagiarism as: "Knowingly representing the work of others as one's own, engaging in any acts of plagiarism, or referencing the works of others without appropriate citation." This includes both misusing or not using proper citations for the works referenced, and submitting someone else's work as your own. Quantic monitors all submissions for instances of plagiarism, and all plagiarism, even unintentional, is considered a conduct violation. If you're still not sure about what constitutes plagiarism, check out the two-minute presentation by our librarian, Kristina. It is important to be conscientious when citing your sources. When in doubt, cite! Kristina outlines the basics of best citation practices in a one-minute video. You can also find more about our plagiarism policy on our site.

Project Rubric

Scores 2 and above are considered passing. Students who receive a 1 or 0 will not get credit for the assignment and must revise and resubmit to receive a passing grade.

Score | Description

5
● Addresses ALL of the project requirements, including but not limited to:
  ○ Outstanding RAG application with correct responses with matching citations; ingest and indexing works
  ○ Excellent, well-structured application architecture
  ○ Public deployment on Render, Railway (or equivalent) fully functional
  ○ CI/CD runs on push/PR and deploys on success
  ○ Excellent documentation of design choices
  ○ Excellent evaluation results, which include groundedness, citation accuracy, and latency
  ○ Excellent, clear demo of features, design and evaluation

4
● Addresses MOST of the project requirements, including but not limited to:
  ○ Excellent RAG application with correct responses with generally matching citations; ingest and indexing works
  ○ Very good, well-structured application architecture
  ○ Public deployment on Render, Railway (or equivalent) almost fully functional
  ○ CI/CD runs on push/PR and deploys on success
  ○ Very good documentation of design choices
  ○ Very good evaluation results which include groundedness, citation accuracy, and latency
  ○ Very good, clear demo of features, design and evaluation

3
● Addresses SOME of the project requirements, including but not limited to:
  ○ Very good RAG application with mainly correct responses with generally matching citations; ingest and indexing works
  ○ Good, well-structured application architecture
  ○ Public deployment on Render, Railway (or equivalent) almost fully functional
  ○ CI/CD runs on push/PR and deploys on success
  ○ Good documentation of design choices
  ○ Good evaluation results which include most of groundedness, citation accuracy, and latency
  ○ Good, clear demo of features, design and evaluation

2
● Addresses FEW of the project requirements, including but not limited to:
  ○ Passable RAG application with limited correct responses with few matching citations; ingest and indexing works partially
  ○ Passable application architecture
  ○ Public deployment on Render, Railway (or equivalent) not fully functional
  ○ CI/CD runs on push/PR and deploys on success
  ○ Passable documentation of design choices
  ○ Passable evaluation results which include only some of groundedness, citation accuracy, and latency
  ○ Passable demo of features, design and evaluation

1
● Addresses the project, but MOST of the project requirements are missing, including but not limited to:
  ○ Incomplete app; not deployed
  ○ No CI/CD
  ○ No to very limited evaluation
  ○ No design documentation
  ○ No demo of application

0
● The student either did not complete the assignment, plagiarized all or part of the assignment, or completely failed to address the project requirements.
requirements.txt
ADDED
@@ -0,0 +1,2 @@
Flask
pytest