sethmcknight committed
Commit: 2d9ce15
Parent(s): 92c00a3

Add initial project files including README, .gitignore, and project documentation
Files changed:
- .gitignore +11 -0
- README.md +41 -1
- copilot-instructions.md +60 -0
- deployed.md +3 -0
- design-and-evaluation.md +3 -0
- project-plan.md +83 -0
- project-prompt-and-rubric.md +228 -0
- requirements.txt +2 -0
.gitignore
ADDED
@@ -0,0 +1,11 @@
# Python
__pycache__/
*.pyc
*.pyo
*.pyd
.Python
env/
venv/
ENV/
env.bak/
venv.bak/
README.md
CHANGED
@@ -1,2 +1,42 @@
# MSSE AI Engineering Project

This project is a Retrieval-Augmented Generation (RAG) application that answers questions about a corpus of company policies.

## Setup

1. Clone the repository:

```bash
git clone https://github.com/sethmcknight/msse-ai-engineering.git
cd msse-ai-engineering
```

2. Create and activate a virtual environment:

```bash
python3 -m venv venv
source venv/bin/activate
```

3. Install the dependencies:

```bash
pip install -r requirements.txt
```

## Running the Application

To run the Flask application:

```bash
flask run
```

## Running Tests

To run the test suite:

```bash
pytest
```

Repo for the Quantic MSSE AI Engineering project code
copilot-instructions.md
ADDED
@@ -0,0 +1,60 @@
# Copilot Instructions

This document outlines the guiding principles and directives for the GitHub Copilot assistant for the duration of this project. The primary objective is to successfully build, evaluate, and deploy a Retrieval-Augmented Generation (RAG) application in accordance with the `project-prompt-and-rubric.md` and the `project-plan.md`.

## Core Mission

Your primary goal is to assist in developing a RAG application that meets all requirements for a grade of 5. You must adhere to the development plan, follow best practices, and proactively contribute to the project's success.

## Guiding Principles

1. **Plan-Driven Development:** Always refer to `project-plan.md` as the source of truth for the current task and overall workflow. Do not deviate from the plan without explicit instruction.
2. **Test-Driven Development (TDD):** This is a strict requirement. For every new feature or piece of logic, you must first write the failing tests using `pytest` and then implement the code to make the tests pass.
3. **Continuous Integration/Continuous Deployment (CI/CD):** The project prioritizes early and continuous deployment. All changes must pass the CI/CD pipeline (install, test, build) before being merged into the `main` branch.
4. **Rubric-Focused:** All development choices should be justifiable against the `project-prompt-and-rubric.md`. This includes technology choices, implementation details, and evaluation metrics.
5. **Reproducibility:** Ensure the application is reproducible by managing dependencies in `requirements.txt` and setting fixed seeds where applicable (e.g., chunking, evaluation).

## Technical Stack & Constraints

- **Language:** Python
- **Web Framework:** Flask
- **Testing:** `pytest`
- **Vector Database:** ChromaDB (local)
- **Embedding & LLM APIs:** Use free-tier services (e.g., OpenRouter, Groq, HuggingFace).
- **Deployment:** Render
- **CI/CD:** GitHub Actions

## Step-by-Step Workflow

You must follow the sequence laid out in `project-plan.md`. The key phases are:

1. **Project Setup:** Initialize the repository, virtual environment, and placeholder files.
2. **"Hello World" Deployment:** Create a minimal Flask app with a `/health` endpoint and deploy it to Render via the initial CI/CD pipeline. This is a critical first milestone.
3. **TDD Cycles:** For all subsequent features (data ingestion, embedding, RAG, web UI):
   - Write tests.
   - Implement the feature.
   - Run tests locally.
   - Commit and push to trigger the CI/CD pipeline.
   - Verify deployment.

## Key Application Requirements

- **Endpoints:**
  - `/`: Web chat interface.
  - `/chat`: API for questions (POST) and answers (JSON with citations).
  - `/health`: Simple JSON status.
- **Guardrails (must be tested):**
  - Refuse to answer questions outside the provided corpus.
  - Limit output length.
  - Always cite sources for every answer.
- **Documentation:**
  - Keep `README.md` updated with setup and run instructions.
  - Incrementally populate `design-and-evaluation.md` as decisions are made and results are gathered.
  - Ensure `deployed.md` always contains the correct public URL.

## Your Role

- **Implementer:** Write code, create files, and configure services based on my requests.
- **Tester:** Write `pytest` tests for all functionality.
- **Reviewer:** Proactively identify potential issues, suggest improvements, and ensure code quality.
- **Navigator:** Keep track of the current step in the `project-plan.md` and be ready to proceed to the next one.
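To make the endpoint contract above concrete, here is a minimal sketch of what such a Flask app could look like. It is illustrative only and not code from this commit: the `answer_question` helper, its return shape, and the placeholder response for `/` are assumptions standing in for the RAG pipeline and chat UI described in `project-plan.md`.

```python
# Illustrative sketch only -- not part of this commit. Helper names are hypothetical.
from flask import Flask, jsonify, request

app = Flask(__name__)


def answer_question(question: str) -> dict:
    # Placeholder for the RAG pipeline: retrieve chunks, call the LLM, collect citations.
    # The returned shape is an assumption, not a defined API.
    return {"answer": "...", "citations": [{"source": "policy.md", "snippet": "..."}]}


@app.route("/")
def index():
    # The real app would serve the web chat interface here.
    return "Chat UI placeholder"


@app.route("/chat", methods=["POST"])
def chat():
    # Accepts a JSON body such as {"question": "..."} and returns the answer with citations.
    payload = request.get_json(silent=True) or {}
    return jsonify(answer_question(payload.get("question", "")))


@app.route("/health")
def health():
    # Simple JSON status for deployment health checks.
    return jsonify({"status": "ok"})
```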
deployed.md
ADDED
@@ -0,0 +1,3 @@
# Deployed Application

The application is not yet deployed.
design-and-evaluation.md
ADDED
@@ -0,0 +1,3 @@
# Design and Evaluation

This document will be updated with design choices and evaluation results as the project progresses.
project-plan.md
ADDED
@@ -0,0 +1,83 @@
# RAG Application Project Plan

This plan outlines the steps to design, build, and deploy a Retrieval-Augmented Generation (RAG) application as per the project requirements, with a focus on achieving a grade of 5. The approach prioritizes early deployment and continuous integration, following Test-Driven Development (TDD) principles.

## 1. Foundational Setup

- [x] **Repository:** Create a new GitHub repository.
- [x] **Virtual Environment:** Set up a local Python virtual environment (`venv`).
- [x] **Initial Files:**
  - Create `requirements.txt` with initial dependencies (`Flask`, `pytest`).
  - Create a `.gitignore` file for Python.
  - Create a `README.md` with initial setup instructions.
  - Create placeholder files: `deployed.md` and `design-and-evaluation.md`.
- [x] **Testing Framework:** Establish a `tests/` directory and configure `pytest`.

## 2. "Hello World" Deployment

- [ ] **Minimal App:** Develop a minimal Flask application (`app.py`) with a `/health` endpoint that returns a JSON status object.
- [ ] **Unit Test:** Write a test for the `/health` endpoint to ensure it returns a `200 OK` status and the correct JSON payload.
- [ ] **Local Validation:** Run the app and tests locally to confirm everything works.
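A sketch of how the `/health` unit test described in step 2 might look under the TDD requirement, assuming the Flask app object is exposed as `app` in `app.py` (an assumption; this commit contains no application code yet). Written first, it fails until the endpoint exists.

```python
# tests/test_health.py -- illustrative sketch; assumes app.py exposes a Flask `app` object.
from app import app


def test_health_returns_ok():
    client = app.test_client()
    response = client.get("/health")

    # Expect 200 OK and a small JSON status payload.
    assert response.status_code == 200
    assert response.get_json() == {"status": "ok"}
```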
## 3. CI/CD and Initial Deployment

- [ ] **Render Setup:** Create a new Web Service on Render and link it to the GitHub repository.
- [ ] **Environment Configuration:** Configure necessary environment variables on Render (e.g., `PYTHON_VERSION`).
- [ ] **GitHub Actions:** Create a CI/CD workflow (`.github/workflows/main.yml`) that:
  - Triggers on push/PR to the `main` branch.
  - Installs dependencies from `requirements.txt`.
  - Runs the `pytest` test suite.
  - On success, triggers a deployment to Render.
- [ ] **Deployment Validation:** Push a change and verify that the workflow runs successfully and the application is deployed.
- [ ] **Documentation:** Update `deployed.md` with the live URL of the deployed application.

## 4. Data Ingestion and Processing

- [ ] **Corpus Assembly:** Collect or generate 5-20 policy documents (PDF, TXT, MD) and place them in a `corpus/` directory.
- [ ] **Parsing Logic:** Implement and test functions to parse different document formats.
- [ ] **Chunking Strategy:** Implement and test a document chunking strategy (e.g., recursive character splitting with overlap).
- [ ] **Reproducibility:** Set fixed seeds for any processes involving randomness (e.g., chunking, sampling) to ensure deterministic outcomes.
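One possible shape for the chunking step above is a fixed-size character window with overlap, a simpler variant of recursive splitting. The sizes below are arbitrary placeholders, and the approach is deterministic, which also serves the reproducibility item.

```python
# Illustrative chunker sketch; chunk_size and overlap values are placeholders.
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping windows so context is not lost at chunk boundaries."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")

    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```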
## 5. Embedding and Vector Storage

- [ ] **Vector DB Setup:** Integrate a vector database (e.g., ChromaDB) into the project.
- [ ] **Embedding Model:** Select and integrate a free embedding model (e.g., from HuggingFace).
- [ ] **Ingestion Pipeline:** Create a script (`ingest.py`) that:
  - Loads documents from the corpus.
  - Chunks the documents.
  - Embeds the chunks.
  - Stores the embeddings in the vector database.
- [ ] **Testing:** Write tests to verify each step of the ingestion pipeline.
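A sketch of what `ingest.py` could look like with ChromaDB and a free HuggingFace embedding model. Everything here is an assumption: the model name, the `corpus/` layout, the collection name, and the `chunk_text` helper from the previous sketch. Note that `chromadb` and `sentence-transformers` are not yet in `requirements.txt` and would need to be added.

```python
# ingest.py -- illustrative sketch; model, paths, and collection name are assumptions.
from pathlib import Path

import chromadb
from sentence_transformers import SentenceTransformer

from chunking import chunk_text  # the chunker sketched above (hypothetical module)

model = SentenceTransformer("all-MiniLM-L6-v2")        # free local embedding model
client = chromadb.PersistentClient(path="chroma_db")   # local, persisted vector store
collection = client.get_or_create_collection("policies")

for doc_path in Path("corpus").glob("*.md"):           # markdown only, for brevity
    chunks = chunk_text(doc_path.read_text(encoding="utf-8"))
    embeddings = model.encode(chunks).tolist()
    collection.add(
        ids=[f"{doc_path.stem}-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=embeddings,
        metadatas=[{"source": doc_path.name}] * len(chunks),
    )
```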
## 6. RAG Core Implementation

- [ ] **Retrieval Logic:** Implement a function to retrieve the top-k relevant document chunks from the vector store based on a user query.
- [ ] **Prompt Engineering:** Design a prompt template that injects the retrieved context into the query for the LLM.
- [ ] **LLM Integration:** Connect to a free-tier LLM (e.g., via OpenRouter or Groq) to generate answers.
- [ ] **Guardrails:** Implement and test guardrails:
  - Refuse to answer questions outside the corpus.
  - Limit the length of the generated output.
  - Ensure all answers cite the source document IDs/titles.
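A sketch of the retrieval and prompting steps, reusing the collection from the ingestion sketch. The prompt wording and the refusal guardrail phrasing are assumptions, and the call to a free-tier LLM is left out.

```python
# rag.py -- illustrative sketch of top-k retrieval and prompt assembly; names are assumptions.
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="chroma_db")
collection = client.get_or_create_collection("policies")

PROMPT_TEMPLATE = (
    "Answer the question using only the context below. If the answer is not in the "
    "context, reply: \"I can only answer questions about our policies.\"\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer (cite your sources):"
)


def build_prompt(question: str, k: int = 4) -> tuple[str, list[str]]:
    """Retrieve the top-k chunks and inject them, with their sources, into the prompt."""
    query_embedding = model.encode([question]).tolist()
    results = collection.query(query_embeddings=query_embedding, n_results=k)
    chunks = results["documents"][0]
    sources = [meta["source"] for meta in results["metadatas"][0]]
    context = "\n\n".join(f"[{source}] {chunk}" for source, chunk in zip(sources, chunks))
    return PROMPT_TEMPLATE.format(context=context, question=question), sources
```

The returned prompt would then be sent to whichever LLM is chosen, the `sources` list reused to populate citations in the `/chat` response, and output length limited via a token cap on that call.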
## 7. Web Application Completion

- [ ] **Chat Interface:** Implement a simple web chat interface for the `/` endpoint.
- [ ] **API Endpoint:** Create the `/chat` API endpoint that receives user questions (POST) and returns model-generated answers with citations and snippets.
- [ ] **UI/UX:** Ensure the web interface is clean, user-friendly, and handles loading/error states gracefully.
- [ ] **Testing:** Write end-to-end tests for the chat functionality.

## 8. Evaluation

- [ ] **Evaluation Set:** Create an evaluation set of 15-30 questions and corresponding "gold" answers covering various policy topics.
- [ ] **Metric Implementation:** Develop scripts to calculate:
  - **Answer Quality:** Groundedness and Citation Accuracy.
  - **System Metrics:** Latency (p50/p95).
- [ ] **Execution:** Run the evaluation and record the results.
- [ ] **Documentation:** Summarize the evaluation results in `design-and-evaluation.md`.
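A sketch of how the latency metric could be collected against a running instance. The URL, the sample questions, and the use of the `requests` library (another extra dependency) are assumptions, and the plan calls for 10-20 queries rather than the two placeholders shown.

```python
# evaluate_latency.py -- illustrative sketch; URL and questions are placeholders.
import statistics
import time

import requests

QUESTIONS = [
    "How many PTO days do employees get?",
    "What is the meal expense limit for travel?",
]

latencies = []
for question in QUESTIONS:
    start = time.perf_counter()
    requests.post("http://localhost:5000/chat", json={"question": question}, timeout=60)
    latencies.append(time.perf_counter() - start)

# p50 is the median; p95 is the 95th of the 99 percentile cut points.
percentiles = statistics.quantiles(latencies, n=100)
print(f"p50: {statistics.median(latencies):.2f}s  p95: {percentiles[94]:.2f}s")
```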
## 9. Final Documentation and Submission

- [ ] **Design Document:** Complete `design-and-evaluation.md`, justifying all major design choices (embedding model, chunking strategy, vector store, LLM, etc.).
- [ ] **README:** Finalize the `README.md` with comprehensive setup, run, and testing instructions.
- [ ] **Demonstration Video:** Record a 5-10 minute screen-share video demonstrating the deployed application, walking through the code architecture, explaining the evaluation results, and showing a successful CI/CD run.
- [ ] **Submission:** Share the GitHub repository with the grader and submit the repository and video links.
project-prompt-and-rubric.md
ADDED
@@ -0,0 +1,228 @@
AI Engineering Project

Project Overview

For this project, you will be designing, building, and evaluating a Retrieval-Augmented Generation (RAG) LLM-based application that answers user questions about a corpus of company policies & procedures. You will then deploy the application to a free-tier host (e.g., Render, Railway) with a basic CI/CD pipeline (e.g., GitHub Actions) that triggers deployment on push/PR when the app builds successfully. Finally, you will demonstrate the system via a screen-share video showing key features of your deployed application, and a quick walkthrough of your design, evaluation and CI/CD run. You can complete this project either individually or as a group of no more than three people.

While you can fully hand code this project if you wish, you are highly encouraged to utilize leading AI code generation models/AI IDEs/async agents to assist in rapidly producing your solution, being sure to describe in broad terms how you made use of them. Here are some examples of very useful AI tools you may wish to consider. You will be graded on the quality and functionality of the application and how well it meets the project requirements—no given proportion of the code is required to be hand coded.

Learning Outcomes

When completed successfully, this project will enable you to:
● Demonstrate excellent AI engineering skills
● Demonstrate the ability to select appropriate AI application design and architecture
● Implement a working LLM-based application including RAG
● Evaluate the performance of an LLM-based application
● Utilize AI tooling as appropriate

Project Description

First, assemble a small but coherent corpus of documents outlining company policies & procedures—about 5–20 short markdown/HTML/PDF/TXT files totaling 30–120 pages. You may author them yourself (with AI assistance) or use policies that you are aware of from your own organization that can be used for this assignment. Students must use a corpus they can legally include in the repo or load at runtime (e.g., your own synthetic policies, your organization's employee policy documents, etc.)—no private/paid data is required. Additionally, you should define success metrics for your application (see the "Evaluation" step below), including at least one information-quality metric (e.g., groundedness or citation accuracy) and one system metric (e.g., latency).

Use free or zero-cost options when possible, e.g., OpenRouter's free tier (https://openrouter.ai/docs/api-reference/limits), Groq (https://console.groq.com/docs/rate-limits), or your own paid API keys if you have them. For embedding models, free-tier options are available from Cohere, Voyage, HuggingFace and others.

Complete the following steps to fully develop, deploy, and evaluate your application:

Environment and Reproducibility
  ○ Create a virtual environment (e.g., venv, conda).
  ○ List dependencies in requirements.txt (or environment.yml).
  ○ Provide a README.md with setup + run instructions.
  ○ Set fixed seeds where/if applicable (for deterministic chunking or evaluation sampling).

Ingestion and Indexing
  ○ Parse & clean documents (handle PDFs/HTML/md/txt).
  ○ Chunk documents (e.g., by headings or token windows with overlap).
  ○ Embed chunks with a free embedding model or a free-tier API.
  ○ Store the embedded document chunks in a local or lightweight vector database (e.g., Chroma or an optionally cloud-hosted vector store like Pinecone, etc.).
  ○ Store vectors in a local/vector DB or cloud DB (e.g., Chroma, Pinecone, etc.).

Retrieval and Generation (RAG)
  ○ To build your RAG pipeline you may use frameworks such as LangChain to handle retrieval, prompt chaining, and API calls, or implement these manually.
  ○ Implement top-k retrieval with optional re-ranking.
  ○ Build a prompting strategy that injects retrieved chunks (and citations/sources) into the LLM context.
  ○ Add basic guardrails:
    ■ Refuse to answer outside the corpus ("I can only answer about our policies"),
    ■ Limit output length,
    ■ Always cite source doc IDs/titles for answers.

Web Application
  ○ Students can use Flask, Streamlit, or an alternative for the web app. LangChain is recommended for orchestration, but is optional.
  ○ Endpoints/UI:
    ■ / - Web chat interface (text box for user input)
    ■ /chat - API endpoint that receives user questions (POST) and returns model-generated answers with citations and snippets (link to source and show snippet).
    ■ /health - returns simple status via JSON.

Deployment
  ○ For production hosting use Render or Railway free tiers; students may alternatively use any other free-tier providers of their choice.
  ○ Configure environment variables (e.g., API keys, model endpoints, DB related, etc.).
  ○ Ensure the app is publicly accessible at a shareable URL.

CI/CD
  ○ Minimal automated testing is sufficient for this assignment (a build/run check, optional smoke test).
  ○ Create a GitHub Actions workflow that on push/PR:
    ■ Installs dependencies,
    ■ Runs a build/start check (e.g., python -m pip install -r requirements.txt and python -c "import app", or pytest -q if you add tests),
    ■ On success in main, deploys to your host (Render/Railway action or via webhook/API).

Evaluation of the LLM Application
  ○ Provide a small evaluation set of 15–30 questions covering various policy topics (PTO, security, expense, remote work, holidays, etc.). Report:
    ■ Answer Quality (required):
      1. Groundedness: % of answers whose content is factually consistent with and fully supported by the retrieved evidence—i.e., the answer contains no information that is absent or contradicted in the context.
      2. Citation Accuracy: % of answers whose listed citations correctly point to the specific passage(s) that support the information stated—i.e., the attribution is correct and not misleading.
      3. Exact/Partial Match (optional): % of answers that exactly or partially match a short gold answer you provide.
    ■ System Metrics (required):
      Latency (p50/p95) from request to answer for 10–20 queries.
    ■ Ablations (optional): compare retrieval k, chunk size, or prompt variants.

Design Documentation
  ○ Briefly justify design choices (embedding model, chunking, k, prompt format, vector store).
Submission Guidelines

Your final submission should consist of two links:

● A link to an accessible software repository (a GitHub repo) containing all your developed code. You must share your repository with the GitHub account quantic-grader.
  o The GitHub repository should include a link to the deployed version of your RAG LLM-based application (in file deployed.md).
  o The GitHub repository must include a README.md file indicating setup and run instructions.
  o The GitHub repository must also include a brief design and evaluation document (design-and-evaluation.md) listing and explaining:
    i) design and architecture decisions made, and why they were made, including technology choices
    ii) a summary of your evaluation of your RAG system
● A link to a recorded screen-share demonstration video of the working RAG LLM-based application, involving screen capture of it being used with voiceover.
  o All group members must speak and be present on camera.
  o All group members must show their government ID.
  o The demonstration/presentation should be between 5 and 10 minutes long.

To submit your project, please click on the "Submit Project" button on your dashboard and follow the steps provided. If you are submitting your project as a group, please ensure only ONE member submits on behalf of the group. Please reach out to [email protected] if you have any questions. Project grading typically takes about 3-4 weeks to complete after the submission due date. There is no score penalty for projects submitted after the due date; however, grading may be delayed.

Plagiarism Policy

Here at Quantic, we believe that learning is best accomplished by "doing"—this ethos underpinned the design of our active learning platform, and it likewise informs our approach to the completion of projects and presentations for our degree programs. We expect that all of our graduates will be able to deploy the concepts and skills they've learned over the course of their degree, whether in the workplace or in pursuit of personal goals, and so it is in our students' best interest that these assignments be completed solely through their own efforts with academic integrity.

Quantic takes academic integrity very seriously—we define plagiarism as: "Knowingly representing the work of others as one's own, engaging in any acts of plagiarism, or referencing the works of others without appropriate citation." This includes both misusing or not using proper citations for the works referenced, and submitting someone else's work as your own. Quantic monitors all submissions for instances of plagiarism, and all plagiarism, even unintentional, is considered a conduct violation. If you're still not sure about what constitutes plagiarism, check out the two-minute presentation by our librarian, Kristina. It is important to be conscientious when citing your sources. When in doubt, cite! Kristina outlines the basics of best citation practices in a one-minute video. You can also find more about our plagiarism policy on our site.

Project Rubric

Scores 2 and above are considered passing. Students who receive a 1 or 0 will not get credit for the assignment and must revise and resubmit to receive a passing grade.

Score | Description

5
● Addresses ALL of the project requirements, including but not limited to:
  ○ Outstanding RAG application with correct responses with matching citations; ingest and indexing works
  ○ Excellent, well-structured application architecture
  ○ Public deployment on Render, Railway (or equivalent) fully functional
  ○ CI/CD runs on push/PR and deploys on success
  ○ Excellent documentation of design choices
  ○ Excellent evaluation results, which include groundedness, citation accuracy, and latency
  ○ Excellent, clear demo of features, design and evaluation

4
● Addresses MOST of the project requirements, including but not limited to:
  ○ Excellent RAG application with correct responses with generally matching citations; ingest and indexing works
  ○ Very good, well-structured application architecture
  ○ Public deployment on Render, Railway (or equivalent) almost fully functional
  ○ CI/CD runs on push/PR and deploys on success
  ○ Very good documentation of design choices
  ○ Very good evaluation results which include groundedness, citation accuracy, and latency
  ○ Very good, clear demo of features, design and evaluation

3
● Addresses SOME of the project requirements, including but not limited to:
  ○ Very good RAG application with mainly correct responses with generally matching citations; ingest and indexing works
  ○ Good, well-structured application architecture
  ○ Public deployment on Render, Railway (or equivalent) almost fully functional
  ○ CI/CD runs on push/PR and deploys on success
  ○ Good documentation of design choices
  ○ Good evaluation results which include most of groundedness, citation accuracy, and latency
  ○ Good, clear demo of features, design and evaluation

2
● Addresses FEW of the project requirements, including but not limited to:
  ○ Passable RAG application with limited correct responses with few matching citations; ingest and indexing works partially
  ○ Passable application architecture
  ○ Public deployment on Render, Railway (or equivalent) not fully functional
  ○ CI/CD runs on push/PR and deploys on success
  ○ Passable documentation of design choices
  ○ Passable evaluation results which include only some of groundedness, citation accuracy, and latency
  ○ Passable demo of features, design and evaluation

1
● Addresses the project, but MOST of the project requirements are missing, including but not limited to:
  ○ Incomplete app; not deployed
  ○ No CI/CD
  ○ No to very limited evaluation
  ○ No design documentation
  ○ No demo of application

0
● The student either did not complete the assignment, plagiarized all or part of the assignment, or completely failed to address the project requirements.
requirements.txt
ADDED
@@ -0,0 +1,2 @@
Flask
pytest