sethmcknight committed
Commit 2d9ce15 · 1 Parent(s): 92c00a3

Add initial project files including README, .gitignore, and project documentation

.gitignore ADDED
@@ -0,0 +1,11 @@
+ # Python
+ __pycache__/
+ *.pyc
+ *.pyo
+ *.pyd
+ .Python
+ env/
+ venv/
+ ENV/
+ env.bak/
+ venv.bak/
README.md CHANGED
@@ -1,2 +1,42 @@
- # msse-ai-engineering
+ # MSSE AI Engineering Project
+
+ This project is a Retrieval-Augmented Generation (RAG) application that answers questions about a corpus of company policies.
+
+ ## Setup
+
+ 1. Clone the repository:
+
+ ```bash
+ git clone https://github.com/sethmcknight/msse-ai-engineering.git
+ cd msse-ai-engineering
+ ```
+
+ 2. Create and activate a virtual environment:
+
+ ```bash
+ python3 -m venv venv
+ source venv/bin/activate
+ ```
+
+ 3. Install the dependencies:
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ## Running the Application
+
+ To run the Flask application:
+
+ ```bash
+ flask run
+ ```
+
+ ## Running Tests
+
+ To run the test suite:
+
+ ```bash
+ pytest
+ ```
+
  Repo for the Quantic MSSE AI Engineering project code
copilot-instructions.md ADDED
@@ -0,0 +1,60 @@
+ # Copilot Instructions
+
+ This document outlines the guiding principles and directives for the GitHub Copilot assistant for the duration of this project. The primary objective is to successfully build, evaluate, and deploy a Retrieval-Augmented Generation (RAG) application in accordance with the `project-prompt-and-rubric.md` and the `project-plan.md`.
+
+ ## Core Mission
+
+ Your primary goal is to assist in developing a RAG application that meets all requirements for a grade of 5. You must adhere to the development plan, follow best practices, and proactively contribute to the project's success.
+
+ ## Guiding Principles
+
+ 1. **Plan-Driven Development:** Always refer to `project-plan.md` as the source of truth for the current task and overall workflow. Do not deviate from the plan without explicit instruction.
+ 2. **Test-Driven Development (TDD):** This is a strict requirement. For every new feature or piece of logic, you must first write the failing tests using `pytest` and then implement the code to make the tests pass.
+ 3. **Continuous Integration/Continuous Deployment (CI/CD):** The project prioritizes early and continuous deployment. All changes must pass the CI/CD pipeline (install, test, build) before being merged into the `main` branch.
+ 4. **Rubric-Focused:** All development choices should be justifiable against the `project-prompt-and-rubric.md`. This includes technology choices, implementation details, and evaluation metrics.
+ 5. **Reproducibility:** Ensure the application is reproducible by managing dependencies in `requirements.txt` and setting fixed seeds where applicable (e.g., chunking, evaluation).
+
+ ## Technical Stack & Constraints
+
+ - **Language:** Python
+ - **Web Framework:** Flask
+ - **Testing:** `pytest`
+ - **Vector Database:** ChromaDB (local)
+ - **Embedding & LLM APIs:** Use free-tier services (e.g., OpenRouter, Groq, HuggingFace).
+ - **Deployment:** Render
+ - **CI/CD:** GitHub Actions
+
+ ## Step-by-Step Workflow
+
+ You must follow the sequence laid out in `project-plan.md`. The key phases are:
+
+ 1. **Project Setup:** Initialize the repository, virtual environment, and placeholder files.
+ 2. **"Hello World" Deployment:** Create a minimal Flask app with a `/health` endpoint and deploy it to Render via the initial CI/CD pipeline. This is a critical first milestone.
+ 3. **TDD Cycles:** For all subsequent features (data ingestion, embedding, RAG, web UI):
+    - Write tests.
+    - Implement the feature.
+    - Run tests locally.
+    - Commit and push to trigger the CI/CD pipeline.
+    - Verify deployment.
+
+ ## Key Application Requirements
+
+ - **Endpoints:**
+   - `/`: Web chat interface.
+   - `/chat`: API for questions (POST) and answers (JSON with citations).
+   - `/health`: Simple JSON status.
+ - **Guardrails (Must be tested):**
+   - Refuse to answer questions outside the provided corpus.
+   - Limit output length.
+   - Always cite sources for every answer.
+ - **Documentation:**
+   - Keep `README.md` updated with setup and run instructions.
+   - Incrementally populate `design-and-evaluation.md` as decisions are made and results are gathered.
+   - Ensure `deployed.md` always contains the correct public URL.
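+
+ Because the guardrails must be tested, here is a hedged sketch of what those `pytest` checks might look like, assuming a `/chat` endpoint that accepts `{"question": ...}` and returns JSON with `answer` and `citations` fields (the schema and the example questions are placeholders, not settled decisions):
+
+ ```python
+ # tests/test_guardrails.py -- guardrail checks for the /chat endpoint (sketch)
+ from app import app  # assumes the Flask app object lives in app.py
+
+
+ def test_answers_cite_sources():
+     client = app.test_client()
+     response = client.post("/chat", json={"question": "How many PTO days do employees get?"})
+     assert response.status_code == 200
+     assert response.get_json()["citations"], "every answer must cite at least one source"
+
+
+ def test_refuses_out_of_corpus_questions():
+     client = app.test_client()
+     response = client.post("/chat", json={"question": "Who won the World Cup in 2022?"})
+     # Exact refusal wording is not fixed yet; match a distinctive fragment of it.
+     assert "only answer" in response.get_json()["answer"].lower()
+ ```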
+
+ ## Your Role
+
+ - **Implementer:** Write code, create files, and configure services based on my requests.
+ - **Tester:** Write `pytest` tests for all functionality.
+ - **Reviewer:** Proactively identify potential issues, suggest improvements, and ensure code quality.
+ - **Navigator:** Keep track of the current step in the `project-plan.md` and be ready to proceed to the next one.
deployed.md ADDED
@@ -0,0 +1,3 @@
+ # Deployed Application
+
+ The application is not yet deployed.
design-and-evaluation.md ADDED
@@ -0,0 +1,3 @@
+ # Design and Evaluation
+
+ This document will be updated with design choices and evaluation results as the project progresses.
project-plan.md ADDED
@@ -0,0 +1,83 @@
+ # RAG Application Project Plan
+
+ This plan outlines the steps to design, build, and deploy a Retrieval-Augmented Generation (RAG) application as per the project requirements, with a focus on achieving a grade of 5. The approach prioritizes early deployment and continuous integration, following Test-Driven Development (TDD) principles.
+
+ ## 1. Foundational Setup
+
+ - [x] **Repository:** Create a new GitHub repository.
+ - [x] **Virtual Environment:** Set up a local Python virtual environment (`venv`).
+ - [x] **Initial Files:**
+   - Create `requirements.txt` with initial dependencies (`Flask`, `pytest`).
+   - Create a `.gitignore` file for Python.
+   - Create a `README.md` with initial setup instructions.
+   - Create placeholder files: `deployed.md` and `design-and-evaluation.md`.
+ - [x] **Testing Framework:** Establish a `tests/` directory and configure `pytest`.
+
+ ## 2. "Hello World" Deployment
+
+ - [ ] **Minimal App:** Develop a minimal Flask application (`app.py`) with a `/health` endpoint that returns a JSON status object.
+ - [ ] **Unit Test:** Write a test for the `/health` endpoint to ensure it returns a `200 OK` status and the correct JSON payload.
+ - [ ] **Local Validation:** Run the app and tests locally to confirm everything works.
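+
+ A minimal sketch of this milestone, assuming the app lives in `app.py` and the test in `tests/test_health.py` (names and payload are placeholders until the real files exist):
+
+ ```python
+ # app.py -- minimal Flask app exposing /health (sketch)
+ from flask import Flask, jsonify
+
+ app = Flask(__name__)
+
+ @app.route("/health")
+ def health():
+     # Simple JSON status object, extended later if needed.
+     return jsonify({"status": "ok"})
+ ```
+
+ ```python
+ # tests/test_health.py -- pytest check for the /health endpoint (sketch)
+ from app import app
+
+ def test_health_returns_ok():
+     client = app.test_client()
+     response = client.get("/health")
+     assert response.status_code == 200
+     assert response.get_json() == {"status": "ok"}
+ ```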
+
+ ## 3. CI/CD and Initial Deployment
+
+ - [ ] **Render Setup:** Create a new Web Service on Render and link it to the GitHub repository.
+ - [ ] **Environment Configuration:** Configure necessary environment variables on Render (e.g., `PYTHON_VERSION`).
+ - [ ] **GitHub Actions:** Create a CI/CD workflow (`.github/workflows/main.yml`) that:
+   - Triggers on push/PR to the `main` branch.
+   - Installs dependencies from `requirements.txt`.
+   - Runs the `pytest` test suite.
+   - On success, triggers a deployment to Render.
+ - [ ] **Deployment Validation:** Push a change and verify that the workflow runs successfully and the application is deployed.
+ - [ ] **Documentation:** Update `deployed.md` with the live URL of the deployed application.
+
+ ## 4. Data Ingestion and Processing
+
+ - [ ] **Corpus Assembly:** Collect or generate 5-20 policy documents (PDF, TXT, MD) and place them in a `corpus/` directory.
+ - [ ] **Parsing Logic:** Implement and test functions to parse different document formats.
+ - [ ] **Chunking Strategy:** Implement and test a document chunking strategy (e.g., recursive character splitting with overlap).
+ - [ ] **Reproducibility:** Set fixed seeds for any processes involving randomness (e.g., chunking, sampling) to ensure deterministic outcomes.
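+
+ One possible shape for the chunking step: a simple sliding character window with overlap, shown here as a stand-in for the recursive splitter named above (sizes are placeholders; the logic is fully deterministic, so no seed is needed):
+
+ ```python
+ # chunking.py -- fixed-size character chunks with overlap (sketch)
+ def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
+     """Split text into overlapping character windows; deterministic by construction."""
+     if chunk_size <= overlap:
+         raise ValueError("chunk_size must be larger than overlap")
+     chunks = []
+     start = 0
+     while start < len(text):
+         chunks.append(text[start:start + chunk_size])
+         start += chunk_size - overlap
+     return chunks
+ ```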
+
+ ## 5. Embedding and Vector Storage
+
+ - [ ] **Vector DB Setup:** Integrate a vector database (e.g., ChromaDB) into the project.
+ - [ ] **Embedding Model:** Select and integrate a free embedding model (e.g., from HuggingFace).
+ - [ ] **Ingestion Pipeline:** Create a script (`ingest.py`) that:
+   - Loads documents from the corpus.
+   - Chunks the documents.
+   - Embeds the chunks.
+   - Stores the embeddings in the vector database.
+ - [ ] **Testing:** Write tests to verify each step of the ingestion pipeline.
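+
+ A sketch of what `ingest.py` could look like with ChromaDB, assuming the `chunk_text` helper above and Markdown files in `corpus/`; it leans on Chroma's default embedding function, which a HuggingFace model could replace later:
+
+ ```python
+ # ingest.py -- load, chunk, and index corpus documents in ChromaDB (sketch)
+ from pathlib import Path
+
+ import chromadb
+
+ from chunking import chunk_text  # hypothetical helper from the chunking step
+
+
+ def ingest(corpus_dir: str = "corpus", db_dir: str = "chroma_db") -> None:
+     client = chromadb.PersistentClient(path=db_dir)
+     collection = client.get_or_create_collection(name="policies")
+     # Only Markdown is handled here; PDF/TXT parsing would be added alongside.
+     for doc_path in sorted(Path(corpus_dir).glob("*.md")):
+         text = doc_path.read_text(encoding="utf-8")
+         for i, chunk in enumerate(chunk_text(text)):
+             collection.add(
+                 ids=[f"{doc_path.stem}-{i}"],
+                 documents=[chunk],
+                 metadatas=[{"source": doc_path.name}],
+             )
+
+
+ if __name__ == "__main__":
+     ingest()
+ ```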
+
+ ## 6. RAG Core Implementation
+
+ - [ ] **Retrieval Logic:** Implement a function to retrieve the top-k relevant document chunks from the vector store based on a user query.
+ - [ ] **Prompt Engineering:** Design a prompt template that injects the retrieved context into the query for the LLM.
+ - [ ] **LLM Integration:** Connect to a free-tier LLM (e.g., via OpenRouter or Groq) to generate answers.
+ - [ ] **Guardrails:** Implement and test guardrails:
+   - Refuse to answer questions outside the corpus.
+   - Limit the length of the generated output.
+   - Ensure all answers cite the source document IDs/titles.
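+
+ A rough sketch of the retrieval and prompting pieces, with the guardrails expressed in the prompt; the actual call to the free-tier LLM (OpenRouter/Groq) is left out, and all names here are placeholders:
+
+ ```python
+ # rag.py -- top-k retrieval and a guardrailed prompt template (sketch)
+ import chromadb
+
+ REFUSAL = "I can only answer questions about our company policies."
+
+ PROMPT_TEMPLATE = (
+     "Answer the question using ONLY the context below. "
+     "If the context does not contain the answer, reply exactly: {refusal} "
+     "Keep the answer under 200 words and cite the source of every claim as [source].\n\n"
+     "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
+ )
+
+
+ def retrieve(question: str, k: int = 4, db_dir: str = "chroma_db") -> list[dict]:
+     """Return the top-k chunks with their source metadata."""
+     collection = chromadb.PersistentClient(path=db_dir).get_or_create_collection("policies")
+     results = collection.query(query_texts=[question], n_results=k)
+     return [
+         {"text": doc, "source": meta["source"]}
+         for doc, meta in zip(results["documents"][0], results["metadatas"][0])
+     ]
+
+
+ def build_prompt(question: str, chunks: list[dict]) -> str:
+     """Inject retrieved chunks (tagged with their sources) into the LLM prompt."""
+     context = "\n\n".join(f"[{c['source']}] {c['text']}" for c in chunks)
+     return PROMPT_TEMPLATE.format(refusal=REFUSAL, context=context, question=question)
+ ```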
+
+ ## 7. Web Application Completion
+
+ - [ ] **Chat Interface:** Implement a simple web chat interface for the `/` endpoint.
+ - [ ] **API Endpoint:** Create the `/chat` API endpoint that receives user questions (POST) and returns model-generated answers with citations and snippets.
+ - [ ] **UI/UX:** Ensure the web interface is clean, user-friendly, and handles loading/error states gracefully.
+ - [ ] **Testing:** Write end-to-end tests for the chat functionality.
+
+ ## 8. Evaluation
+
+ - [ ] **Evaluation Set:** Create an evaluation set of 15-30 questions and corresponding "gold" answers covering various policy topics.
+ - [ ] **Metric Implementation:** Develop scripts to calculate:
+   - **Answer Quality:** Groundedness and Citation Accuracy.
+   - **System Metrics:** Latency (p50/p95).
+ - [ ] **Execution:** Run the evaluation and record the results.
+ - [ ] **Documentation:** Summarize the evaluation results in `design-and-evaluation.md`.
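+
+ A minimal sketch of the metric scripts: latency percentiles over timed queries plus a simple percentage helper for manually judged groundedness and citation accuracy (the `ask` callable is a placeholder for whatever function ends up answering a question end to end):
+
+ ```python
+ # evaluate.py -- latency (p50/p95) and simple quality aggregation (sketch)
+ import statistics
+ import time
+
+
+ def measure_latency(ask, questions: list[str]) -> dict[str, float]:
+     """Time each call to ask(question) and report p50/p95 in seconds."""
+     timings = []
+     for q in questions:
+         start = time.perf_counter()
+         ask(q)
+         timings.append(time.perf_counter() - start)
+     cuts = statistics.quantiles(timings, n=100)
+     return {"p50": cuts[49], "p95": cuts[94]}
+
+
+ def percentage(flags: list[bool]) -> float:
+     """Share of True judgments, e.g. grounded answers or correct citations."""
+     return 100.0 * sum(flags) / len(flags)
+ ```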
+
+ ## 9. Final Documentation and Submission
+
+ - [ ] **Design Document:** Complete `design-and-evaluation.md`, justifying all major design choices (embedding model, chunking strategy, vector store, LLM, etc.).
+ - [ ] **README:** Finalize the `README.md` with comprehensive setup, run, and testing instructions.
+ - [ ] **Demonstration Video:** Record a 5-10 minute screen-share video demonstrating the deployed application, walking through the code architecture, explaining the evaluation results, and showing a successful CI/CD run.
+ - [ ] **Submission:** Share the GitHub repository with the grader and submit the repository and video links.
project-prompt-and-rubric.md ADDED
@@ -0,0 +1,228 @@
+ AI Engineering Project
+
+ Project Overview
+
+ For this project, you will be designing, building, and evaluating a Retrieval-Augmented Generation (RAG) LLM-based application that answers user questions about a corpus of company policies & procedures. You will then deploy the application to a free-tier host (e.g., Render, Railway) with a basic CI/CD pipeline (e.g., GitHub Actions) that triggers deployment on push/PR when the app builds successfully. Finally, you will demonstrate the system via a screen-share video showing key features of your deployed application, and a quick walkthrough of your design, evaluation and CI/CD run. You can complete this project either individually or as a group of no more than three people.
+
+ While you can fully hand code this project if you wish, you are highly encouraged to utilize leading AI code generation models/AI IDEs/async agents to assist in rapidly producing your solution, being sure to describe in broad terms how you made use of them. Here are some examples of very useful AI tools you may wish to consider. You will be graded on the quality and functionality of the application and how well it meets the project requirements—no given proportion of the code is required to be hand coded.
+
+ Learning Outcomes
+
+ When completed successfully, this project will enable you to:
+ ● Demonstrate excellent AI engineering skills
+ ● Demonstrate the ability to select appropriate AI application design and architecture
+ ● Implement a working LLM-based application including RAG
+ ● Evaluate the performance of an LLM-based application
+ ● Utilize AI tooling as appropriate
+
+ Project Description
+
+ First, assemble a small but coherent corpus of documents outlining company policies & procedures—about 5–20 short markdown/HTML/PDF/TXT files totaling 30–120 pages. You may author them yourself (with AI assistance) or use policies that you are aware of from your own organization that can be used for this assignment. Students must use a corpus they can legally include in the repo or load at runtime (e.g., your own synthetic policies, your organization's employee policy documents, etc.)—no private/paid data is required. Additionally, you should define success metrics for your application (see the "Evaluation" step below), including at least one information-quality metric (e.g., groundedness or citation accuracy) and one system metric (e.g., latency).
+
+ Use free or zero-cost options when possible, e.g., OpenRouter's free tier (https://openrouter.ai/docs/api-reference/limits), Groq (https://console.groq.com/docs/rate-limits), or your own paid API keys if you have them. For embedding models, free-tier options are available from Cohere, Voyage, HuggingFace, and others.
+
+ Complete the following steps to fully develop, deploy, and evaluate your application:
+
+ Environment and Reproducibility
+ ○ Create a virtual environment (e.g., venv, conda).
+ ○ List dependencies in requirements.txt (or environment.yml).
+ ○ Provide a README.md with setup + run instructions.
+ ○ Set fixed seeds where/if applicable (for deterministic chunking or evaluation sampling).
+
+ Ingestion and Indexing
+ ○ Parse & clean documents (handle PDFs/HTML/md/txt).
+ ○ Chunk documents (e.g., by headings or token windows with overlap).
+ ○ Embed chunks with a free embedding model or a free-tier API.
+ ○ Store the embedded document chunks in a local or lightweight vector database (e.g., Chroma) or an optionally cloud-hosted vector store (e.g., Pinecone).
+
+ Retrieval and Generation (RAG)
+ ○ To build your RAG pipeline you may use frameworks such as LangChain to handle retrieval, prompt chaining, and API calls, or implement these manually.
+ ○ Implement top-k retrieval with optional re-ranking.
+ ○ Build a prompting strategy that injects retrieved chunks (and citations/sources) into the LLM context.
+ ○ Add basic guardrails:
+ ■ Refuse to answer outside the corpus ("I can only answer about our policies"),
+ ■ Limit output length,
+ ■ Always cite source doc IDs/titles for answers.
+
+ Web Application
+ ○ Students can use Flask, Streamlit, or an alternative for the web app. LangChain is recommended for orchestration, but is optional.
+ ○ Endpoints/UI:
+ ■ / - Web chat interface (text box for user input)
+ ■ /chat - API endpoint that receives user questions (POST) and returns model-generated answers with citations and snippets (link to source and show snippet).
+ ■ /health - returns simple status via JSON.
+
+ Deployment
+ ○ For production hosting use Render or Railway free tiers; students may alternatively use any other free-tier providers of their choice.
+ ○ Configure environment variables (e.g., API keys, model endpoints, DB related, etc.).
+ ○ Ensure the app is publicly accessible at a shareable URL.
+
+ CI/CD
+ ○ Minimal automated testing is sufficient for this assignment (a build/run check, optional smoke test).
+ ○ Create a GitHub Actions workflow that, on push/PR:
+ ■ Installs dependencies,
+ ■ Runs a build/start check (e.g., python -m pip install -r requirements.txt and python -c "import app" or pytest -q if you add tests),
+ ■ On success in main, deploys to your host (Render/Railway action or via webhook/API).
+
+ Evaluation of the LLM Application
+ ○ Provide a small evaluation set of 15–30 questions covering various policy topics (PTO, security, expense, remote work, holidays, etc.). Report:
+ ■ Answer Quality (required):
+ 1. Groundedness: % of answers whose content is factually consistent with and fully supported by the retrieved evidence—i.e., the answer contains no information that is absent from or contradicted by the context.
+ 2. Citation Accuracy: % of answers whose listed citations correctly point to the specific passage(s) that support the information stated—i.e., the attribution is correct and not misleading.
+ 3. Exact/Partial Match (optional): % of answers that exactly or partially match a short gold answer you provide.
+ ■ System Metrics (required): Latency (p50/p95) from request to answer for 10–20 queries.
+ ■ Ablations (optional): compare retrieval k, chunk size, or prompt variants.
+
+ Design Documentation
+ ○ Briefly justify design choices (embedding model, chunking, k, prompt format, vector store).
+
+ Submission Guidelines
+
+ Your final submission should consist of two links:
+ ● A link to an accessible software repository (a GitHub repo) containing all your developed code. You must share your repository with the GitHub account quantic-grader.
+ ○ The GitHub repository should include a link to the deployed version of your RAG LLM-based application (in file deployed.md).
+ ○ The GitHub repository must include a README.md file indicating setup and run instructions.
+ ○ The GitHub repository must also include a brief design and evaluation document (design-and-evaluation.md) listing and explaining:
+ i) design and architecture decisions made - and why they were made, including technology choices
+ ii) summary of your evaluation of your RAG system
+ ● A link to a recorded screen-share demonstration video of the working RAG LLM-based application, involving screen capture of it being used with voiceover.
+ ○ All group members must speak and be present on camera.
+ ○ All group members must show their government ID.
+ ○ The demonstration/presentation should be between 5 and 10 minutes long.
+
+ To submit your project, please click on the "Submit Project" button on your dashboard and follow the steps provided. If you are submitting your project as a group, please ensure only ONE member submits on behalf of the group. Please reach out to [email protected] if you have any questions. Project grading typically takes about 3-4 weeks to complete after the submission due date. There is no score penalty for projects submitted after the due date; however, grading may be delayed.
+
+ Plagiarism Policy
+
+ Here at Quantic, we believe that learning is best accomplished by "doing"—this ethos underpinned the design of our active learning platform, and it likewise informs our approach to the completion of projects and presentations for our degree programs. We expect that all of our graduates will be able to deploy the concepts and skills they've learned over the course of their degree, whether in the workplace or in pursuit of personal goals, and so it is in our students' best interest that these assignments be completed solely through their own efforts with academic integrity.
+
+ Quantic takes academic integrity very seriously—we define plagiarism as: "Knowingly representing the work of others as one's own, engaging in any acts of plagiarism, or referencing the works of others without appropriate citation." This includes both misusing or not using proper citations for the works referenced, and submitting someone else's work as your own. Quantic monitors all submissions for instances of plagiarism and all plagiarism, even unintentional, is considered a conduct violation. If you're still not sure about what constitutes plagiarism, check out this two-minute presentation by our librarian, Kristina. It is important to be conscientious when citing your sources. When in doubt, cite! Kristina outlines the basics of best citation practices in this one-minute video. You can also find more about our plagiarism policy here.
+
+ Project Rubric
+
+ Scores 2 and above are considered passing. Students who receive a 1 or 0 will not get credit for the assignment and must revise and resubmit to receive a passing grade.
+
+ Score 5
+ ● Addresses ALL of the project requirements, including but not limited to:
+ ○ Outstanding RAG application with correct responses and matching citations; ingest and indexing works
+ ○ Excellent, well-structured application architecture
+ ○ Public deployment on Render, Railway (or equivalent) fully functional
+ ○ CI/CD runs on push/PR and deploys on success
+ ○ Excellent documentation of design choices
+ ○ Excellent evaluation results, which include groundedness, citation accuracy, and latency
+ ○ Excellent, clear demo of features, design and evaluation
+
+ Score 4
+ ● Addresses MOST of the project requirements, including but not limited to:
+ ○ Excellent RAG application with correct responses and generally matching citations; ingest and indexing works
+ ○ Very good, well-structured application architecture
+ ○ Public deployment on Render, Railway (or equivalent) almost fully functional
+ ○ CI/CD runs on push/PR and deploys on success
+ ○ Very good documentation of design choices
+ ○ Very good evaluation results, which include groundedness, citation accuracy, and latency
+ ○ Very good, clear demo of features, design and evaluation
+
+ Score 3
+ ● Addresses SOME of the project requirements, including but not limited to:
+ ○ Very good RAG application with mainly correct responses and generally matching citations; ingest and indexing works
+ ○ Good, well-structured application architecture
+ ○ Public deployment on Render, Railway (or equivalent) almost fully functional
+ ○ CI/CD runs on push/PR and deploys on success
+ ○ Good documentation of design choices
+ ○ Good evaluation results, which include most of groundedness, citation accuracy, and latency
+ ○ Good, clear demo of features, design and evaluation
+
+ Score 2
+ ● Addresses FEW of the project requirements, including but not limited to:
+ ○ Passable RAG application with limited correct responses and few matching citations; ingest and indexing works partially
+ ○ Passable application architecture
+ ○ Public deployment on Render, Railway (or equivalent) not fully functional
+ ○ CI/CD runs on push/PR and deploys on success
+ ○ Passable documentation of design choices
+ ○ Passable evaluation results, which include only some of groundedness, citation accuracy, and latency
+ ○ Passable demo of features, design and evaluation
+
+ Score 1
+ ● Addresses the project but MOST of the project requirements are missing, including but not limited to:
+ ○ Incomplete app; not deployed
+ ○ No CI/CD
+ ○ No to very limited evaluation
+ ○ No design documentation
+ ○ No demo of application
+
+ Score 0
+ ● The student either did not complete the assignment, plagiarized all or part of the assignment, or completely failed to address the project requirements.
requirements.txt ADDED
@@ -0,0 +1,2 @@
+ Flask
+ pytest