Gokul Soumya committed
Commit 6f0237b · 1 Parent(s): 3ec10ce

Initial commit

app.py ADDED
@@ -0,0 +1,39 @@
+from pathlib import Path
+from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
+import gradio as gr
+
+tokenizer = AutoTokenizer.from_pretrained("dslim/distilbert-NER")
+model = AutoModelForTokenClassification.from_pretrained("dslim/distilbert-NER")
+ner_pipeline = pipeline(
+    "ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple"
+)
+
+samples_dir = Path("samples")
+samples = [
+    "basic.txt",
+    "single-names-and-initials.txt",
+    "false-positive.txt",
+    "uncased-names.txt",
+]
+examples = [(samples_dir / sample).read_text().strip() for sample in samples]
+example_labels = [
+    sample.replace(".txt", "").replace("-", " ").title() for sample in samples
+]
+
+
+def ner(text):
+    output = ner_pipeline(text)
+    output = [e for e in output if e["entity_group"] == "PER" and e["score"] > 0.90]
+    output = [{**e, "entity_group": "PERSON"} for e in output]
+    return {"text": text, "entities": output}
+
+
+demo = gr.Interface(
+    ner,
+    gr.Textbox(placeholder="Enter sentence here..."),
+    gr.HighlightedText(combine_adjacent=True, show_legend=True),
+    examples=examples,
+    example_labels=example_labels,
+)
+
+demo.launch(debug=True)
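
The ner function in app.py post-filters the pipeline output: it keeps only spans tagged PER whose score is above 0.90 and relabels them PERSON. Below is a minimal sketch of that filter in isolation, run against hand-written mock entities so that it needs neither transformers nor a model download. The helper name keep_confident_people, the threshold parameter, and the mock scores are illustrative assumptions and are not part of the commit.

```python
def keep_confident_people(entities, threshold=0.90):
    """Keep PER spans scoring above the threshold and relabel them PERSON."""
    kept = [
        e for e in entities
        if e["entity_group"] == "PER" and e["score"] > threshold
    ]
    # Build new dicts so the caller's entities are not mutated.
    return [{**e, "entity_group": "PERSON"} for e in kept]


# Hypothetical stand-in for aggregated pipeline output (made-up scores).
mock_output = [
    {"entity_group": "PER", "score": 0.99, "word": "Rohit Menon", "start": 0, "end": 11},
    {"entity_group": "PER", "score": 0.62, "word": "Aurora", "start": 20, "end": 26},
    {"entity_group": "ORG", "score": 0.98, "word": "Reserve Bank of India", "start": 30, "end": 51},
]

filtered = keep_confident_people(mock_output)
print(filtered)
```

With these mock values only the first entity survives: the second is a PER span below the threshold, and the third is not a PER span at all.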
pyproject.toml ADDED
@@ -0,0 +1,13 @@
+[project]
+name = "basic-name-recognition"
+version = "0.1.0"
+description = "Add your description here"
+readme = "README.md"
+requires-python = ">=3.12"
+dependencies = [
+    "gradio>=5.46.0",
+    "transformers>=4.56.1",
+]
+
+[tool.pyrefly]
+disable-type-errors-in-ide = false
samples/basic.txt ADDED
@@ -0,0 +1,7 @@
+At last week’s quarterly strategy review, Rohit Menon presented the financial projections for the Asia-Pacific division, supported by inputs from Claire O’Donnell in Sydney. The legal team, led by Ananya Krishnaswamy-Rao, has begun drafting new compliance guidelines to align with the Reserve Bank of India’s directives.
+
+During the leadership offsite, Lachlan McAllister and Priya Kapoor facilitated a workshop on sustainable procurement. Their session was well received, with contributions from Deepak Varadarajan, Sophie-Anne Kavanagh, and Aishwarya Nandakumar.
+
+On the technology side, Vikramjeet Singh Gill collaborated closely with Ethan Browne to roll out the updated CRM platform. Special mention was given to Meera Iyer, who managed stakeholder expectations across three regions. Meanwhile, Shane Callaghan and Arjun Deshpande coordinated vendor negotiations, ensuring cost savings without compromising service levels.
+
+Finally, HR announced that Ritika Dasgupta will replace Benjamin O’Keefe as the new Head of Talent Development, effective from November. A formal handover plan is being developed under the supervision of Nivedita Choudhury and Caleb Fraser-Jones.
samples/false-positive.txt ADDED
@@ -0,0 +1,9 @@
+The Global Vision 2030 Report was reviewed by the Strategy Office last week. Input came from the Future Growth Taskforce and the Emerging Markets Division. Key highlights referenced Aurora Project, Phoenix Initiative, and the Blue Ocean Framework, each of which is central to the NextGen Transformation Agenda.
+
+Feedback was collected through the Customer First Survey, with high response rates from India South Cluster, Australia East Zone, and the Pacific Gateway Region. The Leadership Excellence Program (LEP) was mentioned several times, along with the STAR Model for performance evaluation.
+
+The compliance update noted that SAP Hana, Oracle Fusion, and Atlas Cloud had all passed preliminary security reviews. A section on innovation praised the Helios Lab in Bengaluru and the Sunrise Centre in Melbourne, while also mentioning the Harbour Bridge Pilot as a potential template for future rollouts.
+
+The HR section included references to the Talent360 Portal, VisionX Mentorship Pathway, and the Career Compass Initiative. Training modules such as LEAD, INSPIRE, and IGNITE were rolled out in partnership with the Future of Work Institute.
+
+Finally, the risk management appendix listed Delta One, Omega Shield, and Project Horizon as ongoing initiatives to be monitored closely in Q4.
samples/single-names-and-initials.txt ADDED
@@ -0,0 +1,9 @@
+The governance committee met last Friday under the chairmanship of R. Venkatesh (CFO, South Asia). Minutes were recorded by Amelia-Jayne O’Rourke, who also flagged pending action items from K. Shanmugam Pillai in Chennai.
+
+As part of the global mobility initiative, Devika (known professionally by her first name only) will relocate to Melbourne to join Harper Singh-D’Souza and James O’Connor in the regional HR hub. Their transition plan was endorsed by the Anita Thomas Leadership Institute in partnership with Rajiv Malhotra Associates.
+
+During the compliance audit, lead reviewer N. Karthikeyan worked closely with Olivia Grace Macpherson, who noted discrepancies in the filings prepared by Ashok Kumar. Interestingly, one supplier contract was co-signed by S. R. Iyer and David-Lee Johnson, both of whom are based in Brisbane.
+
+The R&D innovation sprint featured contributions from Bhavani Narayanaswamy, A. J. Patel, and Chloe-Anne McPherson. A keynote was delivered by Professor Meenakshi Sundaram, while the closing panel was moderated by Tahlia De’Luca and Ramesh Srinivasan.
+
+Finally, an internal newsletter highlighted the long service awards for Mr. Joseph Arul, Anushka, and Dr. Patrick O’Keefe, with a special note of appreciation for P. K. Subramanian who retires this quarter after 32 years with the company.
samples/uncased-names.txt ADDED
@@ -0,0 +1,17 @@
+The quarterly update was authored by senior analyst rajiv malhotra, but his name appeared in the email footer only as rajiv m. The document also referenced the new compliance memo drafted by dr meenakshi sundaram, although it was buried in a footnote labeled “see appendix d, sundaram, m.”
+
+During the technology review, stephen o’connor presented slides, but his name was split across two lines in the PDF:
+
+stephen o’
+
+connor (regional IT lead)
+
+— which some systems misread as formatting noise.
+
+The HR team confirmed that aishwarya n. kumar and ben lee will co-lead the “Talent360” rollout. In the report, however, her name appeared once as a. n. kumar and his was shortened to b.lee in the meeting notes.
+
+Meanwhile, a budget section casually mentioned that “as discussed with claire odonnell and harper singh d’souza,” costs could be reduced by 12%. Since both names were embedded mid-sentence without commas or titles, they were easy to miss.
+
+In a scanned annexure, the long-service award was credited to mr joseph arul, but OCR rendered it as mr.josepharu1 (with a “1” instead of “l”). Another award went to devika, though her single-name reference was mistaken for a project codename in one extract.
+
+
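
The four sample files above are what app.py turns into example labels, by dropping the .txt suffix, replacing hyphens with spaces, and title-casing the result. Below is a small sketch of that derivation on its own. The helper name label_for is an illustrative assumption; the commit does this inline in a list comprehension.

```python
samples = [
    "basic.txt",
    "single-names-and-initials.txt",
    "false-positive.txt",
    "uncased-names.txt",
]


def label_for(filename):
    """Turn a sample filename into a human-readable label."""
    return filename.replace(".txt", "").replace("-", " ").title()


labels = [label_for(name) for name in samples]
print(labels)
# ['Basic', 'Single Names And Initials', 'False Positive', 'Uncased Names']
```

Note that str.title() capitalizes every word, so the short word "and" becomes "And" in the second label.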
uv.lock ADDED
The diff for this file is too large to render. See raw diff