Streamlit, BeautifulSoup, spaCy, NLTK, Web Scraping
Citation Generator
Built a lightweight citation generator that scrapes article metadata and uses NLP to extract author/date/title, then outputs an APA-style citation and quick semantic keyword analysis.

Overview
A small web tool that takes an article URL and generates an APA-style citation by scraping metadata and applying NLP fallbacks when fields are missing. It also includes a semantic analysis mode that surfaces the most frequent meaningful keywords from the page.
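As a rough illustration of the final step, the sketch below assembles a simplified APA-style web citation from fields that have already been extracted. The function name `format_apa`, its parameters, and the assumption that the author string is already in "Last, F." form are all hypothetical and are not taken from the project's code; the handling of missing fields follows the general APA convention ("n.d." for a missing date, title moved to the front when there is no author).

```python
from datetime import date
from typing import Optional

# Explicit month names avoid any dependence on the system locale.
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def format_apa(title: str, author: Optional[str], published: Optional[date],
               site_name: str, url: str) -> str:
    """Assemble a simplified APA-style web citation (hypothetical helper)."""
    # APA uses "n.d." when no publication date is available.
    if published is None:
        date_part = "(n.d.)."
    else:
        month = MONTHS[published.month - 1]
        date_part = f"({published.year}, {month} {published.day})."

    if author:
        # Assumes the author is already formatted as "Last, F."; strip a
        # trailing period so the citation does not end up with "..".
        author_part = author.rstrip(".") + "."
        parts = [author_part, date_part, f"{title}.", f"{site_name}.", url]
    else:
        # With no author, the title takes the author position.
        parts = [f"{title}.", date_part, f"{site_name}.", url]
    return " ".join(parts)


print(format_apa("Example article", "Doe, J.", date(2024, 3, 5),
                 "Example Site", "https://example.com/a"))
# Doe, J. (2024, March 5). Example article. Example Site. https://example.com/a
```

Keeping the formatter separate from extraction means the same function can be called with either scraped values or the values a user has corrected by hand.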
What I Built
- URL ingestion + scraping with headers to fetch article HTML and extract page text
- Metadata parsing for title/author when available (e.g. meta[name="author"], <title>)
- NLP fallback extraction using spaCy to detect likely PERSON (author) and DATE entities
- Keyword analysis using NLTK stopword filtering + frequency counts (top 3 keywords)
- Streamlit UI with two modes:
  - Generate citation (editable fields for title/author/date)
  - Semantic analysis (keywords + counts)
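
The metadata parsing and keyword steps listed above can be sketched with the standard library alone. This is not the project's code: it swaps in Python's built-in `html.parser` for BeautifulSoup and a tiny hand-written stopword set for NLTK's English stopword list, so that the example runs without extra installs or downloads. The names `ArticleParser`, `top_keywords`, `STOPWORDS`, and `SAMPLE_HTML` are hypothetical, and the spaCy fallback is left out.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Tiny illustrative stopword set (assumption); the project uses NLTK's list.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on",
             "it", "by", "with", "from"}


class ArticleParser(HTMLParser):
    """Collects <title>, meta[name="author"], and visible page text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.author = None
        self.text_chunks = []
        self._in_title = False
        self._skip_depth = 0  # greater than 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attr_map.get("name") or "").lower() == "author":
            self.author = attr_map.get("content")
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth:
            self.text_chunks.append(data)


def top_keywords(text, n=3):
    """Return the n most frequent non-stopword tokens with their counts."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts.most_common(n)


SAMPLE_HTML = """<html><head><title>Solar power grows</title>
<meta name="author" content="Jane Doe"></head>
<body><p>Solar panels and solar farms cut energy costs.
Solar panels on farms use solar panels.</p>
<script>var x = 1;</script></body></html>"""

parser = ArticleParser()
parser.feed(SAMPLE_HTML)
print(parser.title, "|", parser.author)
print(top_keywords(" ".join(parser.text_chunks)))
# [('solar', 4), ('panels', 3), ('farms', 2)]
```

When `parser.author` comes back as `None`, that is the point at which an entity-based fallback such as the spaCy PERSON/DATE detection described above would take over.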
Tech Stack
- Python
- Streamlit
- BeautifulSoup + Requests
- spaCy
- NLTK
- Pandas (utility / data handling)
Results
- Produces a usable citation from a single URL, with the extracted title, author, and date fields editable
- Surfaces the top keywords as a quick, at-a-glance summary of the page content
Links
- GitHub: https://github.com/KanavAtre/MLA_Citation_Gnerator