Streamlit, BeautifulSoup, spaCy, NLTK, Web Scraping

Citation Generator

Built a lightweight citation generator that scrapes article metadata and uses NLP to extract author/date/title, then outputs an APA-style citation and quick semantic keyword analysis.

Overview

A small web tool that takes an article URL and generates an APA-style citation by scraping metadata and applying NLP fallbacks when fields are missing. It also includes a semantic analysis mode that surfaces the most frequent meaningful keywords from the page.
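As a rough illustration of the final formatting step, the sketch below assembles an APA-style string from fields that have already been extracted. The function name `format_apa`, its parameters, and the `site_name` field are hypothetical and not taken from the project; the name handling is deliberately naive (last word is treated as the surname), and plain text cannot show the italics APA uses for titles.

```python
from datetime import date
from typing import Optional


def format_apa(title: str, author: Optional[str], published: Optional[date],
               site_name: str, url: str) -> str:
    """Assemble an APA-style web citation from already-extracted fields."""
    # Missing dates are written as "(n.d.)" in APA style.
    if published:
        date_part = f"({published.year}, {published.strftime('%B')} {published.day})"
    else:
        date_part = "(n.d.)"

    parts = author.split() if author else []
    if parts:
        # "Jane Q. Doe" -> "Doe, J. Q." (surname first, then initials).
        initials = " ".join(f"{p[0]}." for p in parts[:-1])
        lead = f"{parts[-1]}, {initials}" if initials else parts[-1]
        return f"{lead} {date_part}. {title}. {site_name}. {url}"

    # With no author, the title moves into the author position.
    return f"{title}. {date_part}. {site_name}. {url}"


print(format_apa("How Citations Work", "Jane Doe", date(2025, 3, 1),
                 "Example News", "https://example.com/a"))
```

Keeping the formatter separate from extraction fits the editable-field workflow: whatever the user corrects in the UI can be passed straight into a function like this.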

What I Built

  • URL ingestion + scraping with headers to fetch article HTML and extract page text
  • Metadata parsing for title/author when available (e.g. meta[name="author"], <title>)
  • NLP fallback extraction using spaCy to detect likely PERSON (author) and DATE entities
  • Keyword analysis using NLTK stopword filtering + frequency counts (top 3 keywords)
  • Streamlit UI with two modes:
    • Generate citation (editable fields for title/author/date)
    • Semantic analysis (keywords + counts)
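
The metadata step in the list above can be sketched as follows. To stay dependency-free, this version swaps in the standard library's `html.parser` for BeautifulSoup, so it is not the project's actual code; `MetaExtractor` and `extract_metadata` are hypothetical names. It reads the same two sources named above, `<title>` and `meta[name="author"]`, and returns `None` for any field it cannot find, which is the point where an NLP fallback would take over.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect <title> text and <meta name="author"> content from HTML."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.author = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "author":
            # Empty content counts as missing.
            self.author = (attrs.get("content") or "").strip() or None

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_metadata(html: str) -> dict:
    """Return title and author, using None for fields that are absent."""
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip() or None, "author": parser.author}


sample = ('<html><head><title> Sample Article </title>'
          '<meta name="author" content="Jane Doe"></head><body></body></html>')
print(extract_metadata(sample))
```

With BeautifulSoup, the equivalent lookups would be along the lines of `soup.find("meta", attrs={"name": "author"})` and `soup.title`.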

Tech Stack

  • Python
  • Streamlit
  • BeautifulSoup + Requests
  • spaCy
  • NLTK
  • Pandas (utility / data handling)

Results

  • Produces a usable citation from a single URL, with the extracted fields editable before output
  • Surfaces the top keywords as a quick semantic signal, summarizing page content at a glance
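
The keyword signal can be sketched as stopword filtering followed by a frequency count. This is a minimal illustration, not the project's code: the small `STOPWORDS` set is a stand-in for NLTK's English stopword list, and `top_keywords` and the length filter are assumptions.

```python
import re
from collections import Counter

# Tiny stand-in stopword list (assumption); the project filters with NLTK's.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "was", "with",
}


def top_keywords(text, n=3):
    """Return the n most frequent non-stopword tokens with their counts."""
    # Lowercase and keep alphabetic runs only, which drops punctuation and digits.
    words = re.findall(r"[a-z]+", text.lower())
    # Skip stopwords and very short tokens, which rarely carry meaning.
    meaningful = [w for w in words if w not in STOPWORDS and len(w) > 2]
    return Counter(meaningful).most_common(n)


text = ("The citation tool builds a citation from the article. "
        "The article metadata feeds the citation.")
print(top_keywords(text, n=2))  # [('citation', 3), ('article', 2)]
```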

Links

  • GitHub: https://github.com/KanavAtre/MLA_Citation_Gnerator

Published: March 1, 2025