CDC Protect · Data Analyzation to Save Lives

As part of the CDC Protect team, I built robust web scrapers and data pipelines to analyze cannabinoid data, focusing on the visual similarity between edible cannabis packaging and children's candy packaging. The project highlighted risks of confusing designs and the need for stronger regulation.

Highlights

Designed a recursive scraper (100+ layers) in Python using BeautifulSoup, Requests, and regex to gather structured/unstructured hospital website data.
Developed URL management and retry/error-handling logic for resilience and efficiency.
Automated distributed scraping with Slurm for HPC-scale data collection.
Built dual scraping pipelines (Selenium for dynamic sites, BeautifulSoup for static sites).
Created CNNs to compare edible packaging with children’s candy, achieving +15% accuracy improvement.
Performed statistical analysis on THC concentrations with Pandas and visualized insights via treemaps, bar charts, and histograms.

Stack

Python BeautifulSoup Selenium Requests Regex Pandas TensorFlow/CNN Slurm GCP Data Visualization

GitHub (Private) Details