Methodology
Data Collection
Geographic Data Processing
- Geographic Boundary Files: Collected 2024 TIGER/Line Shapefiles from the U.S. Census Bureau for:
  - Counties
  - ZIP Code Tabulation Areas (ZCTAs), the Census Bureau's approximation of ZIP codes
  - Congressional districts
- Area Intersection Analysis:
  - Used GIS tools to compute intersections between county, ZCTA, and congressional district boundaries
  - Calculated minimum-width thresholds to separate valid areas from intersections too narrow to count as real areas
  - Generated a unique identifier for each intersection area
- Coordinate Generation:
  - Calculated centroid coordinates for each intersection area
  - Transformed the centroids from the projected NAD83 / Conus Albers CRS (EPSG:5070) to WGS 84 latitude and longitude (EPSG:4326)
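The centroid step can be illustrated without any GIS dependency. The sketch below is not the project's code; the pipeline presumably relied on GeoPandas (`GeoSeries.centroid`, then `to_crs`) for this. It only shows the area-weighted centroid formula applied to a single polygon ring in a projected coordinate system, and the function name `polygon_centroid` is a hypothetical chosen for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_centroid(ring: List[Point]) -> Point:
    """Area-weighted centroid of a simple polygon given as (x, y) vertices.

    Hypothetical helper for illustration; assumes planar (projected)
    coordinates such as EPSG:5070 metres, not latitude/longitude.
    """
    # Drop a repeated closing vertex if the ring is explicitly closed.
    if len(ring) > 1 and ring[0] == ring[-1]:
        ring = ring[:-1]
    if len(ring) < 3:
        raise ValueError("a polygon needs at least three distinct vertices")

    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    for i, (x0, y0) in enumerate(ring):
        x1, y1 = ring[(i + 1) % len(ring)]
        cross = x0 * y1 - x1 * y0  # shoelace term for this edge
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross

    if twice_area == 0.0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * area), and 6 * area == 3 * twice_area.
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


if __name__ == "__main__":
    square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
    print(polygon_centroid(square))
```

Computing the centroid in the projected CRS before converting to latitude and longitude avoids the distortion of averaging angular coordinates. One caveat: for a concave area the centroid can fall outside the polygon, in which case a point guaranteed to lie inside (GeoPandas offers `representative_point()`) may be a safer query location.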
Ballot Data Collection
API Integration
- Queried Ballotpedia's API for the ballot available at each centroid coordinate
- Rate-limited requests to the API
- Cached responses to minimize repeated API calls
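As a rough illustration of how rate limiting and caching can be combined, the following sketch wraps an arbitrary fetch function. It makes no assumptions about Ballotpedia's actual endpoints or response format: `fetch` is a caller-supplied placeholder, and the class name, the one-second default interval, and the in-memory cache are all hypothetical choices, not details taken from the project.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class CachedRateLimitedClient:
    """Wrap a fetch function with a minimum delay between calls and a cache.

    Hypothetical helper for illustration only. `fetch(lat, lon)` stands in
    for whatever function actually performs the API request.
    """

    def __init__(
        self,
        fetch: Callable[[float, float], Any],
        min_interval: float = 1.0,  # assumed spacing between requests, in seconds
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._fetch = fetch
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None
        self._cache: Dict[Tuple[float, float], Any] = {}

    def get(self, lat: float, lon: float) -> Any:
        # Round the key so tiny floating-point differences share one entry.
        key = (round(lat, 6), round(lon, 6))
        if key in self._cache:
            return self._cache[key]  # cache hit: no request, no waiting

        # Wait until at least min_interval has passed since the last request.
        if self._last_call is not None:
            wait = self._min_interval - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)

        self._last_call = self._clock()
        result = self._fetch(lat, lon)
        self._cache[key] = result
        return result
```

Passing `clock` and `sleep` as parameters keeps the timing logic testable without real delays. A persistent cache (for example, JSON files on disk) would let reruns of the pipeline skip the API entirely, but that is a design choice the source does not specify.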
Data Processing
Geographic Processing
- Area Classification:
  - Categorized areas by administrative boundaries
  - Generated district identifiers
- Visualization:
  - Generated interactive maps for split districts
  - Created area-specific visualizations using Folium
  - Applied consistent, distinct color schemes to differentiate districts
Ballot Information Processing
- Content Structuring:
  - Parsed API responses into structured format
  - Organized races by jurisdiction level
  - Standardized office and candidate information
- Complexity Analysis:
  - Calculated ballot complexity scores based on:
    - Number of races and decisions
    - Information density
    - Language complexity (Flesch-Kincaid Grade Level)
    - Presence of non-partisan contests
    - Total options per decision
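To make the complexity inputs concrete, the sketch below computes the Flesch-Kincaid Grade Level with the standard formula, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59, and gathers the other listed factors into a feature dictionary. Several things here are assumptions rather than the project's method: the syllable counter is a crude vowel-group heuristic, the race dictionaries with `text`, `options`, and `nonpartisan` keys are a hypothetical input shape, and word count is used only as a stand-in for information density. How the project weights these features into a single score is not stated in the source, so no combined score is computed.

```python
import re
from typing import Dict, List


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, dropping a trailing silent 'e'."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level; returns 0.0 for text with no words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def ballot_features(races: List[dict]) -> Dict[str, float]:
    """Collect per-ballot inputs to a complexity score (hypothetical schema)."""
    text = " ".join(r.get("text", "") for r in races)
    return {
        "num_races": len(races),
        "total_options": sum(len(r.get("options", [])) for r in races),
        "nonpartisan_races": sum(1 for r in races if r.get("nonpartisan", False)),
        # Word count is an assumed proxy for "information density".
        "word_count": len(re.findall(r"[A-Za-z']+", text)),
        "fk_grade": flesch_kincaid_grade(text),
    }
```

Because the syllable heuristic is approximate, grade levels from this sketch will differ slightly from those of a dictionary-based readability library.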
Technical Implementation
Development Stack
- Python for data processing
- GeoPandas for geographic operations
- Folium for mapping visualization
- Papa Parse (JavaScript) for CSV handling
- Showdown.js (JavaScript) for Markdown rendering
Source Code
The data processing pipeline is implemented in two main Jupyter notebooks: