OpenAQ is the largest open-source air quality data platform, aggregating and harmonizing historical and real-time air quality data from diverse sources around the globe.
Sources
OpenAQ aggregates air quality data from many sources. The platform maintains a reference to the original source of all data it collects. Some sources also provide a readme text file with further metadata about the source.
Implementation
OpenAQ aggregates air quality data from disparate sources around the world to provide access to them in a single location. OpenAQ uses an ETL (Extract-Transform-Load) process to ingest and harmonize air quality data. The data pipeline has four main components: fetch, storage, presentation and archive.
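To make the ETL idea concrete, here is a minimal sketch of extracting records from two sources, harmonizing them into one schema, and loading them into a database. Everything in it is an assumption for illustration: the source names, record fields, units, table layout, and function names are hypothetical, and an in-memory SQLite database stands in for the real storage layer. It is not OpenAQ's actual pipeline code.

```python
import sqlite3
from datetime import datetime, timezone


def extract():
    """Stand-in for fetch scripts: return raw records keyed by (hypothetical) source."""
    return {
        "source_a": [{"station": "A-1", "pollutant": "PM2.5", "value": "12.5",
                      "unit": "ug/m3", "time": "2024-01-01T00:00:00+00:00"}],
        "source_b": [{"site": "B-7", "param": "pm25", "conc_mg_m3": 0.02,
                      "ts": 1704067200}],
    }


def transform(source, rec):
    """Harmonize one raw record into a common schema (µg/m³, UTC ISO 8601)."""
    if source == "source_a":
        when = datetime.fromisoformat(rec["time"]).astimezone(timezone.utc)
        return (source, rec["station"], "pm25", float(rec["value"]), when.isoformat())
    if source == "source_b":
        when = datetime.fromtimestamp(rec["ts"], tz=timezone.utc)
        # This source reports mg/m³; convert to µg/m³.
        return (source, rec["site"], rec["param"], rec["conc_mg_m3"] * 1000, when.isoformat())
    raise ValueError(f"unknown source: {source}")


def load(conn, rows):
    """Insert harmonized rows, keeping a reference to the original source."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS measurements ("
        "source TEXT, location TEXT, parameter TEXT, "
        "value_ugm3 REAL, datetime_utc TEXT)"
    )
    conn.executemany("INSERT INTO measurements VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def run_pipeline():
    """Run extract, transform, and load, then return the stored rows."""
    conn = sqlite3.connect(":memory:")  # stand-in for the production database
    rows = [transform(src, rec) for src, recs in extract().items() for rec in recs]
    load(conn, rows)
    stored = conn.execute("SELECT * FROM measurements ORDER BY source").fetchall()
    conn.close()
    return stored


if __name__ == "__main__":
    for row in run_pipeline():
        print(row)
```

The point of the sketch is the shape of the process: each source gets its own transform branch, while everything downstream of the transform works with a single schema.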
Fetch: OpenAQ fetch scripts range from HTML scrapers and FTP directory scanners to REST API scrapers, depending on the source of the data.
Storage: Data is stored in a PostgreSQL database, using the TimescaleDB extension for added time series functionality and PostGIS for geospatial functionality.
Presentation: OpenAQ provides a REST API for programmatic access to the database.
Archive: Data is stored in a publicly available AWS S3 bucket via the Open Data on AWS program.
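As an illustration of programmatic access through the REST API, the sketch below builds (but does not send) a request for a page of monitoring locations. The base URL, the `/locations` path, the `limit` and `page` query parameters, and the `X-API-Key` header are assumptions about the current API version and should be checked against the OpenAQ API documentation; `build_locations_request` is a hypothetical helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed base URL of the OpenAQ REST API; verify against the API docs.
BASE_URL = "https://api.openaq.org/v3"


def build_locations_request(api_key, limit=100, page=1):
    """Build a GET request for one page of locations (hypothetical helper)."""
    query = urlencode({"limit": limit, "page": page})
    return Request(
        f"{BASE_URL}/locations?{query}",
        headers={
            "X-API-Key": api_key,  # assumed authentication header
            "Accept": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_locations_request("YOUR_API_KEY", limit=10)
    print(req.get_method(), req.full_url)
```

To actually call the API, the request could be passed to `urllib.request.urlopen` and the response body decoded with `json.load`.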