Azure Open Datasets - Curated Datasets for Machine Learning and Data Analysis

Azure Open Datasets provides curated, cleaned public datasets for machine learning, data science, and analytics in Azure.

Category: AI and Machine Learning
Pricing Model: Free (storage/compute only)
Availability: Global
Data Sovereignty: Data location varies
Reliability: No SLA

Azure Open Datasets on Microsoft Azure

Azure Open Datasets is a collection of curated public datasets specifically optimized for machine learning and data analytics in Azure. The datasets cover areas such as weather, census data, holidays, public safety, and more.

Unlike raw public data sources, Azure Open Datasets are cleaned, normalized, and stored in Azure-optimized formats (Parquet). The datasets are directly usable in Azure Machine Learning, Databricks, Synapse Analytics, and other Azure services without separate download or transformation steps.

The datasets themselves are free to use. You pay only for the Azure services, such as compute and storage, that you use to process them.

Typical Use Cases

ML model training: Enrich your own ML models with weather, demographic, or public transportation data to improve predictions.

Data science prototyping: Start data science projects quickly with clean, immediately available datasets instead of lengthy data acquisition.

Feature engineering: Enrich your own business data with external factors such as weather, holidays, or demographic information.

Education and research: Use real, large datasets for academic projects, courses, and research.
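
The feature engineering use case above can be sketched in a few lines of Python. This is a minimal illustration, not the way Azure Open Datasets is normally consumed: the HOLIDAYS table is a hand-written stand-in for rows from a public holidays dataset, and the record layout, field names, and the helper add_holiday_features are assumptions made for this example.

```python
from datetime import date

# Hand-written stand-in for rows from a public holidays dataset (assumption).
HOLIDAYS = {
    date(2024, 1, 1): "New Year's Day",
    date(2024, 12, 25): "Christmas Day",
}


def add_holiday_features(records, holidays):
    """Return copies of the records with two holiday features added."""
    enriched = []
    for rec in records:
        name = holidays.get(rec["date"])  # None when the date is not a holiday
        enriched.append(
            {**rec, "is_holiday": name is not None, "holiday_name": name}
        )
    return enriched


# Hypothetical business data: units sold per day.
sales = [
    {"date": date(2024, 1, 1), "units": 12},
    {"date": date(2024, 1, 2), "units": 40},
]

enriched_sales = add_holiday_features(sales, HOLIDAYS)
for row in enriched_sales:
    print(row["date"], row["units"], row["is_holiday"], row["holiday_name"])
```

The join key here is the calendar date. With real data, the same pattern applies to any shared key, for example date plus region for weather observations.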

Frequently Asked Questions about Azure Open Datasets

Which datasets are available?

Azure Open Datasets offers over 50 datasets, among them NOAA Weather Data, US Census, Public Holidays, NYC Taxi Trips, COVID-19 Data Lake, and Genomics Data. The complete list is available in the documentation.

How is this different from public data sources?

Azure Open Datasets are cleaned, normalized, and stored in cloud-optimized formats. They are directly accessible via Azure SDKs and services without downloads or separate ETL processes. Additionally, many datasets are automatically updated.

Can I contribute my own datasets?

Currently, you cannot add your own datasets to Azure Open Datasets. To publish your own public datasets, use Azure Storage with public access or Azure Data Share instead.

In which formats are the data available?

Most datasets are stored in Parquet, a columnar format that performs well for analytical workloads in Azure. Some are also available as CSV. You can access the data via the Azure Storage Blob APIs, the Python SDK, or directly from Azure ML and Databricks.
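
Because a dataset may arrive as Parquet or as CSV, loading code usually has to pick a reader by file type. The sketch below shows one way to do that for a file that has already been downloaded. The helper names detect_format and load_table are hypothetical, and the example assumes pandas is installed (reading Parquet additionally needs pyarrow or fastparquet).

```python
from pathlib import PurePosixPath


def detect_format(name: str) -> str:
    """Classify a file name or URL as 'parquet' or 'csv' by its extension."""
    without_query = name.split("?", 1)[0]  # ignore any URL query string
    suffix = PurePosixPath(without_query).suffix.lower()
    if suffix == ".parquet":
        return "parquet"
    if suffix == ".csv":
        return "csv"
    raise ValueError(f"unsupported file type: {name!r}")


def load_table(path: str):
    """Load a local Parquet or CSV file into a pandas DataFrame."""
    # Imported lazily so that detect_format works without pandas installed.
    import pandas as pd

    if detect_format(path) == "parquet":
        return pd.read_parquet(path)
    return pd.read_csv(path)


print(detect_format("weather/part-0000.parquet"))
print(detect_format("holidays.csv"))
```

Inside Azure ML or Databricks, the platform's own readers would typically replace load_table; the format check stays the same.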

Are there usage restrictions?

The datasets are freely available for research, development, and commercial use, but the specific license varies by dataset. The datasets themselves have no rate limits or quotas; the usual Azure service limits (e.g., on storage requests) still apply.

How often are datasets updated?

Update frequency varies: weather data is updated daily, census data with each new release, and other datasets whenever new source data becomes available. The documentation lists the update frequency of each dataset.

Can I use the datasets outside of Azure?

Yes, the datasets are accessible via public Azure Storage URLs and can also be downloaded and used outside Azure. However, use within Azure offers performance advantages through data locality.
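
As a rough sketch of access from outside Azure, the code below builds a public blob URL and downloads the file with the Python standard library. The URL pattern https://<account>.blob.core.windows.net/<container>/<blob> is the standard Azure Blob Storage form; the account, container, and path values shown are placeholders, not the names of a real dataset, and the helper names are hypothetical. Look up the actual location of a dataset in its documentation.

```python
import shutil
import urllib.request
from urllib.parse import quote


def public_blob_url(account: str, container: str, blob_path: str) -> str:
    """Build the HTTPS URL of a blob in a publicly readable container."""
    # Keep "/" and "=" unescaped so partitioned paths such as
    # "year=2024/part-0.parquet" stay readable.
    path = quote(blob_path.lstrip("/"), safe="/=")
    return f"https://{account}.blob.core.windows.net/{container}/{path}"


def download(url: str, destination: str) -> None:
    """Stream a publicly readable blob to a local file."""
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)


# Placeholder names for illustration only (assumption).
url = public_blob_url("exampleaccount", "examplecontainer", "year=2024/part-0.parquet")
print(url)
# download(url, "part-0.parquet")  # uncomment once real names are filled in
```

Downloads like this cross the public internet, so large datasets are better processed in the Azure region where they are stored.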

Alternatives

  • AWS: Open Data
  • GCP: Public Datasets

Integration with innFactory

As a Microsoft Solutions Partner, innFactory supports you in data science and ML projects with Azure Open Datasets. We help with data integration, feature engineering, and building ML pipelines.

Contact us for a non-binding consultation on Azure Open Datasets and data science on Azure.

Microsoft Solutions Partner

innFactory is a Microsoft Solutions Partner. We provide expert consulting, implementation, and managed services for Azure.

Ready to start with Azure Open Datasets?

Our certified Azure experts help you with architecture, integration, and optimization.