Welcome to Data Frog, my personal notes and learning journey for data science, machine learning, and computer programming.
Data Sourcing
When looking for data to start experimenting with, you can think of it in three categories: proprietary, public, and purchased.
Proprietary data can be thought of as 'in-house' data: data from an organization you are already a part of.
Public data is data that is available to anyone.
Some resources:
Data at U.S. national government level:
Data at U.S. state government level:
European data:
Non-Profit data:
Private organizations' data:
Large data:
Web Scraping and APIs
Web APIs and web scraping are also great ways to get data. For scraping data, the following tools are useful:
- import.io
- ScraperWiki
- Tabular
- Google Sheets
- Excel
When using Google Sheets, you can pull in tables from around the web very easily. Open Google Sheets and, in cell A1, paste a formula like this one (the last argument is the 1-based index of the table on the page):
=IMPORTHTML("https://en.wikipedia.org/wiki/Iron_Chef_America", "table", 2)
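The same idea, picking the Nth table out of a page, can be sketched in Python with only the standard library. This is a rough illustration, not how Google Sheets does it: the names TableExtractor and import_html_table are made up for these notes, the sample page and its values are invented placeholder data, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> in an HTML document as a list of rows.

    Hypothetical helper for illustration; it assumes tables are not nested.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table>, in document order
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def import_html_table(html, index):
    """Return the index-th table (1-based, like IMPORTHTML) as a list of rows."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables[index - 1]


# Invented placeholder page with two tables, so the sketch runs offline.
SAMPLE_PAGE = """
<table><tr><th>Name</th></tr><tr><td>first table</td></tr></table>
<table>
  <tr><th>Fruit</th><th>Count</th></tr>
  <tr><td>apple</td><td>3</td></tr>
  <tr><td>pear</td><td>5</td></tr>
</table>
"""

rows = import_html_table(SAMPLE_PAGE, 2)
for row in rows:
    print(row)
```

To use it on a real page you would first download the HTML (for example with urllib.request) and pass the decoded text in place of SAMPLE_PAGE. That step is left out here so the example does not depend on the network.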