The Consumer Price Index (CPI) is a measure of the average change over time in the prices paid by urban consumers for a market basket of consumer goods and services. Indexes are available for the U.S. and various geographic areas. Average price data for select utility, automotive fuel, and food items is also available. All data is available in flat files and through the Bureau of Labor Statistics API.
This large data set is segmented into four groups:
Example in Python
    import requests
    import pandas as pd

    # All items in U.S. city average, all urban consumers, seasonally adjusted
    response = requests.get('https://api.bls.gov/publicAPI/v2/timeseries/data/CUSR0000SA0')
    data = response.json()
    df = pd.DataFrame(data['Results']['series'][0]['data'])
    df['value'] = df['value'].astype('float')

    # See the average index value by year (the CPI is an index, not a price)
    print(df.groupby('year')['value'].mean())
Example in R
    library(dplyr)
    library(jsonlite)

    # All items in U.S. city average, all urban consumers, seasonally adjusted
    data <- fromJSON("https://api.bls.gov/publicAPI/v2/timeseries/data/CUSR0000SA0")
    df <- data[["Results"]][["series"]][["data"]][[1]]
    df$value <- as.numeric(df$value)

    # See the average index value by year (the CPI is an index, not a price)
    df %>%
      group_by(year) %>%
      summarise(avg_index = mean(value))
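Because the CPI is an index rather than a price, it is usually read as a percent change between two periods. The following is a minimal sketch of that arithmetic. The index values are made up for illustration, and the helper name `percent_change` is hypothetical rather than part of any BLS tool.

```python
def percent_change(old_index, new_index):
    """Percent change from old_index to new_index."""
    return (new_index - old_index) / old_index * 100.0


# Illustrative index levels only (not real CPI figures), keyed by (year, month).
cpi = {
    (2016, 12): 200.0,
    (2017, 12): 205.0,
}

# Twelve-month change: December 2017 relative to December 2016.
inflation = percent_change(cpi[(2016, 12)], cpi[(2017, 12)])
print(f"12-month change: {inflation:.1f}%")
```

The same function can be applied to any two `value` entries returned by the API once they have been converted to numbers.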
This data set comes from the Current Population Survey (CPS), a monthly survey of households conducted by the U.S. Census Bureau for the Bureau of Labor Statistics. This large data set covers the years 1995-1999 and 2002-2017. All data is available as HTML, PDF, and XLSX files, as well as through the Bureau of Labor Statistics API.
The 57 data tables are grouped together in the following categories:
A full list of tables and variables for the Current Population Survey can be found here.
Example in Python
    import requests
    import pandas as pd

    # Seasonally Adjusted Unemployment Rate
    response = requests.get('https://api.bls.gov/publicAPI/v2/timeseries/data/LNS14000000')
    data = response.json()
    df = pd.DataFrame(data['Results']['series'][0]['data'])
    df['value'] = df['value'].astype('float')

    # See the average rate by year
    print(df.groupby('year')['value'].mean())
Example in R
    library(dplyr)
    library(jsonlite)

    # Seasonally Adjusted Unemployment Rate
    data <- fromJSON("https://api.bls.gov/publicAPI/v2/timeseries/data/LNS14000000")
    df <- data[["Results"]][["series"]][["data"]][[1]]
    df$value <- as.numeric(df$value)

    # See the average rate by year
    df %>%
      group_by(year) %>%
      summarise(avg_rate = mean(value))
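The examples above request a single series with the API's default date range. To ask for specific years, or for several series at once, the BLS API also accepts a POST request with a JSON body. The sketch below assumes the field names `seriesid`, `startyear`, and `endyear` from the BLS sample code; the helper names `build_payload` and `fetch_series` are hypothetical, and the year limits that apply to unregistered requests should be checked against the current API documentation.

```python
import json
import urllib.request

API_URL = "https://api.bls.gov/publicAPI/v2/timeseries/data/"


def build_payload(series_ids, start_year, end_year):
    """Build the JSON body for a multi-series, fixed-year-range request."""
    return {
        "seriesid": list(series_ids),
        "startyear": str(start_year),
        "endyear": str(end_year),
    }


def fetch_series(series_ids, start_year, end_year):
    """POST the request and return the decoded JSON response (needs network)."""
    body = json.dumps(build_payload(series_ids, start_year, end_year)).encode("utf-8")
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example payload for the unemployment rate series used above.
payload = build_payload(["LNS14000000"], 2015, 2017)
print(json.dumps(payload))
```

Calling `fetch_series(["LNS14000000"], 2015, 2017)` would send this payload; the response is expected to have the same `Results` / `series` / `data` layout that the examples above read.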
Labor productivity is a measure of economic performance that compares the amount of goods and services produced (output) with the number of hours worked to produce those goods and services. The BLS also publishes measures of multifactor productivity.
The data is organized into two separate databases: Major Sector Productivity and Costs, and Industry Productivity. Both databases are available as flat files and through the Bureau of Labor Statistics API.
Example in Python
    import requests
    import pandas as pd

    # Office of Productivity and Technology series: percent/rate/ratio, productivity, nonfarm business
    response = requests.get('https://api.bls.gov/publicAPI/v2/timeseries/data/PRS85006092')
    data = response.json()
    df = pd.DataFrame(data['Results']['series'][0]['data'])
    df['value'] = df['value'].astype('float')

    # See the rate change by quarter
    print(df.sort_values(['year', 'period'])[['year', 'period', 'value']])
Example in R
    library(jsonlite)

    # Office of Productivity and Technology series: percent/rate/ratio, productivity, nonfarm business
    data <- fromJSON("https://api.bls.gov/publicAPI/v2/timeseries/data/PRS85006092")
    df <- data[["Results"]][["series"]][["data"]][[1]]
    df$value <- as.numeric(df$value)

    # See the rate change by quarter
    print(df[order(df$year, df$period), c("year", "period", "value")])
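Labor productivity, as described above, is output divided by hours worked, so its growth depends on how both quantities move. The following is a minimal sketch of that calculation. The figures are hypothetical, and the helper name `labor_productivity` is illustrative rather than a BLS definition of any particular published series.

```python
def labor_productivity(output, hours_worked):
    """Output per hour worked."""
    if hours_worked <= 0:
        raise ValueError("hours_worked must be positive")
    return output / hours_worked


# Hypothetical figures for two periods: (real output, hours worked).
periods = {
    "period 1": (1000.0, 50.0),
    "period 2": (1155.0, 52.5),
}

productivity = {
    name: labor_productivity(output, hours)
    for name, (output, hours) in periods.items()
}

# Percent change in output per hour between the two periods.
growth = (productivity["period 2"] / productivity["period 1"] - 1) * 100

print(productivity)
print(f"Productivity growth: {growth:.1f}%")
```

In this made-up case output rises faster than hours, so output per hour increases even though more hours are worked.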
The Small Business Administration Survey records general characteristics of small businesses in the United States, such as the number of employees, industry, number of locations, and paid wages. It also covers demographic information about owners, such as marital status and ethnicity. The data provided below is for 1992.
Survey 1 Download | Documentation
Survey 2 Download | Documentation
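Example in Python
    import pandas as pd

    # The .dta extension indicates a Stata file, so use read_stata rather than read_table
    df = pd.read_stata('sbaraw-s1.dta')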
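Example in R
    library(haven)

    # The .dta extension indicates a Stata file; read_dta comes from the haven package
    df <- read_dta("sbaraw-s1.dta")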
Researchers can produce their own time-use estimates using the American Time Use Survey (ATUS) microdata files. The ATUS data files include information for over 190,000 respondents in total from 2003 to 2017. Because of their size, these files are easiest to work with in statistical software such as Stata, SAS, or SPSS.
The survey is sponsored by the Bureau of Labor Statistics and is conducted by the U.S. Census Bureau.
The major purpose of ATUS is to develop nationally representative estimates of how people spend their time. The survey also provides information on the amount of time people spend in many activities, such as religious activities, socializing, exercising, and relaxing. Demographic information such as sex, race, age, and educational attainment is also available for each respondent.
Microdata | Data Dictionary | User Guide
Example in Python
    import pandas as pd

    # Census geographic division codes and their names
    mapping = {
        1: 'New England',
        2: 'Middle Atlantic',
        3: 'East North Central',
        4: 'West North Central',
        5: 'South Atlantic',
        6: 'East South Central',
        7: 'West South Central',
        8: 'Mountain',
        9: 'Pacific',
    }
    df = pd.read_table('atuscps_2017.dat', delimiter=',')
    df['division'] = df['GEDIV'].map(mapping)

    # Count records by geographic division and type of housing unit
    print(pd.crosstab(df.division, df.HEHOUSUT))
Example in R
    df <- read.csv("atuscps_2017.dat")

    # Map division codes 1-9 to names explicitly so that labels cannot shift
    # if a code is missing from the data
    df$GEDIV <- factor(df$GEDIV, levels = 1:9, labels = c(
      "New England", "Middle Atlantic", "East North Central",
      "West North Central", "South Atlantic", "East South Central",
      "West South Central", "Mountain", "Pacific"
    ))

    # Count records by geographic division and type of housing unit
    table(df$GEDIV, df$HEHOUSUT)
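Nationally representative estimates from survey microdata generally rely on survey weights, so that each respondent counts in proportion to the population they represent. The sketch below shows a weighted average under that assumption. The field names and values are hypothetical, and the actual ATUS weight variables and recommended formulas are described in the User Guide.

```python
def weighted_average_minutes(records, activity_key, weight_key="weight"):
    """Weighted mean of minutes per day across respondents."""
    total_weight = sum(r[weight_key] for r in records)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    weighted_minutes = sum(r[weight_key] * r[activity_key] for r in records)
    return weighted_minutes / total_weight


# Hypothetical respondents: minutes spent on one activity and a survey weight.
respondents = [
    {"minutes_exercising": 30, "weight": 1.0},
    {"minutes_exercising": 0, "weight": 2.0},
    {"minutes_exercising": 60, "weight": 1.0},
]

estimate = weighted_average_minutes(respondents, "minutes_exercising")
unweighted = sum(r["minutes_exercising"] for r in respondents) / len(respondents)

print(f"Weighted average: {estimate:.1f} minutes per day")
print(f"Unweighted average: {unweighted:.1f} minutes per day")
```

The two averages differ here because the respondent who reports no exercise carries twice the weight of the others, which is the kind of adjustment that an unweighted mean would miss.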