Data Census

Consumer Price Index (CPI)

The Consumer Price Index (CPI) is a measure of the average change over time in the prices paid by urban consumers for a market basket of consumer goods and services. Indexes are available for the U.S. and various geographic areas. Average price data for select utility, automotive fuel, and food items are also available. All data is available in flat files and through the Bureau of Labor Statistics API.

This large data set is segmented into four groups:

All Urban Consumers (Current Series)
Urban Wage Earners and Clerical Workers (Current Series)
All Urban Consumers (Chained CPI)
Average Price Data

import requests
import pandas as pd

# All items in U.S. city average, all urban consumers, seasonally adjusted
response = requests.get('https://api.bls.gov/publicAPI/v2/timeseries/data/CUSR0000SA0')
response.raise_for_status()
data = response.json()
df = pd.DataFrame(data['Results']['series'][0]['data'])
# Values arrive as strings; coerce any non-numeric entries to NaN
df['value'] = pd.to_numeric(df['value'], errors='coerce')

# See the average index value by year
print(df.groupby('year')['value'].mean())

library(dplyr)
library(jsonlite)

# All items in U.S. city average, all urban consumers, seasonally adjusted
data <- fromJSON("https://api.bls.gov/publicAPI/v2/timeseries/data/CUSR0000SA0") 
df <- data[["Results"]][["series"]][["data"]][[1]]
df$value <- as.numeric(df$value)

# See the average index value by year
df %>%
  group_by(year) %>%
  summarise(avg_index = mean(value, na.rm = TRUE))
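CPI index levels are usually read as inflation rates. As a minimal sketch, assuming records shaped like the `data` entries returned by the API request above (`year`, `period`, and `value` as strings), the 12-month inflation rate for each month is 100 x (CPI this month / CPI in the same month a year earlier - 1). The function name `yoy_inflation` is hypothetical, and it uses only the standard library so it can run on the raw JSON before any pandas conversion.

```python
from typing import Dict, List, Tuple


def yoy_inflation(observations: List[dict]) -> Dict[Tuple[int, int], float]:
    """12-month percent change for each month of a monthly index such as the CPI.

    Assumes records shaped like the BLS API "data" entries, i.e. dicts with
    string fields "year", "period" (e.g. "M03") and "value".
    """
    index: Dict[Tuple[int, int], float] = {}
    for obs in observations:
        period = obs["period"]
        # Keep monthly codes M01-M12; skip M13 (annual average) and other codes
        if not period.startswith("M") or period == "M13":
            continue
        try:
            value = float(obs["value"])
        except ValueError:
            continue  # skip non-numeric placeholders for missing values
        index[(int(obs["year"]), int(period[1:]))] = value

    inflation: Dict[Tuple[int, int], float] = {}
    for (year, month), value in index.items():
        previous = index.get((year - 1, month))
        # Months without a (nonzero) year-earlier value are dropped
        if previous:
            inflation[(year, month)] = (value / previous - 1) * 100
    return dict(sorted(inflation.items()))
```

Because each rate needs the same month from the prior year, the earliest year in the response yields no rates; request a longer span if you need them.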

Labor Force Statistics (CPS)

This data set comes from the Current Population Survey (CPS), a monthly survey of households conducted by the U.S. Census Bureau for the Bureau of Labor Statistics. The data set covers the years 1995-1999 and 2002-2017. All data is available as HTML, PDF, and XLSX files, as well as through the Bureau of Labor Statistics API.

The 57 data tables are grouped into the following categories:

A full list of tables and variables for the Current Population Survey can be found here.
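Each GET request in this guide fetches a single series. The v2 API also accepts a POST whose JSON body lists several series IDs (`seriesid`) and a year range (`startyear`, `endyear`), plus an optional `registrationkey`. The sketch below only builds such a request with the standard library; `build_bls_request` is a hypothetical helper name, and the BLS API documentation remains the authority on current query limits.

```python
import json
import urllib.request
from typing import List, Optional

BLS_API_URL = "https://api.bls.gov/publicAPI/v2/timeseries/data/"


def build_bls_request(series_ids: List[str], start_year: int, end_year: int,
                      registration_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request for several BLS series at once."""
    payload = {
        "seriesid": series_ids,
        "startyear": str(start_year),
        "endyear": str(end_year),
    }
    if registration_key:
        # A free registration key raises the per-query limits
        payload["registrationkey"] = registration_key
    return urllib.request.Request(
        BLS_API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the request to `urllib.request.urlopen` and reading the body with `json.load` should return the same `Results` and `series` structure that the GET examples parse, with one entry per requested series.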

import requests
import pandas as pd

# Seasonally adjusted unemployment rate
response = requests.get('https://api.bls.gov/publicAPI/v2/timeseries/data/LNS14000000')
response.raise_for_status()
data = response.json()
df = pd.DataFrame(data['Results']['series'][0]['data'])
df['value'] = pd.to_numeric(df['value'], errors='coerce')

# See the average rate by year
print(df.groupby('year')['value'].mean())

library(dplyr)
library(jsonlite)

# Seasonally Adjusted Unemployment Rate 
data <- fromJSON("https://api.bls.gov/publicAPI/v2/timeseries/data/LNS14000000") 
df <- data[["Results"]][["series"]][["data"]][[1]]
df$value <- as.numeric(df$value)

# See the average rate by year
df %>%
  group_by(year) %>%
  summarise(avg_rate = mean(value))
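The unemployment rate is a ratio of two other CPS measures: unemployed persons as a percent of the civilian labor force. As a sketch, assuming you have fetched the seasonally adjusted unemployment level (LNS13000000) and civilian labor force level (LNS11000000) and keyed each series by `(year, period)`, the rate can be recomputed as below. The function name is hypothetical. Published levels are rounded to thousands, so a recomputed rate may differ from LNS14000000 in the last decimal place.

```python
from typing import Dict, Tuple

# (year, period) as returned by the API, e.g. ("2017", "M03")
Key = Tuple[str, str]


def unemployment_rate(unemployed: Dict[Key, float],
                      labor_force: Dict[Key, float]) -> Dict[Key, float]:
    """Unemployed persons as a percent of the civilian labor force, per period."""
    rates: Dict[Key, float] = {}
    for key, level in unemployed.items():
        force = labor_force.get(key)
        # Skip periods missing from the labor force series (or with a zero value)
        if force:
            rates[key] = level / force * 100
    return rates
```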

Labor Productivity and Costs (BLS)

Labor productivity is a measure of economic performance that compares the amount of goods and services produced (output) with the number of hours worked to produce those goods and services. The BLS also publishes measures of multifactor productivity.

The data is organized into two separate databases: Major Sector Productivity and Costs, and Industry Productivity. Both databases are available as flat files and through the Bureau of Labor Statistics API.

import requests
import pandas as pd

# Nonfarm business labor productivity (output per hour), percent change from previous quarter at annual rate
response = requests.get('https://api.bls.gov/publicAPI/v2/timeseries/data/PRS85006092')
response.raise_for_status()
data = response.json()
df = pd.DataFrame(data['Results']['series'][0]['data'])
df['value'] = pd.to_numeric(df['value'], errors='coerce')

# See the rate change by quarter 
print(df.sort_values(['year', 'period'])[['year', 'period', 'value']])

library(dplyr)
library(jsonlite)

# Nonfarm business labor productivity (output per hour), percent change from previous quarter at annual rate
data <- fromJSON("https://api.bls.gov/publicAPI/v2/timeseries/data/PRS85006092") 
df <- data[["Results"]][["series"]][["data"]][[1]]
df$value <- as.numeric(df$value)

# See the rate change by quarter 
print(df[order(df$year, df$period), c("year", "period", "value")])
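Labor productivity, as described above, is output divided by hours worked, and quarterly movements are reported at an annual rate. As a minimal sketch, assuming two aligned lists of quarterly output and hours indexes, the code below forms a productivity index and annualizes each quarter-to-quarter change by compounding the ratio to the fourth power. The helper names are hypothetical, and published figures are computed from unrounded source data, so a recomputation from rounded indexes may not match the series exactly.

```python
from typing import List


def productivity_index(output: List[float], hours: List[float]) -> List[float]:
    """Output per hour for aligned quarterly output and hours indexes.

    Multiplying by 100 keeps the result on the same base (=100) as the inputs.
    """
    if len(output) != len(hours):
        raise ValueError("output and hours must cover the same quarters")
    return [o / h * 100 for o, h in zip(output, hours)]


def annualized_change(levels: List[float]) -> List[float]:
    """Percent change from the previous quarter, compounded to an annual rate."""
    return [((cur / prev) ** 4 - 1) * 100 for prev, cur in zip(levels, levels[1:])]
```

Compounding (rather than multiplying the quarterly change by four) treats each quarter's growth as if it persisted for a full year.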

Small Business Administration Survey (1992)

The Small Business Administration Survey records general characteristics of small businesses in the United States, such as the number of employees, industry, number of locations, and wages paid. It also covers demographic information about owners, such as marital status and ethnicity. The data below is for 1992.

Survey 1 Download | Documentation

Survey 2 Download | Documentation

import pandas as pd

# read_table expects tab-delimited text; if the file is in Stata format, use pd.read_stata instead
df = pd.read_table('sbaraw-s1.dta')

# read.delim is R's counterpart to pd.read_table (tab-delimited text with a header row)
df <- read.delim("sbaraw-s1.dta")

American Time Use Survey

Researchers can produce their own time-use estimates using the ATUS microdata files. The ATUS data files include information on more than 190,000 respondents in total from 2003 to 2017. Because of the size of these data files, it is easiest to work with them using statistical software such as Stata, SAS, or SPSS.

The survey is sponsored by the Bureau of Labor Statistics and is conducted by the U.S. Census Bureau.

The major purpose of ATUS is to develop nationally representative estimates of how people spend their time. The survey records the time people spend on a wide range of activities, such as religious activities, socializing, exercising, and relaxing. Demographic information such as sex, race, age, and educational attainment is also available for each respondent.

Microdata | Data Dictionary | User Guide

import pandas as pd

# Census division codes used by the CPS variable GEDIV
mapping = {
    1: 'New England',
    2: 'Middle Atlantic',
    3: 'East North Central',
    4: 'West North Central',
    5: 'South Atlantic',
    6: 'East South Central',
    7: 'West South Central',
    8: 'Mountain',
    9: 'Pacific',
}

# The .dat file is comma-delimited
df = pd.read_csv('atuscps_2017.dat')
df['division'] = df['GEDIV'].map(mapping)

# Count records by geographic division and type of housing unit (HEHOUSUT)
print(pd.crosstab(df['division'], df['HEHOUSUT']))

df <- read.csv("atuscps_2017.dat")
# Label the Census division codes (1-9) in GEDIV
df$GEDIV <- factor(df$GEDIV,
                   levels = 1:9,
                   labels = c("New England",
                              "Middle Atlantic",
                              "East North Central",
                              "West North Central",
                              "South Atlantic",
                              "East South Central",
                              "West South Central",
                              "Mountain",
                              "Pacific"))

# Count records by geographic division and type of housing unit (HEHOUSUT)
table(df$GEDIV, df$HEHOUSUT)
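Time-use estimates produced from the microdata are weighted averages: each respondent's minutes in an activity are multiplied by that respondent's final weight, summed, and divided by the sum of the weights. The sketch below implements that formula. It assumes you have already extracted one activity column (for example `t010101`, sleeping, in the ATUS activity summary file) and the matching final weight (`TUFNWGTP` in the 2017 files); treat both variable names as assumptions to confirm against the data dictionary. The function name is hypothetical.

```python
from typing import Sequence


def weighted_average_hours(minutes: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean time per day on one activity, converted from minutes to hours.

    minutes: each respondent's minutes on the activity (assumed column, e.g. t010101)
    weights: each respondent's final weight (assumed column, e.g. TUFNWGTP)
    """
    if len(minutes) != len(weights):
        raise ValueError("minutes and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_minutes = sum(m * w for m, w in zip(minutes, weights))
    return weighted_minutes / total_weight / 60
```

With the activity summary file loaded in pandas, the same calculation is `(df['t010101'] * df['TUFNWGTP']).sum() / df['TUFNWGTP'].sum() / 60`, under the same assumptions about the column names.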