I'm trying to pull some data from the US Census website, which comes in JSON. This is what it looks like:
data_from_api <- readr::read_file('https://api.census.gov/data/2016/zbp?get=ESTAB,EMPSZES,EMPSZES_TTL,ST,YEAR&for=ZIPCODE:20004')
data_from_api
When I parse it with jsonlite, it looks like this:
> data_from_api <- fromJSON(data_from_api)
> data_from_api
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "ESTAB" "EMPSZES" "EMPSZES_TTL" "ST" "YEAR" "zipcode"
[2,] "925" "001" "All establishments" "11" "2016" "20004"
[3,] "406" "212" "Establishments with 1 to 4 employees" "11" "2016" "20004"
[4,] "154" "220" "Establishments with 5 to 9 employees" "11" "2016" "20004"
[5,] "113" "230" "Establishments with 10 to 19 employees" "11" "2016" "20004"
[6,] "122" "241" "Establishments with 20 to 49 employees" "11" "2016" "20004"
[7,] "70" "242" "Establishments with 50 to 99 employees" "11" "2016" "20004"
[8,] "45" "251" "Establishments with 100 to 249 employees" "11" "2016" "20004"
[9,] "8" "252" "Establishments with 250 to 499 employees" "11" "2016" "20004"
[10,] "6" "254" "Establishments with 500 to 999 employees" "11" "2016" "20004"
[11,] "1" "260" "Establishments with 1,000 employees or more" "11" "2016" "20004"
Any idea why the column names end up in the first row of the matrix instead of being used as column names? Can I change any input to make it work?
Thanks
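This is not a fault in fromJSON; it follows from the structure of the JSON the API returns. The response is an array of arrays whose first inner array holds the column names, so fromJSON simplifies it to a character matrix with the header as row 1.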
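To illustrate the difference in shape, here is a minimal sketch using two made-up JSON strings (small stand-ins, not the actual API response). With jsonlite's default simplification, the array of arrays should come back as a character matrix, while an array of objects should come back as a data frame with named columns:

```r
library(jsonlite)

# Array of arrays: all inner arrays have the same length, so fromJSON
# simplifies them to a character matrix; the header is just row 1.
arrays <- fromJSON('[["ESTAB","YEAR"],["925","2016"],["406","2016"]]')
is.matrix(arrays)
dim(arrays)

# Array of objects: the keys become the column names of a data frame.
objects <- fromJSON('[{"ESTAB":"925","YEAR":"2016"},{"ESTAB":"406","YEAR":"2016"}]')
is.data.frame(objects)
names(objects)
```

The Census endpoint in the question returns the first shape, which is why the header has to be promoted by hand.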
It's trivial to convert this to a correctly named data.frame:
colnms <- data_from_api[1,]
data_from_api <- as.data.frame(data_from_api[-1, , drop = FALSE], stringsAsFactors = FALSE)
names(data_from_api) <- colnms
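If this conversion is needed in more than one place, the same steps can be wrapped in a small function. The following is only a sketch: first_row_as_header is a hypothetical name, and sample_json is a shortened, made-up stand-in for the API response so that the example runs without network access.

```r
library(jsonlite)

# Hypothetical helper: treat the first row of a character matrix as the header.
first_row_as_header <- function(m) {
  stopifnot(is.matrix(m), nrow(m) >= 1)
  # drop = FALSE keeps a matrix even when only one data row remains
  df <- as.data.frame(m[-1, , drop = FALSE], stringsAsFactors = FALSE)
  names(df) <- m[1, ]
  df
}

# Shortened stand-in for the Census response (assumption, not real output)
sample_json <- '[["ESTAB","EMPSZES","zipcode"],["925","001","20004"],["406","212","20004"]]'
df <- first_row_as_header(fromJSON(sample_json))
str(df)
```

All columns are still character at this point; type conversion is a separate step.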
The data is delivered as an array of arrays, which jsonlite simplifies to a matrix, not as an array of objects (dictionaries), which would become a data frame. To get the frame, some simple manipulation is needed:
x <- jsonlite::fromJSON(data_from_api)
x
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] "ESTAB" "EMPSZES" "EMPSZES_TTL" "ST" "YEAR" "zipcode"
# [2,] "925" "001" "All establishments" "11" "2016" "20004"
# [3,] "406" "212" "Establishments with 1 to 4 employees" "11" "2016" "20004"
# [4,] "154" "220" "Establishments with 5 to 9 employees" "11" "2016" "20004"
# [5,] "113" "230" "Establishments with 10 to 19 employees" "11" "2016" "20004"
# [6,] "122" "241" "Establishments with 20 to 49 employees" "11" "2016" "20004"
# [7,] "70" "242" "Establishments with 50 to 99 employees" "11" "2016" "20004"
# [8,] "45" "251" "Establishments with 100 to 249 employees" "11" "2016" "20004"
# [9,] "8" "252" "Establishments with 250 to 499 employees" "11" "2016" "20004"
# [10,] "6" "254" "Establishments with 500 to 999 employees" "11" "2016" "20004"
# [11,] "1" "260" "Establishments with 1,000 employees or more" "11" "2016" "20004"
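# Use the first row as column names, then drop it from the data
colnames(x) <- x[1, ]
x <- x[-1, , drop = FALSE]
x2 <- as.data.frame(x, stringsAsFactors = FALSE)
# Convert every column except EMPSZES_TTL (column 3) to integer
x2[c(1,2,4,5,6)] <- lapply(x2[c(1,2,4,5,6)], as.integer)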
str(x2)
# 'data.frame': 10 obs. of 6 variables:
# $ ESTAB : int 925 406 154 113 122 70 45 8 6 1
# $ EMPSZES : int 1 212 220 230 241 242 251 252 254 260
# $ EMPSZES_TTL: chr "All establishments" "Establishments with 1 to 4 employees" "Establishments with 5 to 9 employees" "Establishments with 10 to 19 employees" ...
# $ ST : int 11 11 11 11 11 11 11 11 11 11
# $ YEAR : int 2016 2016 2016 2016 2016 2016 2016 2016 2016 2016
# $ zipcode : int 20004 20004 20004 20004 20004 20004 20004 20004 20004 20004
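Note that in the str() output above, the EMPSZES code "001" has become 1, because as.integer discards leading zeros. If the code columns (EMPSZES, ST, zipcode) should keep their original form, one option is to convert only the columns that are real quantities. Which columns count as quantities is an assumption here (ESTAB and YEAR), and x3 is a made-up two-row stand-in for the data frame:

```r
# Two-row stand-in for the data frame built from the API response
x3 <- data.frame(
  ESTAB       = c("925", "406"),
  EMPSZES     = c("001", "212"),
  EMPSZES_TTL = c("All establishments", "Establishments with 1 to 4 employees"),
  ST          = c("11", "11"),
  YEAR        = c("2016", "2016"),
  zipcode     = c("20004", "20004"),
  stringsAsFactors = FALSE
)

# Assumption: only these columns are quantities; the rest are identifiers
count_cols <- c("ESTAB", "YEAR")
x3[count_cols] <- lapply(x3[count_cols], as.integer)
str(x3)
```

Selecting columns by name rather than by position also keeps the conversion correct if the order of fields in the get= parameter changes.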