Creating a dataset in R or Python using a JSON API
How can I create a dataset with proper column names in Python or R from a JSON API?
An R-based answer: you can use the jsonlite package:
library(jsonlite)
data <- fromJSON("./data/data.json", flatten = FALSE)
I saved the JSON from your question to ./data/data.json. Reading it produces a list of three data frames:
List of 3
$ cases_time_series:'data.frame': 104 obs. of 7 variables:
..$ dailyconfirmed: chr [1:104] "1" "0" "0" "1" ...
..$ dailydeceased : chr [1:104] "0" "0" "0" "0" ...
..$ dailyrecovered: chr [1:104] "0" "0" "0" "0" ...
..$ date : chr [1:104] "30 January " "31 January " "01 February " "02 February " ...
..$ totalconfirmed: chr [1:104] "1" "1" "1" "2" ...
..$ totaldeceased : chr [1:104] "0" "0" "0" "0" ...
..$ totalrecovered: chr [1:104] "0" "0" "0" "0" ...
$ statewise :'data.frame': 38 obs. of 11 variables:
..$ active : chr [1:38] "47598" "18381" "5121" "6523" ...
..$ confirmed : chr [1:38] "74925" "24427" "8904" "8718" ...
..$ deaths : chr [1:38] "2436" "921" "537" "61" ...
..$ deltaconfirmed : chr [1:38] "595" "0" "0" "0" ...
..$ deltadeaths : chr [1:38] "21" "0" "0" "0" ...
..$ deltarecovered : chr [1:38] "434" "0" "0" "0" ...
..$ lastupdatedtime: chr [1:38] "13/05/2020 11:54:23" "12/05/2020 22:13:24" "12/05/2020 20:16:23" "12/05/2020 22:48:24" ...
..$ recovered : chr [1:38] "24887" "5125" "3246" "2134" ...
..$ state : chr [1:38] "Total" "Maharashtra" "Gujarat" "Tamil Nadu" ...
..$ statecode : chr [1:38] "TT" "MH" "GJ" "TN" ...
..$ statenotes : chr [1:38] "" "[10-May]<br>\n- Total numbers are updated to the final figure reported for 10th May. <br>\n- 665 cases added by"| __truncated__ "" "" ...
$ tested :'data.frame': 65 obs. of 11 variables:
..$ individualstestedperconfirmedcase: chr [1:65] "75.64102564" "81.56666667" "73.96428571" "72.99450549" ...
..$ positivecasesfromsamplesreported : chr [1:65] "" "" "" "" ...
..$ samplereportedtoday : chr [1:65] "" "" "" "" ...
..$ source : chr [1:65] "Press_Release_ICMR_13March2020.pdf" "ICMR_website_update_18March_6PM_IST.pdf" "ICMR_website_update_19March_10AM_IST_V2.pdf" "ICMR_website_update_19March_6PM_IST.pdf" ...
..$ testpositivityrate : chr [1:65] "1.32%" "1.23%" "1.35%" "1.37%" ...
..$ testsconductedbyprivatelabs : chr [1:65] "" "" "" "" ...
..$ testsperconfirmedcase : chr [1:65] "83.33333333" "87.5" "79.26190476" "77.88461538" ...
..$ totalindividualstested : chr [1:65] "5900" "12235" "12426" "13285" ...
..$ totalpositivecases : chr [1:65] "78" "150" "168" "182" ...
..$ totalsamplestested : chr [1:65] "6500" "13125" "13316" "14175" ...
..$ updatetimestamp : chr [1:65] "13/03/2020 00:00:00" "18/03/2020 18:00:00" "19/03/2020 10:00:00" "19/03/2020 18:00:00" ...
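Since the question also asks about Python, here is a rough equivalent of the loading step using only the standard library. It is a minimal sketch, not a definitive implementation: `SAMPLE`, `load_tables` and `describe` are illustrative names, and the inline sample contains only a few fields and rows copied from the output above rather than the full file.

```python
import json

# Tiny inline sample shaped like the str() output above; the real file has
# 104, 38 and 65 records in these three arrays respectively.
SAMPLE = """
{
  "cases_time_series": [
    {"dailyconfirmed": "1", "date": "30 January ", "totalconfirmed": "1"},
    {"dailyconfirmed": "0", "date": "31 January ", "totalconfirmed": "1"}
  ],
  "statewise": [
    {"state": "Total", "statecode": "TT", "confirmed": "74925"}
  ],
  "tested": [
    {"totalsamplestested": "6500", "updatetimestamp": "13/03/2020 00:00:00"}
  ]
}
"""


def load_tables(text):
    """Parse the JSON text and return {table_name: list of row dicts}."""
    parsed = json.loads(text)
    # Keep only top-level entries that are arrays of objects (the "tables").
    return {
        name: rows
        for name, rows in parsed.items()
        if isinstance(rows, list) and all(isinstance(r, dict) for r in rows)
    }


def describe(tables):
    """Return (name, row count, sorted column names) for each table."""
    summary = []
    for name, rows in tables.items():
        columns = sorted({key for row in rows for key in row})
        summary.append((name, len(rows), columns))
    return summary


tables = load_tables(SAMPLE)
for name, n_rows, columns in describe(tables):
    print(f"{name}: {n_rows} rows, columns = {columns}")
```

For the real file, the same function could be fed the file's contents, for example `load_tables(open("./data/data.json").read())`. The dictionary keys already serve as column names, so each table can be handed to whatever tabular library you prefer.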
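You can convert this list into one or more data frames. The dplyr function bind_rows is not a sensible way to combine them, because the list elements describe different things: they have different columns and different numbers of rows, so stacking them would leave most cells as NA. If the data frames shared a field, you could use the join functions to merge them.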
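To illustrate what joining on a shared field means, here is a small standard-library Python sketch. The helper names `shared_columns` and `inner_join` are hypothetical, and the two input tables are just the statewise values shown above split in two for the sake of the example; the three tables in the actual JSON do not share a key like this.

```python
def shared_columns(left, right):
    """Column names present in both tables (each a list of row dicts)."""
    left_cols = {k for row in left for k in row}
    right_cols = {k for row in right for k in row}
    return sorted(left_cols & right_cols)


def inner_join(left, right, key):
    """Inner join two lists of dicts on `key`; right-hand values win on clashes."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        # Rows without a match on the right are dropped (inner join).
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


# Values taken from the statewise output above, split into two tables
# purely for illustration.
confirmed = [
    {"statecode": "TT", "state": "Total", "confirmed": "74925"},
    {"statecode": "MH", "state": "Maharashtra", "confirmed": "24427"},
    {"statecode": "GJ", "state": "Gujarat", "confirmed": "8904"},
]
deaths = [
    {"statecode": "TT", "deaths": "2436"},
    {"statecode": "MH", "deaths": "921"},
]

print(shared_columns(confirmed, deaths))
for row in inner_join(confirmed, deaths, "statecode"):
    print(row)
```

Checking `shared_columns` first is a cheap way to see whether a join is possible at all; an empty result means the tables should stay separate.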
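To expand on this: the first list element (cases_time_series, stored as cases below) can easily be pulled out and turned into a plot: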
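library(jsonlite)
library(ggplot2)
library(dplyr)
data <- fromJSON("./data/data.json", flatten = FALSE)
# The dates carry no year and a trailing space; the timestamps in the other
# tables show the data is from 2020, so append it before parsing. Without a
# year, as.Date() assumes the current one and "29 February" can become NA.
# Note that %B matches month names in the current locale.
cases <- data[[1]] %>%
  mutate(date = as.Date(paste(trimws(date), "2020"), format = "%d %B %Y")) %>%
  mutate_if(is.character, as.numeric)
# Line plot of daily confirmed cases over time
ggplot(data = cases, aes(x = date, y = dailyconfirmed)) +
  geom_line()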
The result is a line plot of daily confirmed cases over time.
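For Python, the same cleaning step (parse the dates, turn the count strings into numbers) might look like the sketch below. It uses only the standard library; `clean_cases` is an illustrative name, the `year=2020` default is an assumption based on the 2020 timestamps in the other tables, and the sample rows are the first four values shown in the output above.

```python
from datetime import datetime


def clean_cases(rows, year=2020):
    """Convert `date` to datetime.date and every other field to int.

    Assumes all non-date fields are integer strings, as in cases_time_series;
    an empty string would raise ValueError.
    """
    cleaned = []
    for row in rows:
        out = {}
        for key, value in row.items():
            if key == "date":
                # Dates look like "30 January " (trailing space, no year).
                # %B matches month names in the current locale.
                text = f"{value.strip()} {year}"
                out[key] = datetime.strptime(text, "%d %B %Y").date()
            else:
                out[key] = int(value)
        cleaned.append(out)
    return cleaned


# First four rows of cases_time_series, reduced to three columns.
raw = [
    {"dailyconfirmed": "1", "date": "30 January ", "totalconfirmed": "1"},
    {"dailyconfirmed": "0", "date": "31 January ", "totalconfirmed": "1"},
    {"dailyconfirmed": "0", "date": "01 February ", "totalconfirmed": "1"},
    {"dailyconfirmed": "1", "date": "02 February ", "totalconfirmed": "2"},
]

cases = clean_cases(raw)
for row in cases:
    print(row["date"].isoformat(), row["dailyconfirmed"], row["totalconfirmed"])
```

The cleaned rows have real dates and integers, so the date and dailyconfirmed values can be passed directly to a plotting library of your choice.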