Convert JSON file to data frame in R
I am new to R and am having trouble converting a JSON file to a data frame. I have a JSON file like the one below:
json_file = '[{"id": "abc", "model": "honda", "date": "20190604", "cols": {"action": 15, "values": 18, "not": 29}},
{"id": "abc", "model": "honda", "date": "20190604", "cols": {"hello": 14, "hi": 85, "wow": 14}},
{"id": "mno", "model": "ford", "date": "20190604", "cols": {"yesterday": 21, "today": 21, "tomorrow": 29}},
{"id": "mno", "model": "ford", "date": "20190604", "cols": {"docs": 25, "ok": 87, "none": 42}}]'
I want to convert the above JSON file into a data frame in the following format:
Expected result
df =
id model date cols values_cols
abc honda 20190604 action 15
abc honda 20190604 values 18
abc honda 20190604 not 29
abc honda 20190604 hello 14
abc honda 20190604 hi 85
abc honda 20190604 wow 14
mno ford 20190604 yesterday 21
mno ford 20190604 today 21
mno ford 20190604 tomorrow 29
mno ford 20190604 docs 25
mno ford 20190604 ok 87
My result
id model date cols id.1 model.1 date.1 cols.1 id.2 model.2 date.2 cols.2 id.3 model.3 date.3 cols.3
action abc honda 20190604 15 abc honda 20190604 14 mno ford 20190604 21 mno ford 20190604 25
values abc honda 20190604 18 abc honda 20190604 85 mno ford 20190604 21 mno ford 20190604 87
not abc honda 20190604 29 abc honda 20190604 14 mno ford 20190604 29 mno ford 20190604 42
This is not correct, because the keys of cols are being taken as the index (row names).
My solution:
require(RJSONIO)
df = fromJSON(json_file)
The problem when reading the data with jsonlite::fromJSON is that the last column is a data frame, not an atomic vector.
tmp <- jsonlite::fromJSON(json_file)
str(tmp)
#'data.frame': 4 obs. of 4 variables:
# $ id : chr "abc" "abc" "mno" "mno"
# $ model: chr "honda" "honda" "ford" "ford"
# $ date : chr "20190604" "20190604" "20190604" "20190604"
# $ cols :'data.frame': 4 obs. of 12 variables:
# ..$ action : int 15 NA NA NA
# ..$ values : int 18 NA NA NA
# ..$ not : int 29 NA NA NA
# ..$ hello : int NA 14 NA NA
# ..$ hi : int NA 85 NA NA
# ..$ wow : int NA 14 NA NA
# ..$ yesterday: int NA NA 21 NA
# ..$ today : int NA NA 21 NA
# ..$ tomorrow : int NA NA 29 NA
# ..$ docs : int NA NA NA 25
# ..$ ok : int NA NA NA 87
# ..$ none : int NA NA NA 42
So, before reshaping the data from wide format to long format, the last column must be combined with the other three columns using cbind.
tmp <- cbind(tmp[-4], tmp[[4]])
df1 <- reshape2::melt(tmp, id.vars = c("id", "model", "date"))
names(df1)[4:5] <- c("cols", "values_cols")
df1 <- df1[complete.cases(df1), ]
row.names(df1) <- NULL
df1
# id model date cols values_cols
#1 abc honda 20190604 action 15
#2 abc honda 20190604 values 18
#3 abc honda 20190604 not 29
#4 abc honda 20190604 hello 14
#5 abc honda 20190604 hi 85
#6 abc honda 20190604 wow 14
#7 mno ford 20190604 yesterday 21
#8 mno ford 20190604 today 21
#9 mno ford 20190604 tomorrow 29
#10 mno ford 20190604 docs 25
#11 mno ford 20190604 ok 87
#12 mno ford 20190604 none 42
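As an alternative to padding with NA and then dropping incomplete rows, the long format can be built directly from the nested list. The following is a minimal sketch, assuming jsonlite is installed; the names records, pieces and df2 are hypothetical and not part of the original answer, and the JSON is shortened to two records to keep the example small.

```r
library(jsonlite)

# Shortened version of the JSON from the question (two records only)
json_file <- '[{"id": "abc", "model": "honda", "date": "20190604", "cols": {"action": 15, "values": 18, "not": 29}},
{"id": "mno", "model": "ford", "date": "20190604", "cols": {"docs": 25, "ok": 87, "none": 42}}]'

# Keep the nested list structure instead of simplifying to a data frame
records <- fromJSON(json_file, simplifyVector = FALSE)

# Build one small data frame per record: one row per key in "cols".
# The scalar id/model/date values are recycled to match the number of keys.
pieces <- lapply(records, function(rec) {
  data.frame(
    id = rec$id,
    model = rec$model,
    date = rec$date,
    cols = names(rec$cols),
    values_cols = unlist(rec$cols, use.names = FALSE),
    stringsAsFactors = FALSE
  )
})

# Stack the per-record pieces into a single long data frame
df2 <- do.call(rbind, pieces)
row.names(df2) <- NULL
df2
```

Because each record contributes only the keys it actually has, no NA values are created and complete.cases is not needed. For large inputs, repeated rbind may be slower than the melt approach.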
Now clean up .GlobalEnv.
rm(tmp) # no longer needed.