How to extract values from JSON blob into new columns in R data frame?
I am stuck on the following problem:
I have a data frame in which one variable (var2) contains JSON objects:
var1 var2
1 1 {"property1": "val1", "property2": 5}
2 2 {"property1": "val2", "property2": 8}
3 3 {"property1": "val3", "property2": 7}
4 4 {"property1": "val4", "property2": 0}
5 5 {"property1": "val5", "property3": 9}
(Code on pastebin here.)
I want to extract the JSON properties in var2 and add them to the data frame as new columns, like so:
var1 var2 prop1 prop2 prop3
1 1 {"property1": "val1", "property2": 5} val1 5 NA
2 2 {"property1": "val2", "property2": 8} val2 8 NA
3 3 {"property1": "val3", "property2": 7} val3 7 NA
4 4 {"property1": "val4", "property2": 0} val4 0 NA
5 5 {"property1": "val5", "property3": 9} val5 NA 9
When every object has identical properties in an identical sequence, I have found this way to make it work:
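library(jsonlite)
library(magrittr)  # provides %>%
# Parse each JSON string, transpose so each object becomes a row, then bind to df
jsonProps <- sapply(df$var2, function(x) fromJSON(x)) %>%
  t() %>%
  as.data.frame()
rownames(jsonProps) <- NULL
y <- cbind(df, jsonProps)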
(I would be happy to receive any suggestions on how to make this more efficient.)
This no longer works when the objects do not all share the same properties in the same order, as in row 5, which has property3 instead of property2.
I am at a loss as to how to dynamically create columns from the properties I find and transfer the property values correctly, so I would welcome suggestions on how to tackle this.
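A minimal base R sketch of the dynamic approach, assuming every JSON object is flat (scalar values only) and var2 holds character strings rather than factors: parse each object with jsonlite::fromJSON(), collect the union of property names, and build one column per name, filling NA where a row lacks that property. The names parsed, keys, props and result are illustrative, not from the question.

```r
library(jsonlite)

# Example data from the question; assumes var2 holds character strings (not factors)
df <- data.frame(
  var1 = 1:5,
  var2 = c('{"property1": "val1", "property2": 5}',
           '{"property1": "val2", "property2": 8}',
           '{"property1": "val3", "property2": 7}',
           '{"property1": "val4", "property2": 0}',
           '{"property1": "val5", "property3": 9}'),
  stringsAsFactors = FALSE
)

# Parse every JSON string into a named list
parsed <- lapply(df$var2, jsonlite::fromJSON)

# Union of all property names, in order of first appearance
keys <- unique(unlist(lapply(parsed, names)))

# One column per property, matched by name; rows lacking a property get NA
props <- as.data.frame(
  lapply(setNames(keys, keys), function(k) {
    unlist(lapply(parsed, function(p) if (is.null(p[[k]])) NA else p[[k]]))
  }),
  stringsAsFactors = FALSE
)

result <- cbind(df, props)
result
```

Because values are matched by name rather than by position, row 5 gets NA in property2 and 9 in property3, which is the layout asked for above. Nested objects or arrays would need extra handling, since unlist() would flatten them.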
You can do:
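library(plyr)
library(jsonlite)
# Parse each JSON string into a named list
ll = lapply(df$var2, function(x) jsonlite::fromJSON(as.character(x)))
# ldply() turns each list into a one-row data frame and row-binds them,
# filling properties missing from a row with NA
cbind(df, ldply(ll, data.frame))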
# var1 var2 property1 property3 property2
#1 a {"property1": "val1", "property3": 8} val1 8 NA
#2 a {"property1": "val1", "property2": 5} val1 NA 5
Data:
df = structure(list(var11 = structure(c(1L, 1L), .Label = "a", class = "factor"),
var2 = structure(1:2, .Label = c("{\"property1\": \"val1\", \"property3\": 8}",
"{\"property1\": \"val1\", \"property2\": 5}"), class = "factor")), .Names = c("var1",
"var2"), class = "data.frame", row.names = 1:2)
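If the JSON strings are treated as newline-delimited JSON, jsonlite::stream_in() can parse them all in one call and fills missing properties with NA, without needing plyr. A minimal sketch, assuming var2 holds one flat JSON object per row; json_lines, props and result are illustrative names.

```r
library(jsonlite)

# One flat JSON object per element, as in the question's var2
json_lines <- c('{"property1": "val1", "property2": 5}',
                '{"property1": "val2", "property2": 8}',
                '{"property1": "val3", "property2": 7}',
                '{"property1": "val4", "property2": 0}',
                '{"property1": "val5", "property3": 9}')

# stream_in() reads one JSON record per line and row-binds the records,
# filling properties that a record lacks with NA
props <- jsonlite::stream_in(textConnection(json_lines), verbose = FALSE)

# Attach the new columns to the original identifiers
result <- cbind(var1 = 1:5, props)
result
```

With a data frame whose var2 is a factor, the equivalent call would be stream_in(textConnection(as.character(df$var2)), verbose = FALSE), followed by cbind(df, props).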
This doesn't do everything you want, but it may be a better starting point:
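library("dplyr")
library("jsonlite")
# Parse one JSON string (var2 may be a factor, hence as.character)
get_it <- function(x) {
  jsonlite::fromJSON(as.character(x))
}
# rowwise() makes mutate() call get_it() once per row;
# [[1]] and [[2]] take the first and second property by position
as_tibble(test) %>%
  rowwise() %>%
  mutate(one = get_it(var2)[[1]],
         two = get_it(var2)[[2]])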
Source: local data frame [5 x 4]
Groups: <by row>
var1 var2 one two
(dbl) (fctr) (chr) (int)
1 1 {"property1": "val1", "property2": 5} val1 5
2 2 {"property1": "val2", "property2": 8} val2 8
3 3 {"property1": "val3", "property2": 7} val3 7
4 4 {"property1": "val4", "property2": 0} val4 0
5 5 {"property1": "val5", "property3": 9} val5 9
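Since the columns above are filled by position, row 5's property3 value ends up in column two. A minimal base R sketch that looks properties up by name instead; get_prop() is a hypothetical helper, and the sketch assumes each requested property holds a single scalar value.

```r
library(jsonlite)

# Hypothetical helper: value of property `name` in one JSON string, or NA if absent
get_prop <- function(json, name) {
  value <- jsonlite::fromJSON(as.character(json))[[name]]
  if (is.null(value)) NA else value
}

test <- data.frame(
  var1 = 1:5,
  var2 = c('{"property1": "val1", "property2": 5}',
           '{"property1": "val2", "property2": 8}',
           '{"property1": "val3", "property2": 7}',
           '{"property1": "val4", "property2": 0}',
           '{"property1": "val5", "property3": 9}'),
  stringsAsFactors = FALSE
)

# Look each property up by name, so a missing key becomes NA instead of shifting values
for (prop in c("property1", "property2", "property3")) {
  test[[prop]] <- sapply(test$var2, get_prop, name = prop, USE.NAMES = FALSE)
}
test
```

This still requires listing the property names up front; collecting them with unique(unlist(lapply(parsed, names))) over the parsed objects would make it fully dynamic.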