How to extract values from JSON blob into new columns in R data frame?

I am stuck on the following problem:

I have a data frame with a variable that contains JSON objects (in var2):

  var1                                  var2
1    1 {"property1": "val1", "property2": 5}
2    2 {"property1": "val2", "property2": 8}
3    3 {"property1": "val3", "property2": 7}
4    4 {"property1": "val4", "property2": 0}
5    5 {"property1": "val5", "property3": 9}

(Code on pastebin here)

I want to extract the JSON properties in var2 and add them to the data frame in new columns like so:

  var1                                  var2 prop1 prop2 prop3
1    1 {"property1": "val1", "property2": 5}  val1     5    NA
2    2 {"property1": "val2", "property2": 8}  val2     8    NA
3    3 {"property1": "val3", "property2": 7}  val3     7    NA
4    4 {"property1": "val4", "property2": 0}  val4     0    NA
5    5 {"property1": "val5", "property3": 9}  val5    NA     9

Given identical properties in identical sequence, I have found this way to make it work:

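library(jsonlite)  # fromJSON()
library(magrittr)  # %>%

# Parse each JSON string, then transpose so that each record becomes a row
jsonProps <- sapply(df$var2, function(x) fromJSON(x)) %>%
  t() %>%
  as.data.frame()
rownames(jsonProps) <- NULL

y <- cbind(df, jsonProps)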

(I am happy to receive any suggestions on how to make this more efficient.)

This no longer works when

  • the number of properties differs across records and/or
  • the sequence changes and/or
  • different properties are stored between records.

I am at a loss on how to dynamically create columns from the properties I find and transfer the property values correctly, and would thus welcome your suggestions on how to tackle this.
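
To make the failure concrete, here is a small sketch that is not part of the original post; the `records` vector is made up for illustration. When both records have the same number of properties but different names, `sapply()` labels the result with the names of the first record only, so the value of property3 is silently filed under property2. When the number of properties differs, `sapply()` cannot simplify to a matrix at all and `t()` fails.

```r
library(jsonlite)

# Two illustrative records: same number of properties, different names
records <- c('{"property1": "val4", "property2": 0}',
             '{"property1": "val5", "property3": 9}')

# Same idea as the snippet above: parse, simplify, transpose
m <- t(sapply(records, fromJSON, USE.NAMES = FALSE))

# Column names come from the first record only
colnames(m)

# The value that belongs to property3 is stored under property2
m[[2, "property2"]]
```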

You can do:

library(plyr)
library(jsonlite)

ll = lapply(df$var2, function(x) jsonlite::fromJSON(as.character(x)))
cbind(df, ldply(ll, data.frame))

#  var1                                  var2 property1 property3 property2
#1    a {"property1": "val1", "property3": 8}      val1         8        NA
#2    a {"property1": "val1", "property2": 5}      val1        NA         5

Data:

df = structure(list(var1 = structure(c(1L, 1L), .Label = "a", class = "factor"), 
var2 = structure(1:2, .Label = c("{\"property1\": \"val1\", \"property3\": 8}", 
"{\"property1\": \"val1\", \"property2\": 5}"), class = "factor")), .Names = c("var1", 
"var2"), class = "data.frame", row.names = 1:2)
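
If you would rather not depend on plyr, the same idea can probably be sketched in base R plus jsonlite. This is not part of the original answer: the names `parsed`, `all_props` and `rows` are made up for illustration, the sample data is a shortened version of the question's, and the sketch assumes every property holds a single scalar value.

```r
library(jsonlite)

# Illustrative data: properties differ in number, order and name
df <- data.frame(
  var1 = 1:3,
  var2 = c('{"property1": "val1", "property2": 5}',
           '{"property2": 8, "property1": "val2"}',
           '{"property1": "val3", "property3": 9}'),
  stringsAsFactors = FALSE
)

# Parse each JSON string into a named list
parsed <- lapply(as.character(df$var2), fromJSON)

# Union of all property names seen in any record
all_props <- unique(unlist(lapply(parsed, names)))

# Build a one-row data frame per record, looking values up by name;
# properties missing from a record become NA
rows <- lapply(parsed, function(p) {
  vals <- lapply(all_props, function(nm) if (is.null(p[[nm]])) NA else p[[nm]])
  as.data.frame(setNames(vals, all_props), stringsAsFactors = FALSE)
})

result <- cbind(df, do.call(rbind, rows))
result
```

Because values are looked up by name, the order of properties inside each JSON object does not matter here.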

This doesn't do everything you want (it picks values by position rather than by property name), but it may be a better starting point:

library("dplyr")
library("jsonlite")

get_it <- function(x) {
  jsonlite::fromJSON(as.character(x))
}

tbl_df(test) %>%
  rowwise() %>%
  mutate(one = get_it(var2)[[1]],
         two = get_it(var2)[[2]])

Source: local data frame [5 x 4]
Groups: <by row>

   var1                                  var2   one   two
  (dbl)                                (fctr) (chr) (int)
1     1 {"property1": "val1", "property2": 5}  val1     5
2     2 {"property1": "val2", "property2": 8}  val2     8
3     3 {"property1": "val3", "property2": 7}  val3     7
4     4 {"property1": "val4", "property2": 0}  val4     0
5     5 {"property1": "val5", "property3": 9}  val5     9
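
Because `[[1]]` and `[[2]]` select by position, the 9 stored under property3 in row 5 lands in column `two`. One possible variation, sketched here under the assumption that the property names of interest are known in advance, is to look values up by name and fall back to NA. `get_prop` is a hypothetical helper, not something from dplyr or jsonlite, and the two-row `test` data frame is a shortened stand-in for the real data.

```r
library(jsonlite)

# Hypothetical helper: value stored under `name`, or NA if the property is absent
get_prop <- function(x, name) {
  value <- fromJSON(as.character(x))[[name]]
  if (is.null(value)) NA else value
}

# Shortened stand-in for the data frame used above
test <- data.frame(
  var1 = 4:5,
  var2 = c('{"property1": "val4", "property2": 0}',
           '{"property1": "val5", "property3": 9}'),
  stringsAsFactors = FALSE
)

# One new column per known property name
test$prop1 <- sapply(test$var2, get_prop, name = "property1", USE.NAMES = FALSE)
test$prop2 <- sapply(test$var2, get_prop, name = "property2", USE.NAMES = FALSE)
test$prop3 <- sapply(test$var2, get_prop, name = "property3", USE.NAMES = FALSE)
test
```

This parses each JSON string once per extracted property, so for many properties it is likely cheaper to parse once and reuse the result.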
