简体   繁体   English

读取带有嵌套列表的 JSON 文件 R

[英]Read JSON file with nested lists in R

I have a large json dataset and I would like to convert it to a data frame in R我有一个很大的 json 数据集,我想将它转换为 R 中的数据框

(Sorry if it may be a duplicated question but other answers didn't help me) My Json file is as follows: (对不起,如果它可能是一个重复的问题,但其他答案对我没有帮助)我的 Json 文件如下:

[{"src": "http://www.europarl.eu", "peid": "PE529.899v01-00", "reference": "2014/2021(INI)", "date": "2014-03-05T00:00:00", "committee": ["AFET"], "seq": 1, "id": "PE529.899-1", "orig_lang": "en", "new": ["- having regard to its resolution of 13", "December 20071 on Justice for the", "'Comfort Women' (sex slaves in Asia", "before and during World War II) as well", "as the statements by Japanese Chief", "Cabinet Secretary Yohei Kono in 1993", "and by the then Prime Minister Tomiichi", "Murayama in 1995, the resolutions of the", "Japanese parliament (the Diet) of 1995", "and 2005 expressing apologies for", "wartime victims, including victims of the", "'comfort women' system,", "_______________________", "1", "OJ C 323E, 18.12.2008, p.531"], "authors": "Reinhard Bütikofer on behalf of the Verts/ALE Group", "meps": [96739], "location": [["Motion for a resolution", "Citation 6 a (new)"]], "meta": {"created": "2019-07-03T05:06:17"}, "changes": {}}
,{"src": "http://www.europarl.eu", "peid": "PE529.863v01-00", "reference": "2014/2016(INI)", "date": "2014-02-27T00:00:00", "committee": ["AFET"], "seq": 1, "id": "PE529.863-1", "orig_lang": "en", "new": ["- having regard to the Statement by the", "Vice-President of the Commission/ High", "Representative of the Union for Foreign", "affairs and Security Policy (VP/HR)", "Catherine Ashton of 20 March 2013 on", "the Magnitsky case in the Russian", "Federation,"], "authors": "Jacek Protasiewicz", "meps": [23782], "location": [["Motion for a resolution", "Citation 4 a (new)"]], "meta": {"created": "2019-07-03T05:06:17"}, "changes": {}}
,{"src": "http://www.europarl.eu", "peid": "PE529.713v01-00", "reference": "2013/2149(INI)", "date": "2014-02-12T00:00:00", "committee": ["AFET"], "seq": 238, "id": "PE529.713-238", "orig_lang": "en", "old": ["A. whereas the European Neighbourhood", "Policy (ENP), in particular the Eastern", "Partnership (EaP), aims to extend the", "values and ideas of the founders of the EU;"], "new": ["A. whereas the European Neighbourhood", "Policy (ENP) embraces the values and", "ideas of the founders of the EU, notably", "the principles of Peace, Solidarity and", "Prosperity;"], "authors": "Mário David", "meps": [96973], "location": [["Motion for a resolution", "Recital A"]], "meta": {"created": "2019-07-03T05:06:18"}, "changes": {}}
,{"src": "http://www.europarl.eu", "peid": "PE529.899v01-00", "reference": "2014/2021(INI)", "date": "2014-03-05T00:00:00", "committee": ["AFET"], "seq": 2, "id": "PE529.899-2", "orig_lang": "en", "new": ["- having regard to the catastrophic", "earthquake and subsequent tsunami", "which devastated important parts of", "Japan's coast on 11 March 2011 and led", "to the destruction of the Fukushima", "nuclear power plant, causing possibly the", "greatest radiation disaster in human", "history,"], "authors": "Reinhard Bütikofer on behalf of the Verts/ALE Group", "meps": [96739], "location": [["Motion for a resolution", "Citation 11 a (new)"]], "meta": {"created": "2019-07-03T05:06:18"}, "changes": {}}

I would like to have a dataframe as follows:我想要一个 dataframe 如下:

         src               peid          reference                date           committee        seq        id        orig_lang             new                  ...  
http://www.europarl.eu PE529.899v01-00  2014/2021(INI)    2014-03-05T00:00:00       AFET           1      PE529.899-1       en      ["- having ... p.531"]          ...
http://www.europarl.eu PE529.863v01-00  2014/2016(INI)    2014-02-27T00:00:00       AFET          128     PE529.899-1       en      ["- having ..."Federation,"]  ...
http://www.europarl.eu PE529.713v01-00  2013/2149(INI)    2014-02-12T00:00:00       AFET          238     PE529.899-1       en      ["- having ..."Federation,"]    ...
http://www.europarl.eu PE529.899v01-00  2014/2021(INI)    2014-03-05T00:00:00       AFET           1      PE529.899-1       en      ["- having ..."Federation,"]    ...

(I didn't write the complete table above) (我没有写上面的完整表格)

I have already tried the following codes:我已经尝试过以下代码:

library(rjson)
library(jsonlite)
Data <- fromJSON(file="data.json")

but each row is shown as below:但每一行如下所示:

[[1]]
[[1]]$src
[1] "http://www.europarl.eu/sides/getDoc.do?pubRef=-//EP//NONSGML+COMPARL+PE-529.899+01+DOC+PDF+V0//EN&language=EN"

[[1]]$peid
[1] "PE529.899v01-00"

[[1]]$reference
[1] "2014/2021(INI)"

[[1]]$date
[1] "2014-03-05T00:00:00"

[[1]]$committee
[1] "AFET"

[[1]]$seq
[1] 1

[[1]]$id
[1] "PE529.899-1"

[[1]]$orig_lang
[1] "en"

[[1]]$new
[1] "- having regard to its resolution of 13"   "December 20071 on Justice for the"        
[3] "'Comfort Women' (sex slaves in Asia"       "before and during World War II) as well"  
[5] "as the statements by Japanese Chief"       "Cabinet Secretary Yohei Kono in 1993"     
[7] "and by the then Prime Minister Tomiichi"   "Murayama in 1995, the resolutions of the" 
[9] "Japanese parliament (the Diet) of 1995"    "and 2005 expressing apologies for"        
[11] "wartime victims, including victims of the" "'comfort women' system,"                  
[13] "_______________________"                   "1"                                        
[15] "OJ C 323E, 18.12.2008, p.531"             

[[1]]$authors
[1] "Reinhard Bütikofer on behalf of the Verts/ALE Group"

[[1]]$meps
[1] 96739

[[1]]$location
[[1]]$location[[1]]
[1] "Motion for a resolution" "Citation 6 a (new)"     


[[1]]$meta
[[1]]$meta$created
[1] "2019-07-03T05:06:17"


[[1]]$changes
list()

dput version is below: dput 版本如下:

list(list(src = "http://www.europarl.eu", 
    peid = "PE529.899v01-00", reference = "2014/2021(INI)", date = "2014-03-05T00:00:00", 
    committee = "AFET", seq = 1, id = "PE529.899-1", orig_lang = "en", 
    new = c("- having regard to its resolution of 13", "December 20071 on Justice for the", 
    "'Comfort Women' (sex slaves in Asia", "before and during World War II) as well", 
    "as the statements by Japanese Chief", "Cabinet Secretary Yohei Kono in 1993", 
    "and by the then Prime Minister Tomiichi", "Murayama in 1995, the resolutions of the", 
    "Japanese parliament (the Diet) of 1995", "and 2005 expressing apologies for", 
    "wartime victims, including victims of the", "'comfort women' system,", 
    "_______________________", "1", "OJ C 323E, 18.12.2008, p.531"
    ), authors = "Reinhard Bütikofer on behalf of the Verts/ALE Group", 
    meps = 96739, location = list(c("Motion for a resolution", 
    "Citation 6 a (new)")), meta = list(created = "2019-07-03T05:06:17"), 
    changes = list()))

One of the problems that I have is in column 9 as you can see below, I want to put all the 15 components in one cell of the dataframe我遇到的问题之一是第 9 列,如下所示,我想将所有 15 个组件放在 dataframe 的一个单元格中

[[1]]$new
 [1] "- having regard to its resolution of 13"   "December 20071 on Justice for the"        
 [3] "'Comfort Women' (sex slaves in Asia"       "before and during World War II) as well"  
 [5] "as the statements by Japanese Chief"       "Cabinet Secretary Yohei Kono in 1993"     
 [7] "and by the then Prime Minister Tomiichi"   "Murayama in 1995, the resolutions of the" 
 [9] "Japanese parliament (the Diet) of 1995"    "and 2005 expressing apologies for"        
[11] "wartime victims, including victims of the" "'comfort women' system,"                  
[13] "_______________________"                   "1"                                        
[15] "OJ C 323E, 18.12.2008, p.531"

How can I get the table I mentioned above?我怎样才能得到我上面提到的表格?

We may either convert the nested list elements with lengths greater than 1 to a single string by paste ing ( str_c ) and then bind the named list to columns with _dfr我们可以通过paste ing ( str_c ) 将lengths大于 1 的嵌套list元素转换为单个字符串,然后使用_dfr将命名列表绑定到列

library(purrr)
library(dplyr)
library(stringr)
map_dfr(Data, ~ map(.x, unlist) %>%
     map_dfr(~ if(length(.x) > 1) str_c(.x, collapse = ";") else .x))

Or use a recursive function rrapply to bind the elements having length greater than 1 as list column或者使用递归 function rrapplylength大于 1 的元素bindlist

library(rrapply)
map_dfr(Data, ~ rrapply(.x, how = "bind"))

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM