
Read JSON file with nested lists in R

I have a large JSON dataset that I want to convert to a data frame in R.

(Sorry if this is a duplicate question, but the other answers did not help me.) My JSON file looks like this:

[{"src": "http://www.europarl.eu", "peid": "PE529.899v01-00", "reference": "2014/2021(INI)", "date": "2014-03-05T00:00:00", "committee": ["AFET"], "seq": 1, "id": "PE529.899-1", "orig_lang": "en", "new": ["- having regard to its resolution of 13", "December 20071 on Justice for the", "'Comfort Women' (sex slaves in Asia", "before and during World War II) as well", "as the statements by Japanese Chief", "Cabinet Secretary Yohei Kono in 1993", "and by the then Prime Minister Tomiichi", "Murayama in 1995, the resolutions of the", "Japanese parliament (the Diet) of 1995", "and 2005 expressing apologies for", "wartime victims, including victims of the", "'comfort women' system,", "_______________________", "1", "OJ C 323E, 18.12.2008, p.531"], "authors": "Reinhard Bütikofer on behalf of the Verts/ALE Group", "meps": [96739], "location": [["Motion for a resolution", "Citation 6 a (new)"]], "meta": {"created": "2019-07-03T05:06:17"}, "changes": {}}
,{"src": "http://www.europarl.eu", "peid": "PE529.863v01-00", "reference": "2014/2016(INI)", "date": "2014-02-27T00:00:00", "committee": ["AFET"], "seq": 1, "id": "PE529.863-1", "orig_lang": "en", "new": ["- having regard to the Statement by the", "Vice-President of the Commission/ High", "Representative of the Union for Foreign", "affairs and Security Policy (VP/HR)", "Catherine Ashton of 20 March 2013 on", "the Magnitsky case in the Russian", "Federation,"], "authors": "Jacek Protasiewicz", "meps": [23782], "location": [["Motion for a resolution", "Citation 4 a (new)"]], "meta": {"created": "2019-07-03T05:06:17"}, "changes": {}}
,{"src": "http://www.europarl.eu", "peid": "PE529.713v01-00", "reference": "2013/2149(INI)", "date": "2014-02-12T00:00:00", "committee": ["AFET"], "seq": 238, "id": "PE529.713-238", "orig_lang": "en", "old": ["A. whereas the European Neighbourhood", "Policy (ENP), in particular the Eastern", "Partnership (EaP), aims to extend the", "values and ideas of the founders of the EU;"], "new": ["A. whereas the European Neighbourhood", "Policy (ENP) embraces the values and", "ideas of the founders of the EU, notably", "the principles of Peace, Solidarity and", "Prosperity;"], "authors": "Mário David", "meps": [96973], "location": [["Motion for a resolution", "Recital A"]], "meta": {"created": "2019-07-03T05:06:18"}, "changes": {}}
,{"src": "http://www.europarl.eu", "peid": "PE529.899v01-00", "reference": "2014/2021(INI)", "date": "2014-03-05T00:00:00", "committee": ["AFET"], "seq": 2, "id": "PE529.899-2", "orig_lang": "en", "new": ["- having regard to the catastrophic", "earthquake and subsequent tsunami", "which devastated important parts of", "Japan's coast on 11 March 2011 and led", "to the destruction of the Fukushima", "nuclear power plant, causing possibly the", "greatest radiation disaster in human", "history,"], "authors": "Reinhard Bütikofer on behalf of the Verts/ALE Group", "meps": [96739], "location": [["Motion for a resolution", "Citation 11 a (new)"]], "meta": {"created": "2019-07-03T05:06:18"}, "changes": {}}

I want a data frame like the following:

         src               peid          reference                date           committee        seq        id        orig_lang             new                  ...  
http://www.europarl.eu PE529.899v01-00  2014/2021(INI)    2014-03-05T00:00:00       AFET           1      PE529.899-1       en      ["- having ... p.531"]          ...
http://www.europarl.eu PE529.863v01-00  2014/2016(INI)    2014-02-27T00:00:00       AFET          128     PE529.899-1       en      ["- having ..."Federation,"]  ...
http://www.europarl.eu PE529.713v01-00  2013/2149(INI)    2014-02-12T00:00:00       AFET          238     PE529.899-1       en      ["- having ..."Federation,"]    ...
http://www.europarl.eu PE529.899v01-00  2014/2021(INI)    2014-03-05T00:00:00       AFET           1      PE529.899-1       en      ["- having ..."Federation,"]    ...

(I have not written out the full table above.)

I have already tried the following code:

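library(rjson)
library(jsonlite)
# Call rjson's reader explicitly: jsonlite::fromJSON() masks it and has no `file` argument
Data <- rjson::fromJSON(file = "data.json")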

But each row looks like this:

[[1]]
[[1]]$src
[1] "http://www.europarl.eu/sides/getDoc.do?pubRef=-//EP//NONSGML+COMPARL+PE-529.899+01+DOC+PDF+V0//EN&language=EN"

[[1]]$peid
[1] "PE529.899v01-00"

[[1]]$reference
[1] "2014/2021(INI)"

[[1]]$date
[1] "2014-03-05T00:00:00"

[[1]]$committee
[1] "AFET"

[[1]]$seq
[1] 1

[[1]]$id
[1] "PE529.899-1"

[[1]]$orig_lang
[1] "en"

[[1]]$new
[1] "- having regard to its resolution of 13"   "December 20071 on Justice for the"        
[3] "'Comfort Women' (sex slaves in Asia"       "before and during World War II) as well"  
[5] "as the statements by Japanese Chief"       "Cabinet Secretary Yohei Kono in 1993"     
[7] "and by the then Prime Minister Tomiichi"   "Murayama in 1995, the resolutions of the" 
[9] "Japanese parliament (the Diet) of 1995"    "and 2005 expressing apologies for"        
[11] "wartime victims, including victims of the" "'comfort women' system,"                  
[13] "_______________________"                   "1"                                        
[15] "OJ C 323E, 18.12.2008, p.531"             

[[1]]$authors
[1] "Reinhard Bütikofer on behalf of the Verts/ALE Group"

[[1]]$meps
[1] 96739

[[1]]$location
[[1]]$location[[1]]
[1] "Motion for a resolution" "Citation 6 a (new)"     


[[1]]$meta
[[1]]$meta$created
[1] "2019-07-03T05:06:17"


[[1]]$changes
list()

The dput version is as follows:

list(list(src = "http://www.europarl.eu", 
    peid = "PE529.899v01-00", reference = "2014/2021(INI)", date = "2014-03-05T00:00:00", 
    committee = "AFET", seq = 1, id = "PE529.899-1", orig_lang = "en", 
    new = c("- having regard to its resolution of 13", "December 20071 on Justice for the", 
    "'Comfort Women' (sex slaves in Asia", "before and during World War II) as well", 
    "as the statements by Japanese Chief", "Cabinet Secretary Yohei Kono in 1993", 
    "and by the then Prime Minister Tomiichi", "Murayama in 1995, the resolutions of the", 
    "Japanese parliament (the Diet) of 1995", "and 2005 expressing apologies for", 
    "wartime victims, including victims of the", "'comfort women' system,", 
    "_______________________", "1", "OJ C 323E, 18.12.2008, p.531"
    ), authors = "Reinhard Bütikofer on behalf of the Verts/ALE Group", 
    meps = 96739, location = list(c("Motion for a resolution", 
    "Citation 6 a (new)")), meta = list(created = "2019-07-03T05:06:17"), 
    changes = list()))

One of the problems I have is the 9th column, shown below. I want to put all 15 components into a single cell of the data frame:

[[1]]$new
 [1] "- having regard to its resolution of 13"   "December 20071 on Justice for the"        
 [3] "'Comfort Women' (sex slaves in Asia"       "before and during World War II) as well"  
 [5] "as the statements by Japanese Chief"       "Cabinet Secretary Yohei Kono in 1993"     
 [7] "and by the then Prime Minister Tomiichi"   "Murayama in 1995, the resolutions of the" 
 [9] "Japanese parliament (the Diet) of 1995"    "and 2005 expressing apologies for"        
[11] "wartime victims, including victims of the" "'comfort women' system,"                  
[13] "_______________________"                   "1"                                        
[15] "OJ C 323E, 18.12.2008, p.531"

How can I get the table I described above?

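We can convert the nested list elements whose lengths are greater than 1 into a single string by pasting them together (str_c), and then use the _dfr variant (map_dfr) to bind the named lists into the columns of a data frame: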

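library(purrr)
library(dplyr)
library(stringr)
# For each record: flatten every field, collapse multi-element fields into
# one ";"-separated string, then row-bind the records into a data frame
map_dfr(Data, ~ map(.x, unlist) %>%
     map_dfr(~ if(length(.x) > 1) str_c(.x, collapse = ";") else .x))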
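
The collapse step described above can also be sketched in base R, which may help show what each stage does. This is only an illustration under stated assumptions: the two records are shortened, hypothetical stand-ins for the parsed file, and `collapse_field` is a helper name invented here, not part of any package.

```r
# Two hypothetical records shaped like the parsed JSON (shortened for illustration)
Data <- list(
  list(peid = "PE529.899v01-00", seq = 1,
       new = c("- having regard to its resolution of 13",
               "December 20071 on Justice for the"),
       location = list(c("Motion for a resolution", "Citation 6 a (new)")),
       changes = list()),
  list(peid = "PE529.713v01-00", seq = 238,
       old = c("A. whereas the European Neighbourhood",
               "Policy (ENP), in particular the Eastern"),
       new = c("A. whereas the European Neighbourhood",
               "Policy (ENP) embraces the values and"),
       location = list(c("Motion for a resolution", "Recital A")),
       changes = list())
)

# Hypothetical helper: flatten one field and join it into a single string;
# missing or empty fields become NA
collapse_field <- function(x) {
  x <- unlist(x)
  if (length(x) == 0) NA_character_ else paste(x, collapse = ";")
}

# Union of all field names, so records lacking a field (e.g. "old") still get a cell
fields <- unique(unlist(lapply(Data, names)))

# Build one single-row data frame per record, then stack them
rows <- lapply(Data, function(rec) {
  cells <- lapply(fields, function(f) collapse_field(rec[[f]]))
  names(cells) <- fields
  as.data.frame(cells, stringsAsFactors = FALSE)
})

df <- do.call(rbind, rows)
```

Note that `paste()` turns every cell into a character string in this sketch, so a numeric column such as `seq` would need converting back (for example with `as.numeric(df$seq)`) if its type matters.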

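Or use the recursive function rrapply (from the rrapply package) with how = "bind" to bind the list elements, including those with length greater than 1: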

library(rrapply)
map_dfr(Data, ~ rrapply(.x, how = "bind"))
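
Since the question also loads jsonlite, one more route may be worth sketching. It is a different technique from the purrr and rrapply answers: let `jsonlite::fromJSON()` build the data frame directly, then collapse the resulting list-column. The inline JSON below is a shortened, hypothetical sample in the same shape as the question's file. The arguments `simplifyMatrix = FALSE` (keeps equal-length arrays from being turned into a matrix) and `flatten = TRUE` (unnests the `meta` object) are choices made for this sketch, not something the original answer uses.

```r
library(jsonlite)

# Shortened, hypothetical sample in the same shape as the question's file
txt <- '[
  {"peid": "PE529.899v01-00", "seq": 2,
   "new": ["- having regard to the catastrophic", "earthquake and subsequent tsunami"],
   "meta": {"created": "2019-07-03T05:06:18"}},
  {"peid": "PE529.863v01-00", "seq": 1,
   "new": ["- having regard to the Statement by the",
           "Vice-President of the Commission/ High",
           "Representative of the Union for Foreign"],
   "meta": {"created": "2019-07-03T05:06:17"}}
]'

# Parse straight into a data frame; "new" arrives as a list-column
# holding one character vector per row
df <- fromJSON(txt, simplifyMatrix = FALSE, flatten = TRUE)

# Join each row's vector into one string so it fits in a single cell
df$new <- vapply(df$new, paste, character(1), collapse = " ")
```

For a file on disk, the path can be passed as the first argument, for example `fromJSON("data.json", ...)`. Fields shaped like `location` (an array of arrays) or `changes` (an empty object) are left out of this sample and may need separate handling.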
