Convert JSON to R dataframe

I have a very messy JSON file (lists nested inside lists) that I'm trying to convert to an R data frame, partly because I need to export it to a .csv file. Here is a sample of the data ( https://www.dropbox.com/s/ikb4znhpaavyc9z/20140909-20141010_10zdfxhqf0_2014_10_09_23_50_activities.json?dl=0 ). I tried this solution ( Parse nested JSON to Data Frame in R ), but it dropped many of my columns. Below is the code I have so far:

    library("twitteR")
    library("streamR")
    library("rjson")
    json_file <- "20140909-20141010_10zdfxhqf0_2014_09_09_01_00_activities.json"
    json_data <- fromJSON(file = json_file)  # convert to R list
    str(json_data)  # list of 16 objects

    # unlist elements
    tweets.i <- lapply(json_data, function(x) unlist(x))
    tweets <- do.call("rbind", tweets.i)
    tweets <- as.data.frame(tweets)

    library(plyr)
    tweets <- rbind.fill(lapply(tweets.i,
                                function(x) do.call("data.frame", as.list(x))))

Does anyone have a way to convert the file to an R data frame without losing any of the information? I'm open to using Python for this too; I just don't have the expertise to figure out how to code it.

This is not very efficient, but it may work for you:

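    library(jsonlite)
    library(plyr)
    # Download the sample to a temporary file
    tf <- tempfile(fileext = ".json")
    download.file("https://www.dropbox.com/s/ikb4znhpaavyc9z/20140909-20141010_10zdfxhqf0_2014_10_09_23_50_activities.json?dl=1", destfile = tf)
    txt <- readLines(tf)
    # Parse each non-empty line as its own JSON document, flatten it to a
    # one-row data frame, then stack the rows, filling missing columns with NA
    df <- do.call(plyr::rbind.fill, lapply(txt[txt != ""], function(x) as.data.frame(t(unlist(fromJSON(x))))))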
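
Since the question mentions being open to Python, here is a rough standard-library sketch of the same idea: parse each non-empty line as its own JSON document, flatten nested objects and arrays into dotted column names, and write a CSV whose header is the union of all keys seen. The helper names (`flatten`, `json_lines_to_csv`), the dotted naming scheme, and the assumption that the file holds one JSON object per line are illustrative and not taken from the original post.

```python
import csv
import json


def flatten(value, prefix=""):
    """Flatten nested dicts/lists into one dict with dotted keys."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}.{key}" if prefix else str(key)))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}.{index}" if prefix else str(index)))
    else:
        # Scalar leaf: store it under the accumulated path.
        flat[prefix] = value
    return flat


def json_lines_to_csv(in_path, out_path):
    """Hypothetical helper: one JSON object per input line -> one CSV row."""
    rows = []
    with open(in_path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                rows.append(flatten(json.loads(line)))

    # Union of all keys in first-seen order, so no leaf value loses its column.
    columns = list(dict.fromkeys(key for row in rows for key in row))

    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return columns
```

Calling `json_lines_to_csv("activities.json", "activities.csv")` (both file names are placeholders) would write one row per record and leave a cell empty wherever a record lacks that field. Note that empty objects and empty arrays contribute no column in this sketch.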

I like the answer above as a really quick way to get everything. You could try tidyjson, but it will not be efficient either, since it requires knowing the structure in advance. listviewer::jsonedit might help you visualize what you are working with.

    # devtools::install_github("timelyportfolio/listviewer")
    library(listviewer)
    # Open the second line of the file in an interactive JSON viewer
    jsonedit(readLines(
      "https://www.dropbox.com/s/ikb4znhpaavyc9z/20140909-20141010_10zdfxhqf0_2014_10_09_23_50_activities.json?dl=1"
    )[2])

Perhaps a data.frame isn't the best structure, but that depends on what you are trying to accomplish.

This is just a sample to hopefully show you how it might look.

    library(tidyjson)
    library(dplyr)

    json <- readLines(
      "https://www.dropbox.com/s/ikb4znhpaavyc9z/20140909-20141010_10zdfxhqf0_2014_10_09_23_50_activities.json?dl=1"
    )

    json %>%
      # drop empty lines before parsing
      {
        Filter(
          function(x){return (nchar(x) != 0)}
          ,.
        )
      } %>%
      as.tbl_json() %>%
      spread_values(
        id = jstring("id")
        ,objectType = jstring("objectType")
        ,link = jstring("link")
        ,body = jstring("body")
        ,favoritesCount = jstring("favoritesCount")
        ,twitter_filter_level = jstring("twitter_filter_level")
        ,twitter_lang = jstring("twitter_lang")
        ,retweetCount = jnumber("retweetCount")
        ,verb = jstring("verb")
        ,postedTime = jstring("postedTime")
        # from actor object in the JSON
        ,actor_objectType = jstring("actor","objectType")
        ,actor_id = jstring("actor","id")
        ,actor_link = jstring("actor","link")
        ,actor_displayName = jstring("actor","displayName")
        ,actor_image = jstring("actor","image")
        ,actor_summary = jstring("actor","summary")
        ,actor_friendsCount = jnumber("actor","friendsCount")
        ,actor_followersCount = jnumber("actor","followersCount")
      ) %>%
      # careful: once you enter an object, you can't go back up
      enter_object("actor","links") %>%
      gather_array( ) %>%
      spread_values(
        actor_href = jstring("href")
      )
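
For comparison, the "enter a nested array and get one row per element" step can be sketched in plain Python as well. This is only an illustration of the shape of the result, not a port of tidyjson: the function name `rows_per_actor_link` and the `array_index` column are made up for this example, only two parent fields are carried along to keep it short, and each element of `actor.links` is assumed to be a JSON object.

```python
import json


def rows_per_actor_link(lines):
    """Yield one dict per element of actor.links, repeating parent fields."""
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        activity = json.loads(line)
        actor = activity.get("actor") or {}
        # Number elements from 1 to match R's indexing convention.
        for index, link in enumerate(actor.get("links") or [], start=1):
            yield {
                "id": activity.get("id"),
                "actor_id": actor.get("id"),
                "array_index": index,
                "actor_href": link.get("href"),
            }
```

In this sketch, records without an `actor.links` array produce no rows at all, so the parent fields should be collected before descending if every record has to appear in the output.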
