
Convert a nested JSON object into a data frame in R

I am fetching data from the Twitter API. I want to convert the data from a JSON object into a data frame and load it into a data warehouse. The input and code snippets are below.

I am very new to R programming.

 stats_campaign.data <- content(stats_campaign.request)
 print(stats_campaign.data)

Output:

`{
 "data_type": [ "stats" ],
 "time_series_length": [ 1 ],
 "data": [
 {
  "id": [ "XXXXX" ],
  "id_data": [
    {
      "segment": {},
      "metrics": {
        "impressions": {},
        "tweets_send": {},
        "qualified_impressions": {},
        "follows": {},
        "app_clicks": {},
        "retweets": {},
        "likes": {},
        "engagements": {},
        "clicks": {},
        "card_engagements": {},
        "replies": {},
        "url_clicks": {},
        "carousel_swipes": {}
      }
    }
   ]
   },

   {      
   "id": [ "XXXX1" ],
   "id_data": [
    {
      "segment": {},
      "metrics": {
        "impressions": {},
        "tweets_send": {},
        "qualified_impressions": {},
        "follows": {},
        "app_clicks": {},
        "retweets": {},
        "likes": {},
        "engagements": {},
        "clicks": {},
        "card_engagements": {},
        "replies": {},
        "url_clicks": {},
        "carousel_swipes": {}
      }
    }
    ]
    },`

When I read this JSON from a file,

    # Build the dated file name on one line so the path contains no line break
    stats_json_file <- sprintf("P:/R Repos/R Applications/TwitterAPIData/stats_test_data-%s.json", TODAY)
    jsonlite::fromJSON(stats_json_file)

   **Result:**
       id                                      id_data
    1  5wcaz                                         NULL
    2  5ub2u                                         NULL
    3  5wb8x                                         NULL
    4  5wb1j                                         NULL
    5  5yqwj                                         NULL
    6  5pq5i                                         NULL
    7  5u197                                         NULL
    8  5z2js                                         NULL
    9  6fqh0   333250, 4, 9, 19, 111, 3189, 3156, 5, 1091
    10 5tvr1                                         NULL
    11 5yqw4                                         NULL
    12 5qqps                                         NULL
    13 5yqvw                                         NULL
    14 5ygom                                         NULL
    15 5nc88                                         NULL
    16 5yg94                                         NULL
    17 65t9e                                         NULL
    18 5peck                                         NULL
    19 63pg1 247283, 17, 22, 35, 297, 5514, 5450, 6, 2971
    20 6cdvy        156705, 1, 2, 6, 112, 10933, 605, 170

From my JSON file I want the id and the whole metrics object:

      "metrics": {
        "impressions": {},
        "tweets_send": {},
        "qualified_impressions": {},
        "follows": {},
        "app_clicks": {},
        "retweets": {},
        "likes": {},
        "engagements": {},
        "clicks": {},
        "card_engagements": {},
        "replies": {},
        "url_clicks": {},
        "carousel_swipes": {}
      }

I then want to convert them into a data frame to load into a database. Please help!

How can I parse this JSON object? I want to retrieve the id and the whole metrics object, then convert them into a data frame to load into a SQL table.

To read the multiple ids and metrics values, I used the code below:

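`test <- list()
 len <- length(stats_campaign.data$data)  # number of campaign entries
 for (i in 1:len) {
   # note: test is overwritten on every iteration, nothing is accumulated
   test <- unlist(stats_campaign.data$data[[i]])
   print(test)
 }`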

 **Output:**
      id 
  "5wcaz" 
      id 
   "5ub2u" 
      id 
  "5wb8x" 
      id 
 "5wb1j" 
      id 
 "5yqwj" 
      id 
  "5pq5i" 
      id 
  "5u197" 
      id 
  "5z2js" 
      id 
  "5tvr1" 
      id 
  "5yqw4" 
      id 
  "5qqps" 
      id 
  "5yqvw" 
      id 
  "5ygom" 
      id 
  "5nc88" 
      id 
  "5yg94" 
      id 
  "65t9e" 
      id 
  "5peck" 
                     id id_data.metrics.impressions 
                   "63pg1"                    "133227" 
                      id_data.metrics.tweets_send     id_data.metrics.follows 
                   "10"                         "9" 
                      id_data.metrics.retweets       id_data.metrics.likes 
                   "17"                        "96" 
                    id_data.metrics.engagements      id_data.metrics.clicks 
                 "2165"                      "2134" 
                    id_data.metrics.replies  id_data.metrics.url_clicks 
                    "5"                      "1204" 
                     id id_data.metrics.impressions 
                "6cdvy"                    "176164" 
     id_data.metrics.tweets_send    id_data.metrics.retweets 
                    "2"                        "10" 
    id_data.metrics.likes id_data.metrics.engagements 
                  "121"                      "9708" 
    id_data.metrics.clicks  id_data.metrics.url_clicks 
                  "620"                       "160"

Within the for loop I have to use a list or something else to append the values on each iteration. How can I do that? Am I using the right approach? Is there an alternative way to parse the nested JSON object and put it directly into a data frame?

Please help! Thanks in advance!

As mentioned in the comments, a bit more information about the output you are looking for would be helpful. In any case, I hope the following points you in a useful direction. The tidyjson README provides a helpful overview.

Unfortunately, the lack of data in your JSON object makes it difficult to illustrate what might be present in your data (what to expect in the null objects), and I am having difficulty determining which part of the Twitter API you are looking at. tidyjson lets you produce a consistent data.frame output even when you have no data, though! The key verbs are `gather` and `spread`, much like tidyr, but with a JSON flavor.

    # The sample JSON from the question, as a single string
    str <- "{\"data_type\":[\"stats\"],\"time_series_length\":[1],\"data\":[{\"id\":[\"XXXXX\"],\"id_data\":[{\"segment\":{},\"metrics\":{\"impressions\":{},\"tweets_send\":{},\"qualified_impressions\":{},\"follows\":{},\"app_clicks\":{},\"retweets\":{},\"likes\":{},\"engagements\":{},\"clicks\":{},\"card_engagements\":{},\"replies\":{},\"url_clicks\":{},\"carousel_swipes\":{}}}]},{\"id\":[\"XXXX1\"],\"id_data\":[{\"segment\":{},\"metrics\":{\"impressions\":{},\"tweets_send\":{},\"qualified_impressions\":{},\"follows\":{},\"app_clicks\":{},\"retweets\":{},\"likes\":{},\"engagements\":{},\"clicks\":{},\"card_engagements\":{},\"replies\":{},\"url_clicks\":{},\"carousel_swipes\":{}}}]}]}"

    library(dplyr)
    library(tidyjson)

    # One row per element of the top-level "data" array
    prep <- as.tbl_json(str) %>%
      enter_object("data") %>%
      gather_array("objid")

    # The id of each element ("id" is itself a one-element array)
    p1 <- prep %>%
      enter_object("id") %>%
      gather_array("idnum") %>%
      append_values_string("id")

    # The metrics of each element; the paths inside each metric
    # ("value", "somekey") are placeholders, since the sample objects are empty
    p2 <- prep %>%
      enter_object("id_data") %>%
      gather_array("datanum") %>%
      enter_object("metrics") %>%
      spread_values(
        impressions = jstring("impressions", "value"),
        tweets_send = jnumber("tweets_send", "somekey")
      )

    # Join ids and metrics back together on the document and array indices
    p1 %>% tbl_df() %>% left_join(p2 %>% tbl_df(), by = c("document.id", "objid"))
    #> # A tibble: 2 x 7
    #>   document.id objid idnum    id datanum impressions tweets_send
    #>         <int> <int> <int> <chr>   <int>       <chr>       <dbl>
    #> 1           1     1     1 XXXXX       1        <NA>          NA
    #> 2           1     2     1 XXXX1       1        <NA>          NA
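
On the question of how to append values inside a loop: a common base R pattern is to build one single-row data frame per entry with `lapply()` and bind them at the end, rather than growing an object inside `for`. The following is only a minimal sketch under stated assumptions. The `stats` list is a hypothetical stand-in for `stats_campaign.data` with made-up ids, `campaign_row()` and `metric_names` are illustrative names that are not part of any package, and only the first value of each metric is kept (the sample shows `time_series_length` of 1).

```r
# Hypothetical stand-in for stats_campaign.data: same nesting as in the
# question, made-up ids, two metrics only to keep the example short.
stats <- list(
  data = list(
    list(id = "aaaaa",
         id_data = list(list(segment = NULL,
                             metrics = list(impressions = NULL,
                                            likes = NULL)))),
    list(id = "bbbbb",
         id_data = list(list(segment = NULL,
                             metrics = list(impressions = list(133227),
                                            likes = list(96)))))
  )
)

# Metrics to extract; extend this vector with the other metric names as needed.
metric_names <- c("impressions", "likes")

# Turn one entry of stats$data into a one-row data frame.
# Metrics that are missing or empty become NA, so every row has the same columns.
campaign_row <- function(entry) {
  metrics <- entry$id_data[[1]]$metrics
  values <- lapply(metric_names, function(name) {
    value <- unlist(metrics[[name]])
    if (length(value) == 0) NA_real_ else as.numeric(value[1])
  })
  names(values) <- metric_names
  as.data.frame(c(list(id = unlist(entry$id)[1]), values),
                stringsAsFactors = FALSE)
}

# Build all rows, then bind them once at the end.
rows <- lapply(stats$data, campaign_row)
stats_df <- do.call(rbind, rows)
print(stats_df)
```

Because every row has the same columns, the result is a rectangular data frame even for ids with no metrics, which is usually what a database load expects.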
