Using `purrr` to extract a sublist structure from a list into a `data.frame`

Question

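This is my third question on a similar theme (extracting subsets of a list into a data.frame). I understand more as I keep learning, but I still hit a wall whenever the problem changes slightly.

The two previous related questions: using purrr to extract data from a hierarchical list with differing lengths into a data.frame, and using purrr to extract data from a list into its own data.frame.

Here is a third one in the same vein.

Sample data (a representative list of lists):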

q <- list(structure(list(a = -1.54676469632688, b = "s", c = "T", 
d = structure(list(id = 5L, label = "Utah", link = "Asia/Anadyr", 
    score = -0.21104594634643), .Names = c("id", "label", "link", "score")), sentiment = list(structure(list(text = structure(list(content = "the normal flow of supply chain activities is interrupted,", beginOffset = -1), .Names = c("content", "beginOffset")), sentiment = structure(list(magnitude = 0.300000011920929, score = -0.300000011920929), .Names = c("magnitude", "score"))), .Names = c("text", "sentiment")), structure(list(text = structure(list(content = "companies may experience financial loss, cost increases,", beginOffset = -1), .Names = c("content", "beginOffset")), sentiment = structure(list(magnitude = 0, score = 0), .Names = c("magnitude", "score"))), .Names = c("text", "sentiment")), structure(list(text = structure(list(content = "market share declines, customer defection and damage to", beginOffset = -1), .Names = c("content", "beginOffset")), sentiment = structure(list(magnitude = 0.300000011920929, score = -0.300000011920929), .Names = c("magnitude", "score"))), .Names = c("text", "sentiment")))), .Names = c("a", "b", "c", "d", "sentiment")), structure(list(a = 7.74576236632992, b = "z", c = "F", d = structure(list(id = 3L, label = "South Carolina", link = "Pacific/Wallis", score = 2.44729194863711), .Names = c("id", "label", "link", "score")), sentiment = list(structure(list(text = structure(list(content = "impacted companies by seven percent, on average.", beginOffset = -1), .Names = c("content", "beginOffset")), sentiment = structure(list(magnitude = 0.300000011920929, score = -0.300000011920929), .Names = c("magnitude", "score"))), .Names = c("text", "sentiment")), structure(list(text = structure(list(content = "today’s shortened product lifecycles, more demanding", beginOffset = -1), .Names = c("content", "beginOffset")), sentiment = structure(list(magnitude = 0, score = 0), .Names = c("magnitude", "score"))), .Names = c("text", "sentiment")), structure(list(text = structure(list(content = "into global markets, mean this approach is no longer", beginOffset = -1), .Names = c("content", "beginOffset")), sentiment = structure(list(magnitude = 0.300000011920929, score = -0.300000011920929), .Names = c("magnitude", "score"))), .Names = c("text", "sentiment")), structure(list(
    text = structure(list(content = "and down rapidly as market conditions change.", beginOffset = -1), .Names = c("content", "beginOffset")), sentiment = structure(list(magnitude = 0, score = 0), .Names = c("magnitude", "score"))), .Names = c("text", "sentiment")), structure(list(text = structure(list(content = "flexible supply chain allows them to both reduce risk and", beginOffset = -1), .Names = c("content", "beginOffset")), sentiment = structure(list(magnitude = 0.5, score = 0.5), .Names = c("magnitude", "score"))), .Names = c("text", "sentiment")))), .Names = c("a", "b", "c", "d", "sentiment")))

I have a lot of lists as the result of a JSON extraction. I am trying to extract various sublists of interest into their own tables (data.frame or data.table).

> q %>% map(names)
[[1]]
[1] "a"         "b"         "c"         "d"         "sentiment"
[[2]]
[1] "a"         "b"         "c"         "d"         "sentiment"

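In this case, I want:
- the fifth element ("sentiment") of each element (q[[1]][[5]], q[[2]][[5]], etc.)
- along with some identifying variables from the first level, say "a" and "b" (q[[1]][[1]], q[[1]][[2]], etc.)

The fifth element varies in length but is always > 1, whereas the ID variables (i.e. a, b) always have length 1.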
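A quick way to confirm those lengths, sketched on a small hypothetical list (`toy` and its values are made up for illustration, and only base R is used so nothing needs to be installed):

```r
# Hypothetical stand-in for q: two records whose "sentiment" lists differ in length.
toy <- list(
  list(a = 1.5, b = "s",
       sentiment = list(list(score = -0.3), list(score = 0))),
  list(a = 7.7, b = "z",
       sentiment = list(list(score = 0.5), list(score = 0), list(score = -0.3)))
)

# Number of sentiment entries per record (varies from record to record).
sentiment_lengths <- vapply(toy, function(x) length(x$sentiment), integer(1))

# Length of an ID field per record (expected to be 1 everywhere).
id_lengths <- vapply(toy, function(x) length(x$a), integer(1))

print(sentiment_lengths)
print(id_lengths)
```

The same check should carry over to the real data by swapping `toy` for `q`.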

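I learned from the previous two questions that these kinds of tasks are best handled by starting at the most deeply nested element and working "outward", recycling elements where necessary (e.g. with data.frame). The problem I am having is getting the contents of the fifth element into the desired format. This is what I am doing: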

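library(purrr)  # map(), modify_at(), map_df()
library(dplyr)  # bind_rows()

DF <- q %>% 
  map(`[`, c("a", "b", "sentiment")) %>%       # keep only the fields of interest
  map(modify_at, "sentiment", bind_rows) %>%   # bind each record's sentiment entries
  map_df(data.frame, stringsAsFactors = FALSE)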

Because I call bind_rows on the "sentiment" sublist first, I get, for each element, two rows of two variables rather than one row of four variables:

head(DF, 2)
   a        b                                             sentiment.text sentiment.sentiment
1 -1.546765 s the normal flow of supply chain activities is interrupted,                 0.3 
2 -1.546765 s                                                         -1                -0.3

I understand this is due to the structure of "sentiment", but I am not sure how to get the "text" and "sentiment" objects to each expand into their two elements, "content", "beginOffset" and "magnitude", "score" respectively.

The desired output, instead of what head(DF, 2) shows above, is:

          a b                                     sentiment.text.content sentiment.text.beginOffset sentiment.sentiment.magnitude sentiment.sentiment.score
1 -1.546765 s the normal flow of supply chain activities is interrupted,                         -1                           0.3                      -0.3

Answer 1

Something like this?

DF <- q %>% 
  map(`[`, c("a", "b", "sentiment")) %>% 
  map(. %>% modify_at("sentiment", . %>% map(as.data.frame, stringsAsFactors = FALSE) %>% bind_rows)) %>% 
  map_df(data.frame, stringsAsFactors = FALSE)

#           a b                                     sentiment.text.content sentiment.text.beginOffset sentiment.sentiment.magnitude sentiment.sentiment.score
# 1 -1.546765 s the normal flow of supply chain activities is interrupted,                         -1                           0.3                      -0.3
# 2 -1.546765 s   companies may experience financial loss, cost increases,                         -1                           0.0                       0.0
# 3 -1.546765 s    market share declines, customer defection and damage to                         -1                           0.3                      -0.3
# 4  7.745762 z           impacted companies by seven percent, on average.                         -1                           0.3                      -0.3
# 5  7.745762 z       today’s shortened product lifecycles, more demanding                         -1                           0.0                       0.0
# 6  7.745762 z       into global markets, mean this approach is no longer                         -1                           0.3                      -0.3
# 7  7.745762 z              and down rapidly as market conditions change.                         -1                           0.0                       0.0
# 8  7.745762 z  flexible supply chain allows them to both reduce risk and                         -1                           0.5                       0.5

str(DF)
# 'data.frame': 8 obs. of  6 variables:
# $ a                            : num  -1.55 -1.55 -1.55 7.75 7.75 ...
# $ b                            : chr  "s" "s" "s" "z" ...
# $ sentiment.text.content       : chr  "the normal flow of supply chain activities is interrupted," "companies may experience financial loss, cost increases," "market share declines, customer defection and damage to" "impacted companies by seven percent, on average." ...
# $ sentiment.text.beginOffset   : num  -1 -1 -1 -1 -1 -1 -1 -1
# $ sentiment.sentiment.magnitude: num  0.3 0 0.3 0.3 0 ...
# $ sentiment.sentiment.score    : num  -0.3 0 -0.3 -0.3 0 ...
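As for why converting each sentiment entry with as.data.frame first helps, here is a minimal base R sketch on made-up values. The names `one_sentence`, `sentences`, `stacked` and `record` are illustrative only and do not come from the question. It relies on as.data.frame turning a nested list into a single row whose column names are prefixed by the parent name, and on data.frame recycling length-1 values.

```r
# Toy element with the same nesting as one entry of q[[i]]$sentiment
# (hypothetical values, not taken from the question's data).
one_sentence <- list(
  text      = list(content = "first sentence", beginOffset = -1),
  sentiment = list(magnitude = 0.3, score = -0.3)
)

# A nested list becomes ONE row; inner names get the parent name as a prefix:
# text.content, text.beginOffset, sentiment.magnitude, sentiment.score
one_row <- as.data.frame(one_sentence, stringsAsFactors = FALSE)

# Convert every entry the same way, then stack the one-row data frames.
sentences <- list(
  one_sentence,
  list(text      = list(content = "second sentence", beginOffset = -1),
       sentiment = list(magnitude = 0, score = 0))
)
stacked <- do.call(rbind, lapply(sentences, as.data.frame, stringsAsFactors = FALSE))

# data.frame() recycles the length-1 IDs across the stacked rows and prefixes
# the nested columns with "sentiment.".
record <- data.frame(a = -1.5, b = "s", sentiment = stacked, stringsAsFactors = FALSE)

print(names(one_row))
print(record)
```

Calling bind_rows directly on the nested lists skips the first step, which would explain why "text" and "sentiment" were not expanded into four columns in the question's attempt.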

Solution 1
0 Accepted 2017-10-26 17:26:53