
Extracting sublist structures from lists into `data.frame` using `purrr`

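This is my third question on a similar topic (extracting subsets of lists into a data.frame). I understand more as I keep learning, but I still hit a wall when the problem changes slightly.

The two previous related questions: extracting data from hierarchical lists of varying lengths into a data.frame using purrr, and extracting data from a list into its own data.frame using purrr.

Here is a third in the same vein.

Sample data (a representative list of lists):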

q <- list(
  structure(list(
    a = -1.54676469632688, b = "s", c = "T",
    d = structure(list(id = 5L, label = "Utah", link = "Asia/Anadyr",
      score = -0.21104594634643), .Names = c("id", "label", "link", "score")),
    sentiment = list(
      structure(list(
        text = structure(list(content = "the normal flow of supply chain activities is interrupted,", beginOffset = -1), .Names = c("content", "beginOffset")),
        sentiment = structure(list(magnitude = 0.300000011920929, score = -0.300000011920929), .Names = c("magnitude", "score"))),
        .Names = c("text", "sentiment")),
      structure(list(
        text = structure(list(content = "companies may experience financial loss, cost increases,", beginOffset = -1), .Names = c("content", "beginOffset")),
        sentiment = structure(list(magnitude = 0, score = 0), .Names = c("magnitude", "score"))),
        .Names = c("text", "sentiment")),
      structure(list(
        text = structure(list(content = "market share declines, customer defection and damage to", beginOffset = -1), .Names = c("content", "beginOffset")),
        sentiment = structure(list(magnitude = 0.300000011920929, score = -0.300000011920929), .Names = c("magnitude", "score"))),
        .Names = c("text", "sentiment"))
    )),
    .Names = c("a", "b", "c", "d", "sentiment")),
  structure(list(
    a = 7.74576236632992, b = "z", c = "F",
    d = structure(list(id = 3L, label = "South Carolina", link = "Pacific/Wallis",
      score = 2.44729194863711), .Names = c("id", "label", "link", "score")),
    sentiment = list(
      structure(list(
        text = structure(list(content = "impacted companies by seven percent, on average.", beginOffset = -1), .Names = c("content", "beginOffset")),
        sentiment = structure(list(magnitude = 0.300000011920929, score = -0.300000011920929), .Names = c("magnitude", "score"))),
        .Names = c("text", "sentiment")),
      structure(list(
        text = structure(list(content = "today’s shortened product lifecycles, more demanding", beginOffset = -1), .Names = c("content", "beginOffset")),
        sentiment = structure(list(magnitude = 0, score = 0), .Names = c("magnitude", "score"))),
        .Names = c("text", "sentiment")),
      structure(list(
        text = structure(list(content = "into global markets, mean this approach is no longer", beginOffset = -1), .Names = c("content", "beginOffset")),
        sentiment = structure(list(magnitude = 0.300000011920929, score = -0.300000011920929), .Names = c("magnitude", "score"))),
        .Names = c("text", "sentiment")),
      structure(list(
        text = structure(list(content = "and down rapidly as market conditions change.", beginOffset = -1), .Names = c("content", "beginOffset")),
        sentiment = structure(list(magnitude = 0, score = 0), .Names = c("magnitude", "score"))),
        .Names = c("text", "sentiment")),
      structure(list(
        text = structure(list(content = "flexible supply chain allows them to both reduce risk and", beginOffset = -1), .Names = c("content", "beginOffset")),
        sentiment = structure(list(magnitude = 0.5, score = 0.5), .Names = c("magnitude", "score"))),
        .Names = c("text", "sentiment"))
    )),
    .Names = c("a", "b", "c", "d", "sentiment"))
)

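I have many lists like this as a result of JSON extraction. I am trying to pull the various sublists of interest into tables of their own (data.frame or data.table).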

> q %>% map(names)
[[1]]
[1] "a"         "b"         "c"         "d"         "sentiment"
[[2]]
[1] "a"         "b"         "c"         "d"         "sentiment"

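In this case, I want:
- the fifth element ("sentiment") of each element (q[[1]][[5]], q[[2]][[5]], etc.)
- along with some identifying variables taken from the first elements ("a", "b") (q[[1]][[1]], q[[1]][[2]], etc.)

The fifth element varies in length but is always > 1, whereas the ID variables (i.e. a and b) always have length 1.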
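As a rough illustration of the pieces described above, the sketch below pulls the ID variables and the "sentiment" sublists out of a small made-up list. The object `toy` and its values are hypothetical; only the nesting mirrors `q`. It uses base R so that it runs without extra packages, with the purrr equivalents noted in comments.

```r
# Toy stand-in for `q` (hypothetical values, same nesting as the sample data)
toy <- list(
  list(a = 1.5, b = "s",
       sentiment = list(
         list(text = list(content = "first",  beginOffset = -1),
              sentiment = list(magnitude = 0.3, score = -0.3)),
         list(text = list(content = "second", beginOffset = -1),
              sentiment = list(magnitude = 0, score = 0)))),
  list(a = 7.7, b = "z",
       sentiment = list(
         list(text = list(content = "third",  beginOffset = -1),
              sentiment = list(magnitude = 0.5, score = 0.5)),
         list(text = list(content = "fourth", beginOffset = -1),
              sentiment = list(magnitude = 0, score = 0)),
         list(text = list(content = "fifth",  beginOffset = -1),
              sentiment = list(magnitude = 0.3, score = -0.3))))
)

# ID variables: exactly one value per top-level element
ids_a <- vapply(toy, function(el) el$a, numeric(1))    # purrr: map_dbl(toy, "a")
ids_b <- vapply(toy, function(el) el$b, character(1))  # purrr: map_chr(toy, "b")

# "sentiment": one sublist per top-level element, of varying length
sent   <- lapply(toy, function(el) el$sentiment)       # purrr: map(toy, "sentiment")
n_sent <- lengths(sent)                                # items per top-level element
```

Here `n_sent` is the number of times each ID value has to be repeated to line up with the sentiment items, for example via `rep(ids_a, n_sent)`.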

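I learned from the previous two questions that these kinds of tasks are best approached by starting from the most nested element and working "outwards", recycling elements where necessary (e.g. using data.frame). The problem I am running into is organizing the contents of the fifth element into the desired format. Here is what I am doing: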

library(purrr)  # map(), modify_at(), map_df()
library(dplyr)  # bind_rows()

DF <- q %>%
  map(`[`, c("a", "b", "sentiment")) %>%
  map(modify_at, "sentiment", bind_rows) %>%
  map_df(data.frame, stringsAsFactors = FALSE)

When I first use bind_rows on the "sentiment" sublist, I get, for each item, two rows of two variables rather than one row of four variables:

head(DF, 2)
   a        b                                             sentiment.text sentiment.sentiment
1 -1.546765 s the normal flow of supply chain activities is interrupted,                 0.3 
2 -1.546765 s                                                         -1                -0.3

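I understand that this is due to the structure of "sentiment", but I am not sure how to handle the "text" and "sentiment" objects, which have two elements each ("content", "beginOffset" and "magnitude", "score" respectively), so that they end up as four columns in a single row.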
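One way to look at the structure in question is to take a single "sentiment" item on its own. The sketch below copies the first item from the sample data into a hypothetical object `item` and shows that base R's `as.data.frame()` flattens its two nested lists into four columns of one row; it is an illustration of the nesting, not a statement about what `bind_rows` does internally.

```r
# One "sentiment" item, copied from the first element of the sample data
item <- list(
  text      = list(content = "the normal flow of supply chain activities is interrupted,",
                   beginOffset = -1),
  sentiment = list(magnitude = 0.300000011920929, score = -0.300000011920929)
)

# The top level has two entries, each itself a list of two values
top_names  <- names(item)
leaf_names <- names(unlist(item))  # unlist() joins nested names with "."

# as.data.frame() flattens the nested lists into four columns in one row
row <- as.data.frame(item, stringsAsFactors = FALSE)
```

The column names of `row` match `leaf_names`: "text.content", "text.beginOffset", "sentiment.magnitude" and "sentiment.score".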

The desired output, instead of what head(DF, 2) shows, is:

          a b                                     sentiment.text.content sentiment.text.beginOffset sentiment.sentiment.magnitude sentiment.sentiment.score
1 -1.546765 s the normal flow of supply chain activities is interrupted,                         -1                           0.3                      -0.3

Something like this?

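DF <- q %>%
  # keep only the ID variables and the "sentiment" sublist
  map(`[`, c("a", "b", "sentiment")) %>%
  # turn each sentiment item into a one-row data.frame, then stack the rows
  map(. %>% modify_at("sentiment", . %>% map(as.data.frame, stringsAsFactors = FALSE) %>% bind_rows)) %>%
  # data.frame() recycles a and b across the sentiment rows
  map_df(data.frame, stringsAsFactors = FALSE)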

#           a b                                     sentiment.text.content sentiment.text.beginOffset sentiment.sentiment.magnitude sentiment.sentiment.score
# 1 -1.546765 s the normal flow of supply chain activities is interrupted,                         -1                           0.3                      -0.3
# 2 -1.546765 s   companies may experience financial loss, cost increases,                         -1                           0.0                       0.0
# 3 -1.546765 s    market share declines, customer defection and damage to                         -1                           0.3                      -0.3
# 4  7.745762 z           impacted companies by seven percent, on average.                         -1                           0.3                      -0.3
# 5  7.745762 z       today’s shortened product lifecycles, more demanding                         -1                           0.0                       0.0
# 6  7.745762 z       into global markets, mean this approach is no longer                         -1                           0.3                      -0.3
# 7  7.745762 z              and down rapidly as market conditions change.                         -1                           0.0                       0.0
# 8  7.745762 z  flexible supply chain allows them to both reduce risk and                         -1                           0.5                       0.5

str(DF)
# 'data.frame': 8 obs. of  6 variables:
# $ a                            : num  -1.55 -1.55 -1.55 7.75 7.75 ...
# $ b                            : chr  "s" "s" "s" "z" ...
# $ sentiment.text.content       : chr  "the normal flow of supply chain activities is interrupted," "companies may experience financial loss, cost increases," "market share declines, customer defection and damage to" "impacted companies by seven percent, on average." ...
# $ sentiment.text.beginOffset   : num  -1 -1 -1 -1 -1 -1 -1 -1
# $ sentiment.sentiment.magnitude: num  0.3 0 0.3 0.3 0 ...
# $ sentiment.sentiment.score    : num  -0.3 0 -0.3 -0.3 0 ...
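
For comparison, roughly the same shape can be sketched in base R with a named helper instead of nested functional sequences. This is only a sketch under stated assumptions: `flatten_element` is a hypothetical helper name and `toy` is made-up data with the same nesting as `q`. It relies on `data.frame()` recycling the length-1 ID values and prefixing the nested column names with "sentiment.".

```r
# Toy stand-in for `q` (hypothetical values, same nesting as the sample data)
toy <- list(
  list(a = 1.5, b = "s",
       sentiment = list(
         list(text = list(content = "first",  beginOffset = -1),
              sentiment = list(magnitude = 0.3, score = -0.3)),
         list(text = list(content = "second", beginOffset = -1),
              sentiment = list(magnitude = 0, score = 0)))),
  list(a = 7.7, b = "z",
       sentiment = list(
         list(text = list(content = "third",  beginOffset = -1),
              sentiment = list(magnitude = 0.5, score = 0.5)),
         list(text = list(content = "fourth", beginOffset = -1),
              sentiment = list(magnitude = 0, score = 0)),
         list(text = list(content = "fifth",  beginOffset = -1),
              sentiment = list(magnitude = 0.3, score = -0.3))))
)

# Hypothetical helper: flatten one top-level element into a data.frame
flatten_element <- function(el) {
  # innermost first: each sentiment item becomes a one-row, four-column data.frame
  rows <- lapply(el$sentiment, as.data.frame, stringsAsFactors = FALSE)
  sent <- do.call(rbind, rows)
  # then outwards: a and b (length 1) are recycled across the rows of `sent`
  data.frame(a = el$a, b = el$b, sentiment = sent, stringsAsFactors = FALSE)
}

DF_toy <- do.call(rbind, lapply(toy, flatten_element))
```

With the toy data, `DF_toy` has one row per sentiment item (five in total) and the same six column names as `DF` above.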
