dplyr: using purrr::map to count single observations in a list of tibbles

Question

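I am trying to count how often single observations occur in a list of tibbles, where multiple observations in one row are separated by ";". I run into an error when I use purrr::map() inside purrr::map(). I suspect I am missing something simple, so any help would be appreciated.

As an example, take fruit purchased by different customers, where fruits bought together are separated by ";".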

library(tidyverse)

# Fruit purchases across days with different number of customers.
day_1 <- tibble(fruits = c("oranges;peaches;apples", "pears;apples", "bananas", "oranges;apples", "apples"))
day_2 <- tibble(fruits = c("oranges;apples", "peaches", "apples;bananas;", "pears", "apples;peaches", "oranges"))
day_3 <- tibble(fruits = c("peaches;pears", "apples", "bananas"))

# Create list of fruit purchases.
fruit_list <- list(day_1, day_2, day_3)

This returns a list of three tibbles, which is the general format of my data. I can use dplyr/purrr to count the total observations of each fruit per day:

fruit_list %>% 
  map(function(x) strsplit(x$fruits, ";")) %>% 
  map(unlist) %>% 
  map(table)

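However, when I try to use map() inside map() to isolate and summarise the fruits bought on their own across the list of tibbles, I get the error

"Error: .x is not a vector (closure)"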

fruit_list %>% 
  map(mutate(fruit_count = map(function(x) strsplit(x$fruits, ";"), length))) %>% 
  filter(fruit_count==1) %>% 
  count(solo_fruits = fruits)

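I can do this on a single tibble/df, but not on a list of tibbles. Am I missing something about the map() function, or something more obvious? Thanks!

Desired result format for the first tibble: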

# A tibble: 2 x 2
  solo_fruits     n
  <chr>       <int>
1 apples          1
2 bananas         1

How I arrived at the above answer for a single tibble:

day_1_df <- as.data.frame(fruit_list[[1]]) 
day_1_df %>% 
  mutate(fruit_count = map(strsplit(day_1_df$fruits, ";"), length)) %>% 
  filter(fruit_count==1) %>% 
  count(solo_fruits = fruits)

Answer 1

Not exactly what you asked for, but this may solve your problem in a different way:

library(tidyverse)

day_1 <- tibble(fruits = c("oranges;peaches;apples", "pears;apples", "bananas", "oranges;apples", "apples"))
day_2 <- tibble(fruits = c("oranges;apples", "peaches", "apples;bananas;", "pears", "apples;peaches", "oranges"))
day_3 <- tibble(fruits = c("peaches;pears", "apples", "bananas"))

df <- tibble(day = 1:3, fruits = c(day_1, day_2, day_3)) %>% 
  unnest() %>% 
  mutate(fruits = strsplit(fruits, ";"), customer = row_number()) %>% 
  unnest()

df %>% 
  group_by(customer) %>% 
  filter(n() == 1) %>% 
  group_by(customer, day, fruits) %>% 
  summarise(n = n())

# # A tibble: 7 x 4
# # Groups:   customer, day [?]
#   customer   day fruits      n
#      <int> <int> <chr>   <int>
# 1        3     1 bananas     1
# 2        5     1 apples      1
# 3        7     2 peaches     1
# 4        9     2 pears       1
# 5       11     2 oranges     1
# 6       13     3 apples      1
# 7       14     3 bananas     1

Edit: changed after a misunderstanding.

Answer 2

You can simply use str_detect to keep the rows that contain no ";". Alternatively, you can use str_count to count the ";" and then add 1.

fruit_list %>%
  map(~ filter(.x, !str_detect(fruits, ";")) %>%
        mutate(solo_fruits = fruits, count = 1, fruits = NULL))
[[1]]
# A tibble: 2 x 2
  solo_fruits count
  <chr>       <dbl>
1 bananas         1
2 apples          1

[[2]]
# A tibble: 3 x 2
  solo_fruits count
  <chr>       <dbl>
1 peaches         1
2 pears           1
3 oranges         1

[[3]]
# A tibble: 2 x 2
  solo_fruits count
  <chr>       <dbl>
1 apples          1
2 bananas         1
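
To get the exact format requested in the question (a `solo_fruits` column plus an integer `n`), the same str_detect filter can be followed by count() instead of mutate(). This is only a sketch of that variation, not part of the original answer; the data is rebuilt with tibble() so the block runs on its own, and the name `solo_counts` is made up for illustration.

```r
library(dplyr)
library(purrr)
library(stringr)

# Same example data as in the question, rebuilt with tibble().
fruit_list <- list(
  tibble(fruits = c("oranges;peaches;apples", "pears;apples", "bananas",
                    "oranges;apples", "apples")),
  tibble(fruits = c("oranges;apples", "peaches", "apples;bananas;", "pears",
                    "apples;peaches", "oranges")),
  tibble(fruits = c("peaches;pears", "apples", "bananas"))
)

# For each day: keep rows without a separator, then tally each fruit.
# `solo_counts` is a hypothetical name used only in this sketch.
solo_counts <- map(
  fruit_list,
  ~ .x %>%
    filter(!str_detect(fruits, ";")) %>%
    count(solo_fruits = fruits)
)

solo_counts[[1]]
```

Because count() tallies identical values, a fruit bought alone by several customers on the same day would appear once with `n` greater than 1, rather than as repeated rows.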

What I mean by using str_count: it gives you the total number of fruits in each row, instead of splitting and then using length.

fruit_list %>%
  map(~ mutate(.x, count = str_count(fruits, ";") + 1))
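
For completeness, here is one way the map()-inside-map() attempt from the question might be repaired. In that attempt the inner `map(function(x) ..., length)` receives a function as its first argument, which appears to be what triggers ".x is not a vector (closure)"; in addition, mutate() is called without a data frame, and filter() and count() are applied to the list instead of to each tibble. The sketch below wraps the whole per-tibble pipeline in one helper and maps it over the list. The names `count_solo`, `solo_by_day` and `all_days` are hypothetical, and base R's lengths() stands in for the inner map().

```r
library(dplyr)
library(purrr)

# Same example data as in the question, rebuilt with tibble().
fruit_list <- list(
  tibble(fruits = c("oranges;peaches;apples", "pears;apples", "bananas",
                    "oranges;apples", "apples")),
  tibble(fruits = c("oranges;apples", "peaches", "apples;bananas;", "pears",
                    "apples;peaches", "oranges")),
  tibble(fruits = c("peaches;pears", "apples", "bananas"))
)

# Hypothetical helper: count fruits that were bought on their own
# within a single tibble.
count_solo <- function(df) {
  df %>%
    # lengths() returns the number of pieces per row after splitting on ";".
    mutate(fruit_count = lengths(strsplit(fruits, ";"))) %>%
    filter(fruit_count == 1) %>%
    count(solo_fruits = fruits)
}

# Apply the helper to every day in the list.
solo_by_day <- map(fruit_list, count_solo)

# Optionally stack the results, with the list position as a "day" column.
all_days <- bind_rows(solo_by_day, .id = "day")
```

One difference from the str_count approach is worth keeping in mind: strsplit() drops the empty piece after a trailing separator, so "apples;bananas;" counts as 2 fruits here, whereas `str_count(fruits, ";") + 1` would give 3 for that row.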

2 solutions

Solution 1
0 Accepted 2018-08-09 22:53:06

Solution 2
0 2018-08-09 23:10:24
