
Iterate over dplyr code using purrr::map2

I am relatively new to R, so my apologies if this question is too basic.

I have transactions that show the quantity sold and revenue earned from different products. Because there are three products, there are 2^3 = 8 combinations for selling these products in a "basket." Each basket could be sold in any of the three given years (2016, 2017, 2018) and in any of the zones (East and West). [I have 3 years' worth of transactions for the two zones: East and West.]

My objective is to analyze how much revenue is earned, what quantity is sold, and how many transactions occurred for each combination of these products in a given year for a given zone.

I was able to do the above operation (using purrr::map) by splitting the data based on zones. I have created a list of two data frames that hold data grouped by "year" for each combination described above. This works well. However, the code is a little clunky in my opinion, with a lot of repetitive statements. I want to be able to create a list of 2 x 3 (i.e. 2 zones and 3 years).

Here's my code using zone-wise splitting.

First Try

UZone <- unique(Input_File$Zone)
FYear <- unique(Input_File$Fiscal.Year)

  #Split based on zone
  a<-purrr::map(UZone, ~ dplyr::filter(Input_File, Zone == .)) %>%

  #Create combinations of products
  purrr::map(~mutate_each(.,funs(Exists = . > 0), L.Rev:I.Qty )) %>% 

  #group by Fiscal Year
  purrr::map(~group_by_(.,.dots = c("Fiscal.Year", grep("Exists", names(.), value = TRUE)))) %>% 

  #Summarize, delete unwanted columns and rename the "number of transactions" column
  purrr::map(~summarise_each(., funs(sum(., na.rm = TRUE), count = n()), L.Rev:I.Qty)) %>%
    purrr::map(~select(., Fiscal.Year:L.Rev_count)) %>%
    purrr::map(~plyr::rename(.,c("L.Rev_count" = "No.Trans")))

  #Now do Zone and Year-wise splitting : Try 1
  EastList<-a[[1]]
  EastList <- EastList %>% split(.$Fiscal.Year) 

  WestList<-a[[2]]
  WestList <- WestList %>% split(.$Fiscal.Year) 
  write.xlsx(EastList , file = "East.xlsx",row.names = FALSE)
  write.xlsx(WestList , file = "West.xlsx",row.names = FALSE)      

As you can see, the above code is very clunky. With limited knowledge of R, I researched https://blog.rstudio.org/2016/01/06/purrr-0-2-0/ and read the purrr::map2() manual, but I couldn't find many examples. After reading the solution at "How to add list of vector to list of data.frame objects as new slot by parallel?", I am assuming that I could use X = zone and Y = Fiscal Year to do what I have done above.

Here's what I tried:

Second Try

  #Now try Zone and Year-wise splitting : Try 2
  purrr::map2(UZone,FYear, ~ dplyr::filter(Input_File, Zone == ., Fiscal.Year == .))

But this code doesn't work. I get the following error message: Error: .x (2) and .y (3) are different lengths
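The error comes from how map2() iterates: it walks .x and .y in parallel (first with first, second with second), so the two inputs need the same length; it does not form every zone/year pairing on its own. A minimal sketch of one way around this, assuming recent versions of purrr and dplyr: build the grid of pairs first with base expand.grid(), then pass its two columns to map2(). The `toy` data frame and the names `pairs` and `by_zone_year` are made up for illustration.

```r
library(dplyr)
library(purrr)

# Small stand-in for Input_File (hypothetical values, same column names)
toy <- data.frame(
  Zone        = c("East", "East", "West", "West", "West"),
  Fiscal.Year = c(2016, 2017, 2016, 2017, 2018),
  L.Rev       = c(3, 2, 1, 2, 1),
  stringsAsFactors = FALSE
)

UZone <- unique(toy$Zone)         # length 2
FYear <- unique(toy$Fiscal.Year)  # length 3

# Every zone/year pairing: 2 x 3 = 6 rows
pairs <- expand.grid(Zone = UZone, Fiscal.Year = FYear,
                     stringsAsFactors = FALSE)

# .x and .y now have the same length, so map2() can walk them in parallel.
# Inside a two-argument formula, use .x and .y rather than a bare dot.
by_zone_year <- map2(
  pairs$Zone, pairs$Fiscal.Year,
  ~ filter(toy, Zone == .x, Fiscal.Year == .y)
)
names(by_zone_year) <- paste(pairs$Zone, pairs$Fiscal.Year, sep = ".")
```

Pairings with no data, such as East.2018 here, come back as zero-row data frames, so they may need to be dropped before further processing.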

Question 1: Can I use map2 to do what I am trying to do? If not, is there a better way?

Question 2: In case we are able to use map2, how can I generate two Excel files using one command? As you can see above, I have two function calls. I'd like to have only one.

Question 3: Instead of the two statements below, is there any way to do the sum and the count in one statement? I am looking for cleaner ways to do sum and count.

purrr::map(~summarise_each(., funs(sum(., na.rm = TRUE), count = n()), L.Rev:I.Qty)) %>%
    purrr::map(~select(., Fiscal.Year:L.Rev_count)) %>%
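One possible way to get the sums and a single transaction count in one call, assuming dplyr 1.0 or later (where across() is available): sum the measure columns with across() and add n() as its own named column, so there is no per-column count to select away afterwards. The `toy` data frame below is a hypothetical miniature of Input_File with only two products, and the `_Exists` suffix is an illustrative naming choice.

```r
library(dplyr)

# Hypothetical miniature of Input_File with two products
toy <- data.frame(
  Fiscal.Year = c(2016, 2016, 2016, 2016, 2017),
  L.Rev = c(3, 0, 0, 1, 2), L.Qty = c(3, 0, 0, 1, 1),
  I.Rev = c(4, 4, 4, 0, 3), I.Qty = c(2, 2, 2, 0, 3)
)

result <- toy %>%
  # Flag which measures are non-zero in each transaction (the "basket")
  mutate(across(L.Rev:I.Qty, ~ .x > 0, .names = "{.col}_Exists")) %>%
  group_by(Fiscal.Year, across(ends_with("_Exists"))) %>%
  # Sums for every measure plus one count column, in a single summarise()
  summarise(
    across(L.Rev:I.Qty, ~ sum(.x, na.rm = TRUE)),
    No.Trans = n(),
    .groups = "drop"
  )
```

Adding Zone to the group_by() call would give the same summary per zone without splitting the data first.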

Can someone please help me?

Here's my data:

dput(Input_File)

structure(list(Zone = c("East", "East", "East", "East", "East", 
"East", "East", "West", "West", "West", "West", "West", "West", 
"West"), Fiscal.Year = c(2016, 2016, 2016, 2016, 2016, 2016, 
2017, 2016, 2016, 2016, 2017, 2017, 2018, 2018), Transaction.ID = c(132, 
133, 134, 135, 136, 137, 171, 171, 172, 173, 175, 176, 177, 178
), L.Rev = c(3, 0, 0, 1, 0, 0, 2, 1, 1, 2, 2, 1, 2, 1), L.Qty = c(3, 
0, 0, 1, 0, 0, 1, 1, 1, 2, 2, 1, 2, 1), A.Rev = c(0, 0, 0, 1, 
1, 1, 0, 0, 0, 0, 0, 1, 0, 0), A.Qty = c(0, 0, 0, 2, 2, 3, 0, 
0, 0, 0, 0, 3, 0, 0), I.Rev = c(4, 4, 4, 0, 1, 0, 3, 0, 0, 0, 
1, 0, 1, 1), I.Qty = c(2, 2, 2, 0, 1, 0, 3, 0, 0, 0, 1, 0, 1, 
1)), .Names = c("Zone", "Fiscal.Year", "Transaction.ID", "L.Rev", 
"L.Qty", "A.Rev", "A.Qty", "I.Rev", "I.Qty"), row.names = c(NA, 
14L), class = "data.frame")

Output Format: Here's the code to generate the output. I would love to see EastList.2016 and EastList.2017 as two sheets in one Excel file, and WestList.2016, WestList.2017 and WestList.2018 as three sheets in another Excel file.

  #generate the output:
  EastList.2016 <- EastList[[1]]
  EastList.2017 <- EastList[[2]]
  WestList.2016 <- WestList[[1]]
  WestList.2017 <- WestList[[2]]
  WestList.2018 <- WestList[[3]]
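A rough sketch of how the one-file-per-zone, one-sheet-per-year layout could be produced in a single call. It assumes the openxlsx package, whose write.xlsx() writes a named list of data frames as one worksheet per list element. Splitting by zone and then by year gives a nested list, and purrr::iwalk() passes each zone's list together with its name. `summary_df` is a hypothetical stand-in for the summarised data, and the files go to a temporary directory.

```r
library(purrr)

# Hypothetical stand-in for the summarised result (one row per zone/year)
summary_df <- data.frame(
  Zone        = c("East", "East", "West", "West", "West"),
  Fiscal.Year = c(2016, 2017, 2016, 2017, 2018),
  L.Rev       = c(4, 2, 4, 3, 3),
  No.Trans    = c(6, 1, 3, 2, 2),
  stringsAsFactors = FALSE
)

# Nested list: zone -> year -> data frame
by_zone <- summary_df %>%
  split(.$Zone) %>%
  map(~ split(.x, .x$Fiscal.Year))

# One workbook per zone, one sheet per year.
# Assumes openxlsx is installed; skipped otherwise.
out_dir <- tempdir()
if (requireNamespace("openxlsx", quietly = TRUE)) {
  iwalk(by_zone, ~ openxlsx::write.xlsx(
    .x, file = file.path(out_dir, paste0(.y, ".xlsx"))
  ))
}
```

Because the list names become file and sheet names, the intermediate objects such as EastList.2016 are not needed.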

Two lists broken down by year with sums and counts for each?

In dplyr (where df is your data frame):

df %>% 
group_by(Zone, Fiscal.Year) %>%
summarise_at(vars(L.Rev:I.Qty), funs(sum = sum, cnt = n()))

Source: local data frame [5 x 14]
Groups: Zone [?]

   Zone Fiscal.Year L.Rev_sum L.Qty_sum A.Rev_sum A.Qty_sum I.Rev_sum I.Qty_sum L.Rev_cnt L.Qty_cnt A.Rev_cnt A.Qty_cnt I.Rev_cnt I.Qty_cnt
  <chr>       <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <int>     <int>     <int>     <int>     <int>     <int>
1  East        2016         4         4         3         7        13         7         6         6         6         6         6         6
2  East        2017         2         1         0         0         3         3         1         1         1         1         1         1
3  West        2016         4         4         0         0         0         0         3         3         3         3         3         3
4  West        2017         3         3         1         3         1         1         2         2         2         2         2         2
5  West        2018         3         3         0         0         2         2         2         2         2         2         2         2
