
How to subset data frame by date and perform multiple operations in R?

I receive daily CSV reports; each has the same variables but covers a different date. I want to run some simple analysis by date and save the results. I think a for loop could do the job, but I only know the basics. Ideally, I would only need to run the script once a month to get the results. Any guidance or advice is appreciated.

Let's say I have two CSV reports in a folder:

#File 1 - 20200624.csv
Date        Market  Salesman    Product Quantity    Price   Cost
6/24/2020   A       MF          Apple   20          1       0.5
6/24/2020   A       RP          Apple   15          1       0.5
6/24/2020   A       RP          Banana  20          2       0.5
6/24/2020   A       FR          Orange  20          3       0.5
6/24/2020   B       MF          Apple   20          1       0.5
6/24/2020   B       RP          Banana  20          2       0.5

#File 2 - 20200625.csv
Date        Market  Salesman    Product Quantity    Price   Cost
6/25/2020   A       MF          Apple   10          1       0.6
6/25/2020   A       MF          Banana  15          1       0.6
6/25/2020   A       RP          Banana  10          2       0.6
6/25/2020   A       FR          Orange  15          3       0.6
6/25/2020   B       MF          Apple   20          1       0.6
6/25/2020   B       RP          Banana  20          2       0.6

I imported all the files into R using the following code:

library(readr)
library(dplyr)

#Import files
files <- list.files(path = "~/JuneReports",
                    pattern = "\\.csv$", full.names = TRUE)
tbl <- sapply(files, read_csv, simplify=FALSE) %>% 
  bind_rows(.id = "id")
#Remove the "id" column
tbl2 <- tbl[,-1]
#Subset the data frame to keep only Market A, as Market B is irrelevant.
tbl3 <- subset(tbl2, Market == "A")
head(tbl3)
# A tibble: 6 x 7
  Date      Market Salesman Product Quantity Price  Cost
  <chr>     <chr>  <chr>    <chr>      <dbl> <dbl> <dbl>
1 6/24/2020 A      MF       Apple         20     1   0.5
2 6/24/2020 A      RP       Apple         15     1   0.5
3 6/24/2020 A      RP       Banana        20     2   0.5
4 6/24/2020 A      FR       Orange        20     3   0.5
5 6/25/2020 A      MF       Apple         10     1   0.6
6 6/25/2020 A      MF       Banana        15     1   0.6

Below are the results I want to get:

Date        Market  Revenue Total Cost  Apples Sold Bananas Sold    Oranges Sold
6/24/2020   A       135     37.5        35          20              20
6/25/2020   A       90      30          10          25              15

#Revenue = sumproduct(Quantity, Price)
#Total Cost = sumproduct(Quantity, Cost)
#Apples/Bananas/Oranges Sold = sum of Quantity where Product == "Apple"/"Banana"/"Orange"

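We group by 'Date' and 'Market', then calculate the sum of the products of 'Quantity' with 'Price' and with 'Cost'. Those two values are added to the grouping along with 'Product' (using .add = TRUE), after which we get the sum of 'Quantity' per group and use pivot_wider to reshape into 'wide' format. The example uses only the six rows shown by head(tbl3) (see 'data'), so the 6/25/2020 figures cover just those rows.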

library(dplyr) # 1.0.0
library(tidyr)
df1 %>%
    group_by(Date, Market) %>% 
    group_by(Revenue = c(Quantity %*% Price), 
             TotalCost = c(Quantity %*% Cost),
             Product, .add = TRUE) %>% 
    summarise(Sold = sum(Quantity)) %>% 
    pivot_wider(names_from = Product, values_from = Sold)
# A tibble: 2 x 7
# Groups:   Date, Market, Revenue, TotalCost [2]
#  Date      Market Revenue TotalCost Apple Banana Orange
#  <chr>     <chr>    <dbl>     <dbl> <int>  <int>  <int>
#1 6/24/2020 A          135      37.5    35     20     20
#2 6/25/2020 A           25      15      10     15     NA
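
A possible variation on the same idea, as a minimal sketch rather than the method above: compute every column in a single summarise() call, so no reshaping step is needed. The data frame name sales and the output column names (ApplesSold and so on) are assumptions chosen for illustration; the rows are the six sample rows used in this answer. Because the product names are written out by hand, a product with no sales on a given day shows 0 instead of NA, but a new product would need its own line.

```r
library(dplyr)

# The six sample rows used in this answer (Market A only); 'sales' is a
# hypothetical name for illustration.
sales <- data.frame(
  Date     = c("6/24/2020", "6/24/2020", "6/24/2020",
               "6/24/2020", "6/25/2020", "6/25/2020"),
  Market   = "A",
  Salesman = c("MF", "RP", "RP", "FR", "MF", "MF"),
  Product  = c("Apple", "Apple", "Banana", "Orange", "Apple", "Banana"),
  Quantity = c(20L, 15L, 20L, 20L, 10L, 15L),
  Price    = c(1L, 1L, 2L, 3L, 1L, 1L),
  Cost     = c(0.5, 0.5, 0.5, 0.5, 0.6, 0.6)
)

daily <- sales %>%
  group_by(Date, Market) %>%
  summarise(
    Revenue     = sum(Quantity * Price),             # sumproduct(Quantity, Price)
    TotalCost   = sum(Quantity * Cost),              # sumproduct(Quantity, Cost)
    ApplesSold  = sum(Quantity[Product == "Apple"]), # 0 when no rows match
    BananasSold = sum(Quantity[Product == "Banana"]),
    OrangesSold = sum(Quantity[Product == "Orange"]),
    .groups = "drop"                                 # return an ungrouped tibble
  )

print(daily)
```

The `.groups` argument is available from dplyr 1.0.0 onward, the version noted above.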

data

df1 <- structure(list(Date = c("6/24/2020", "6/24/2020", "6/24/2020", 
"6/24/2020", "6/25/2020", "6/25/2020"), Market = c("A", "A", 
"A", "A", "A", "A"), Salesman = c("MF", "RP", "RP", "FR", "MF", 
"MF"), Product = c("Apple", "Apple", "Banana", "Orange", "Apple", 
"Banana"), Quantity = c(20L, 15L, 20L, 20L, 10L, 15L), Price = c(1L, 
1L, 2L, 3L, 1L, 1L), Cost = c(0.5, 0.5, 0.5, 0.5, 0.6, 0.6)), 
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))

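For the "run once a month and save the results" part of the question, here is a rough base R sketch under several assumptions: a temporary folder stands in for "~/JuneReports", the two sample reports from the question are written there as comma-separated files, and the names report_dir, month_start, month_end, summarise_day and out_file are all hypothetical. It parses the Date column into a real Date so rows can be subset by date, keeps Market A, builds one summary row per day, and writes the result to a CSV.

```r
# Hypothetical setup: a temp folder stands in for "~/JuneReports" and is
# filled with the two sample reports from the question.
report_dir <- file.path(tempdir(), "JuneReports")
dir.create(report_dir, showWarnings = FALSE)
header <- "Date,Market,Salesman,Product,Quantity,Price,Cost"
writeLines(c(header,
             "6/24/2020,A,MF,Apple,20,1,0.5",
             "6/24/2020,A,RP,Apple,15,1,0.5",
             "6/24/2020,A,RP,Banana,20,2,0.5",
             "6/24/2020,A,FR,Orange,20,3,0.5",
             "6/24/2020,B,MF,Apple,20,1,0.5",
             "6/24/2020,B,RP,Banana,20,2,0.5"),
           file.path(report_dir, "20200624.csv"))
writeLines(c(header,
             "6/25/2020,A,MF,Apple,10,1,0.6",
             "6/25/2020,A,MF,Banana,15,1,0.6",
             "6/25/2020,A,RP,Banana,10,2,0.6",
             "6/25/2020,A,FR,Orange,15,3,0.6",
             "6/25/2020,B,MF,Apple,20,1,0.6",
             "6/25/2020,B,RP,Banana,20,2,0.6"),
           file.path(report_dir, "20200625.csv"))

# Read every CSV in the folder and stack the rows.
files   <- list.files(report_dir, pattern = "\\.csv$", full.names = TRUE)
reports <- do.call(rbind, lapply(files, read.csv, stringsAsFactors = FALSE))

# Parse the month/day/year strings so that date comparisons work.
reports$Date <- as.Date(reports$Date, format = "%m/%d/%Y")

# Subset by market and by date range (the range here is an assumption).
month_start <- as.Date("2020-06-01")
month_end   <- as.Date("2020-06-30")
market_a <- subset(reports,
                   Market == "A" & Date >= month_start & Date <= month_end)

# One summary row for the rows of a single day.
summarise_day <- function(d) {
  data.frame(
    Date        = d$Date[1],
    Market      = d$Market[1],
    Revenue     = sum(d$Quantity * d$Price),
    TotalCost   = sum(d$Quantity * d$Cost),
    ApplesSold  = sum(d$Quantity[d$Product == "Apple"]),
    BananasSold = sum(d$Quantity[d$Product == "Banana"]),
    OrangesSold = sum(d$Quantity[d$Product == "Orange"])
  )
}

# Split by day, summarise each piece, and bind the pieces back together.
daily <- do.call(rbind, lapply(split(market_a, market_a$Date), summarise_day))
rownames(daily) <- NULL

# Save outside the report folder so the summary is not re-read next time.
out_file <- file.path(tempdir(), "june_summary.csv")
write.csv(daily, out_file, row.names = FALSE)
print(daily)
```

With the sample data as listed, the only Market A apple row on 6/25/2020 has Quantity 10, so ApplesSold for that day comes out as 10. An explicit for loop over files would work as well; lapply is used here only because it keeps the read-and-stack step to one line.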

 