简体   繁体   中英

Multiply Columns by Substrings in R

Suppose I have a dataframe that has multiple components and their properties listed out in multiple columns and I want to run multiple functions against the columns. My approach was to try and base it off the substring in each column header, but I haven't been able to figure out how to do that. Below is an example of the data frame.

Basket   F_Type_1   F_Qty_1   F_P_1   F_Type_2   F_Qty_2   F_P_2 
AAA       Apple     10        2.5     Banana     9         2
BBB       Peach     5         6       Melon      20        5

I essentially want to cbind two new columns to the end of this dataframe that multiplies Qty and P so you get two new columns at the end like below.

F_Total_1   F_Total_2
25          18
30          100

The input is dynamic so on occasion it could be 2 fruits or 10 fruits in certain baskets. But I can figure that portion out, it's moreso trying to figure out how to multiply columns based on the Substrings '1' or '2'.

I appreciate all your help and any other approaches you may have!

We create a function that finds the specific names, and then calculates the rowwise products. The heavy lifter of this function is the mapply function. We add a final step to rename the resultant data.frame .

fun1 <- function(data){
  qty_names <- names(data)[grepl(pattern = "Qty", x = names(data))]

  p_names <- names(data)[grepl(pattern = "P", x = names(data))]

  setNames(
    data.frame(
      mapply(qty_names, p_names, 
             FUN = function(n1, n2) apply(data[c(n1,n2)], 1, prod))),
      paste0('F_Total_', 1:length(p_names)))


}

cbind(dat, fun1(dat))

  Basket F_Type_1 F_Qty_1 F_P_1 F_Type_2 F_Qty_2 F_P_2 F_Total_1 F_Total_2
1    AAA    Apple      10   2.5   Banana       9     2        25        18
2    BBB    Peach       5   6.0    Melon      20     5        30       100

Create data

library(data.table)
df <- fread("
Basket   F_Type_1   F_Qty_1   F_P_1   F_Type_2   F_Qty_2   F_P_2 
AAA       Apple     10        2.5     Banana     9         2
BBB       Peach     5         6       Melon      20        5
")

df
#    Basket F_Type_1 F_Qty_1 F_P_1 F_Type_2 F_Qty_2 F_P_2
# 1:    AAA    Apple      10   2.5   Banana       9     2
# 2:    BBB    Peach       5   6.0    Melon      20     5

For numbers from 1 to sum(grepl('F_P_', names(df))) , set F_Total_{number} as F_Qty_{number}*F_P_{number}

for(i in seq(sum(grepl('F_P_', names(df)))))
  df[, paste0('F_Total_', i) := Reduce(`*`, .SD)
     , .SDcols = paste0(c('F_Qty_', 'F_P_'), i)]

df
#    Basket F_Type_1 F_Qty_1 F_P_1 F_Type_2 F_Qty_2 F_P_2 F_Total_1 F_Total_2
# 1:    AAA    Apple      10   2.5   Banana       9     2        25        18
# 2:    BBB    Peach       5   6.0    Melon      20     5        30       100

Or in base R with df as a data.frame

for(i in seq(sum(grepl('F_P_', names(df)))))
  df[paste0('F_Total_', i)] <- Reduce(`*`, df[paste0(c('F_Qty_', 'F_P_'), i)])

In base R, you can use the [[]] form to access column with a string, so you can loop easily

df <- read.table(text = "Basket   F_Type_1   F_Qty_1   F_P_1   F_Type_2   F_Qty_2   F_P_2 
AAA       Apple     10        2.5     Banana     9         2
BBB       Peach     5         6       Melon      20        5",header = T)

for(i in 1:2)
{
  df[[paste0("F_Total_",i)]] <- as.numeric(df[[paste0("F_P_",i)]])*as.numeric(df[[paste0("F_Qty_",i)]])
}

  Basket F_Type_1 F_Qty_1 F_P_1 F_Type_2 F_Qty_2 F_P_2 F_Total_1 F_Total_2
1    AAA    Apple      10   2.5   Banana       9     2        25        18
2    BBB    Peach       5   6.0    Melon      20     5        30       100

Here's a method using tidyverse functions for reshaping your data. Basically using tidyr verbs, we reshape your data into a more "tidy" format.

# library(dplyr); library(tidyr)
dd %>% select(Basket, contains("_Qty_"), contains("_P_")) %>% 
  gather("key", "value", -Basket) %>% 
  separate(key, c("F", "Val", "Grp")) %>% 
  group_by(Basket, Grp) %>% 
  spread(Val, value) %>% 
  mutate(Total=P*Qty, GrpN=paste0("Total_", Grp)) %>% 
  ungroup() %>% 
  select(Basket, GrpN, Total) %>% 
  spread(GrpN ,Total)

#   Basket Total_1 Total_2
#   <fct>    <dbl>   <dbl>
# 1 AAA         25      18
# 2 BBB         30     100
setNames(df[grepl("F_Qty",names(df))]*df[grepl("F_P",names(df))],paste0("F_Total_",1:2))

  F_Total_1 F_Total_2
1        25        18
2        30       100

Suggestion: If you could store your data in the long format, it will make future maintenance much cleaner.

Basket     Item     Type    Qty Price    Total
AAA           1    Apple     10   2.5       25
AAA           2    Banana     9     2       18
BBB           1    Peach      5   6.0       30
BBB           2    Melon     20     5      100
BBB           3    Orange    11   2.7     29.7

Then, when you really need the data in the above format, use the data.table package to transpose the data.

library(data.table)
fruits <- data.frame("Basket" = c("AAA", "AAA", "BBB", "BBB", "BBB"),
                 "Item" = c(1,2,1,2,3),
                 "Type" = c("Apple", "Banana", "Peach", "Melon", "Orange"),
                 "Qty" = c(10, 9, 5, 20, 11),
                 "Price" = c(2.5, 2, 6.0, 5, 2.7)
)

fruits$Total <- fruits$Qty * fruits$Price
fruits.New <- data.table::dcast(setDT(fruits),
                            formula = Basket ~ Item,
                            value.var = c("Type", "Price", "Qty", "Total"))

This way, you only have to maintain one formula and not to worry about apply , while giving you the flexibility to have variable column numbers too.

The column name should be relatively easy to fix if it's important.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM