
Apply custom function to dataframe

Need help on something not too complex, but new to me. I have a data frame df with a column Product.id and a price column, Price.

Product.id  price
A   11.5
A   11.5
A   12
A   13
A   13
B   9.25
B   9.75
B   9.75
B   9.5

I would like to check whether the price has changed from the previous month using a custom function:

Check.Price.Change <- function(Vector){
  for(x in 1:nrow(Vector)){
    if(Vector[x] != Vector[x-1]){
      TRUE 
    }
  }
}

# check if the price has changed from the previous month

df <- df %>%
  group_by(Product.id) %>%
  mutate(if.Price.change = lapply(Price, Check.Price.Change))

I get the error:

Error in 1:nrow(Vector) : argument of length 0
Called from: FUN(X[[i]], ...)

What would be the right way to do this, please?
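The error comes from lapply(Price, Check.Price.Change): lapply calls the function once per single price, and nrow() of a plain vector returns NULL, so 1:nrow(Vector) fails with "argument of length 0". (The loop would also compare the first element with Vector[0], which is empty.) A custom function can still work if it receives the whole price vector of one product and returns one value per row. A minimal sketch, assuming the rows are already in month order within each product; check_price_change is a hypothetical name, and base R's ave() is used here only so the example runs without dplyr:

```r
# Hypothetical helper: takes the full price vector for ONE product (rows in
# month order) and returns one logical per row: NA for the first month,
# TRUE if the price differs from the previous month, FALSE otherwise.
check_price_change <- function(x) {
  c(NA, x[-1] != x[-length(x)])
}

df <- data.frame(
  Product.id = factor(rep(c("A", "B"), times = c(5, 4))),
  price = c(11.5, 11.5, 12, 13, 13, 9.25, 9.75, 9.75, 9.5)
)

# ave() applies the function to each product's prices separately and puts the
# results back in the original row order. It keeps the type of `price`
# (numeric), so the 0/1/NA values are converted back to logical.
changed <- as.logical(ave(df$price, df$Product.id, FUN = check_price_change))
df$if.Price.change <- changed
print(df)
```

In the dplyr pipeline from the question, the same function would be called on the whole column rather than through lapply, i.e. mutate(if.Price.change = check_price_change(Price)) after group_by(Product.id).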

We can use lag from dplyr to compare each price with the previous entry for the same product.

library(dplyr)
df %>% group_by(Product.id) %>%  mutate(is_changed = price != lag(price))

# Product.id price is_changed
#  <fct>      <dbl> <lgl>     
#1 A          11.5  NA        
#2 A          11.5  FALSE     
#3 A          12    TRUE      
#4 A          13    TRUE      
#5 A          13    FALSE     
#6 B           9.25 NA        
#7 B           9.75 TRUE      
#8 B           9.75 FALSE     
#9 B           9.5  TRUE      

Similarly, data.table has a shift function whose default type is "lag":

library(data.table)
setDT(df)[, is_changed := price != shift(price), by = Product.id]

data

df <- structure(list(Product.id = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), price = c(11.5, 
11.5, 12, 13, 13, 9.25, 9.75, 9.75, 9.5)), class = "data.frame", 
row.names = c(NA, -9L))

The code below adds an indicator column that is TRUE when the previous Price matches the current row's Price. lag (and lead) are dplyr functions that let you efficiently compare a column's values across different rows. The vectorized if_else, also from dplyr, sets if.Price.change to TRUE if the condition is met, FALSE if not, and NA if the comparison can't be made. Note that the comparison can't be made for the first row of each product, because there is no previous row in that group to pull a value from. Be aware that, as written, TRUE means the price did not change; use != instead of == if you want TRUE to flag a change. As a side note, lag/lead can also compare rows further forward or back through their n argument; the default is 1.

Using dplyr:

# TRUE when this month's Price equals the previous month's, within each product
df <- df %>% group_by(Product.id) %>%
              mutate(if.Price.change = if_else(lag(Price) == Price, TRUE, FALSE, NA)) %>%
              ungroup()
# A tibble: 9 x 3
#  Product.id Price if.Price.change
#  <fct>      <dbl> <lgl>          
#1 A          11.5  NA             
#2 A          11.5  TRUE           
#3 A          12    FALSE          
#4 A          13    FALSE          
#5 A          13    TRUE           
#6 B           9.25 NA             
#7 B           9.75 FALSE          
#8 B           9.75 TRUE           
#9 B           9.5  FALSE     
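To illustrate the n argument of lag/lead mentioned in the answer above, here is a minimal sketch on a plain vector (the first five prices of product A); the variable names are illustrative only:

```r
library(dplyr)

prices <- c(11.5, 11.5, 12, 13, 13)

# Default n = 1: each element is paired with the one just before it
prev1 <- lag(prices)          # NA 11.5 11.5 12.0 13.0
# n = 2 looks two rows back; the first two positions have no partner
prev2 <- lag(prices, n = 2)   # NA NA 11.5 11.5 12.0
# lead() looks forward instead; the last position has no partner
next1 <- lead(prices)         # 11.5 12.0 13.0 13.0 NA

# Has the price changed compared with two months earlier?
changed_vs_2_back <- prices != prev2   # NA NA TRUE TRUE TRUE
print(changed_vs_2_back)
```

Inside group_by(Product.id) %>% mutate(...), the same calls are evaluated per product, so the NA padding appears at the start of each group rather than only at the top of the data.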
