How can I count a variable in R conditional on the value of another variable?
I want to count how many times a variable occurs in a data frame, conditional on the value of a third variable. Here is my data:
Name Store Purchase Date
John CVS Shampoo 1/1/2001
John CVS Toothpaste 1/1/2001
John Whole Foods Kombucha 1/1/2005
John Kroger Ice Cream 1/1/2002
Jane CVS Soap 1/1/2001
Jane Whole Foods Crackers 1/1/2004
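For each purchase, I want to count the number of previous purchases made by that person and the number of previous shopping trips, like this: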
Name Store Purchase Date Prev_Purchase Prev_trip
John CVS Shampoo 1/1/2001 0 0
John CVS Toothpaste 1/1/2001 0 0
John Whole Foods Kombucha 1/1/2005 3 2
John Kroger Ice Cream 1/1/2002 2 1
Jane CVS Soap 1/1/2001 0 0
Jane Whole Foods Crackers 1/1/2004 1 1
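If I wanted the total number of purchases/trips per person, I would use count or tapply. Is there a way to adapt these functions so that the output is conditional on a third variable (the date)?
Perhaps you can use ave.
Try this base R code: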
transform(df,
Prev_Purchase = ave(as.numeric(as.Date(Date, "%d/%m/%Y")), Name, FUN = function(x) sapply(x, function(p) sum(p > x))),
Prev_trip = ave(as.numeric(as.Date(Date, "%d/%m/%Y")), Name, FUN = function(x) sapply(x, function(p) length(unique(x[p > x]))))
)
which gives
Name Store Purchase Date Prev_Purchase Prev_trip
1 John CVS Shampoo 1/1/2001 0 0
2 John CVS Toothpaste 1/1/2001 0 0
3 John Whole Foods Kombucha 1/1/2005 3 2
4 John Kroger Ice Cream 1/1/2002 2 1
5 Jane CVS Soap 1/1/2001 0 0
6 Jane Whole Foods Crackers 1/1/2004 1 1
Data
df <- structure(list(Name = c("John", "John", "John", "John", "Jane",
"Jane"), Store = c("CVS", "CVS", "Whole Foods", "Kroger", "CVS",
"Whole Foods"), Purchase = c("Shampoo", "Toothpaste", "Kombucha",
"Ice Cream", "Soap", "Crackers"), Date = c("1/1/2001", "1/1/2001",
"1/1/2005", "1/1/2002", "1/1/2001", "1/1/2004")), class = "data.frame", row.names = c(NA,
-6L))
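To see what the two anonymous functions passed to ave compute, here is a minimal sketch that runs them on a single person's dates outside of ave. The vector x is a hypothetical stand-in for John's dates, reduced to plain years for readability; ave simply applies the same logic separately to each Name group.

```r
# Hypothetical stand-in for John's purchase dates (years only, for illustration)
x <- c(2001, 2001, 2005, 2002)

# For each element p, count how many elements of x are strictly earlier
prev_purchase <- sapply(x, function(p) sum(p > x))

# For each element p, count the distinct earlier values (earlier shopping trips)
prev_trip <- sapply(x, function(p) length(unique(x[p > x])))

print(prev_purchase)  # 0 0 3 2
print(prev_trip)      # 0 0 2 1
```

Because every element is compared against the whole group, the work grows quadratically with group size, which is fine for small data like this.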
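I think this should solve your problem. If your data is large, it would be best to optimize this code block.
# load packages
library(dplyr)      # provides filter() and bind_rows() used below
library(lubridate)  # provides dmy()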
# base function
AddInfo = function(name, date, df) {
prev_purchase = sum(df$Name == name & df$Date < date)
prev_trip = length(unique(filter(df, Name == name & Date < date)$Date))
data = data.frame(
Prev_purchase = prev_purchase,
Prev_trip = prev_trip
)
return(data)
}
# define data frame
df = data.frame(
Name = c(rep('John', 4), rep('Jane', 2)),
Store = c('CVS', 'CVS', 'Whole Foods', 'Kroger', 'CVS', 'Whole Foods'),
Purchase = c('Shampoo', 'Toothpaste', 'Kombucha', 'Ice Cream', 'Soap', 'Crackers'),
Date = c('1/1/2001', '1/1/2001', '1/1/2005', '1/1/2002', '1/1/2001', '1/1/2004')
)
# convert the date strings to Date objects (dmy() returns Date, not POSIXct)
df$Date = dmy(df$Date)
# apply function and bind the results
cols = mapply(AddInfo, df$Name, df$Date, MoreArgs = list(df), SIMPLIFY = FALSE)
cols = bind_rows(cols)
df = cbind(df, cols)
Here is the output:
Name Store Purchase Date Prev_purchase Prev_trip
1 John CVS Shampoo 2001-01-01 0 0
2 John CVS Toothpaste 2001-01-01 0 0
3 John Whole Foods Kombucha 2005-01-01 3 2
4 John Kroger Ice Cream 2002-01-01 2 1
5 Jane CVS Soap 2001-01-01 0 0
6 Jane Whole Foods Crackers 2004-01-01 1 1
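On the remark about large data: the function above rescans the whole data frame once per row. One possible way to avoid that, offered here as a sketch and not as part of the original answer, is to derive both counts from a sort within each person. It assumes "previous" means a strictly earlier date, as in the expected output. The trimmed data frame below is a hypothetical copy of the example data containing only the columns needed.

```r
# Trimmed copy of the example data (Store and Purchase omitted for brevity)
df <- data.frame(
  Name = c("John", "John", "John", "John", "Jane", "Jane"),
  Date = as.Date(c("2001-01-01", "2001-01-01", "2005-01-01",
                   "2002-01-01", "2001-01-01", "2004-01-01"))
)

d <- as.numeric(df$Date)

# rank(..., ties.method = "min") is 1 + the number of strictly smaller values,
# so subtracting 1 gives the number of earlier purchases within each person
df$Prev_purchase <- ave(d, df$Name,
                        FUN = function(x) rank(x, ties.method = "min") - 1)

# The position of a date among the sorted distinct dates, minus 1,
# is the number of distinct earlier dates (earlier trips)
df$Prev_trip <- ave(d, df$Name,
                    FUN = function(x) match(x, sort(unique(x))) - 1)

print(df)
```

Both columns come from a sort per group, so the cost should grow roughly as n log n per person instead of quadratically.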
We can also use outer:
library(dplyr)
library(lubridate)
df %>%
mutate(Date = dmy(Date)) %>%
group_by(Name) %>%
mutate(Prev_Purchase = colSums(outer(Date, Date, FUN = "<")),
Prev_trip = colSums(outer(unique(Date), Date, FUN = "<")))
# A tibble: 6 x 6
# Groups: Name [2]
# Name Store Purchase Date Prev_Purchase Prev_trip
# <chr> <chr> <chr> <date> <dbl> <dbl>
#1 John CVS Shampoo 2001-01-01 0 0
#2 John CVS Toothpaste 2001-01-01 0 0
#3 John Whole Foods Kombucha 2005-01-01 3 2
#4 John Kroger Ice Cream 2002-01-01 2 1
#5 Jane CVS Soap 2001-01-01 0 0
#6 Jane Whole Foods Crackers 2004-01-01 1 1
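A small sketch of why colSums is the right reduction here, using a hypothetical numeric vector in place of one person's dates: outer(x, x, "<") builds a logical matrix whose entry [i, j] is x[i] < x[j], so column j marks every element that comes before x[j].

```r
# Hypothetical stand-in for one person's dates (years only, for illustration)
x <- c(2001, 2001, 2005, 2002)

# m[i, j] is TRUE when x[i] is strictly earlier than x[j]
m <- outer(x, x, FUN = "<")

# Summing each column counts the earlier purchases for each row of the data
prev_purchase <- colSums(m)

# Using only the distinct values as rows counts earlier trips instead
prev_trip <- colSums(outer(unique(x), x, FUN = "<"))

print(prev_purchase)  # 0 0 3 2
print(prev_trip)      # 0 0 2 1
```

The matrix has one cell per pair of rows in a group, so memory use grows with the square of the group size.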