Marking the X previous occurrences in R
I have a dataset with many different social media creators (creator_id). They posted many times (posting_count), and a post is classified as an ad if ad = 1. For every post with ad = 1, I want to mark the 3 postings before it as 1. Basically, "goal_variable" is what I want to get. A solution without a loop would be cool!
creator_id <-c("aaa","aaa","aaa","aaa","aaa","aaa","aaa","aaa","bbb","bbb","bbb","bbb","bbb","bbb","bbb","bbb","bbb")
posting_count <- c(143,144,145,146,147,148,149,150,90,91,92,93,94,95,96,97,98)
ad <- c(0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1)
goal_variable <- c(0,0,0,1,1,1,0,0,0,0,0,1,1,1,1,1,0)
df <- cbind(creator_id, posting_count, ad, goal_variable)
First, a cleaner way to make df, without the intermediate variables. We can then use ifelse with multiple | (or) conditions. Here each lead call does the following: lead([variable], [n to look ahead], [default value (0)]). Grouping by creator_id keeps lead from looking into the next creator's posts.
df <- data.frame(creator_id =c("aaa","aaa","aaa","aaa","aaa","aaa","aaa","aaa","bbb","bbb","bbb","bbb","bbb","bbb","bbb","bbb","bbb"),
posting_count = c(143,144,145,146,147,148,149,150,90,91,92,93,94,95,96,97,98),
ad = c(0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1),
goal_variable = c(0,0,0,1,1,1,0,0,0,0,0,1,1,1,1,1,0))
library(dplyr)
df %>%
  group_by(creator_id) %>%
  mutate(new_goal = ifelse(lead(ad, 1, 0) == 1 | lead(ad, 2, 0) == 1 | lead(ad, 3, 0) == 1, 1, 0))
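To make the shifting concrete, here is a small sketch on a toy vector (not the question's data) showing what each lead call returns, followed by one possible way to generalize from 3 to any X. The helper name mark_previous and its argument n are hypothetical, introduced only for illustration.

```r
library(dplyr)

# Toy vector with a single ad in position 5
ad <- c(0, 0, 0, 0, 1, 0)

# lead() shifts values toward the front and pads the end with `default`
lead(ad, 1, default = 0)  # 0 0 0 1 0 0
lead(ad, 2, default = 0)  # 0 0 1 0 0 0
lead(ad, 3, default = 0)  # 0 1 0 0 0 0

# Hypothetical helper: flag the n positions before each 1
mark_previous <- function(x, n = 3) {
  # One logical vector per look-ahead distance 1..n
  shifted <- lapply(seq_len(n), function(k) lead(x, k, default = 0) == 1)
  # Combine them with "or" and convert TRUE/FALSE to 1/0
  as.integer(Reduce(`|`, shifted))
}

mark_previous(ad, n = 3)  # 0 1 1 1 0 0
```

Inside the pipeline this could be used as mutate(new_goal = mark_previous(ad, 3)) after group_by(creator_id), assuming the helper is defined first.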
Here's a programmatic way using map. Basically, for each row, check whether the current row sits 1 to 3 positions before any row with ad == 1.
library(purrr)
library(dplyr)
df %>%
  group_by(creator_id) %>%
  mutate(goal_variable = map_int(row_number(), ~ any((.x - which(ad == 1)) %in% -3:-1)))
Output
# A tibble: 17 × 4
# Groups:   creator_id [2]
   creator_id posting_count    ad goal_variable
   <chr>              <dbl> <dbl>         <int>
 1 aaa                  143     0             0
 2 aaa                  144     0             0
 3 aaa                  145     0             0
 4 aaa                  146     0             1
 5 aaa                  147     0             1
 6 aaa                  148     0             1
 7 aaa                  149     1             0
 8 aaa                  150     0             0
 9 bbb                   90     0             0
10 bbb                   91     0             0
11 bbb                   92     0             0
12 bbb                   93     0             1
13 bbb                   94     0             1
14 bbb                   95     0             1
15 bbb                   96     1             1
16 bbb                   97     0             1
17 bbb                   98     1             0
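The index arithmetic behind the map version can be traced by hand on a single group. The sketch below uses the ad values of creator "aaa" and plain base R (vapply instead of map_int, a substitution made here only to keep the example dependency-free); the variable names ad_rows and flag are illustrative.

```r
# ad values for creator "aaa" from the question
ad <- c(0, 0, 0, 0, 0, 0, 1, 0)

# Positions of the ads within the group
ad_rows <- which(ad == 1)  # 7

# Row 5: 5 - 7 = -2, which lies in -3:-1, so the row is flagged
any((5 - ad_rows) %in% -3:-1)  # TRUE

# Row 7 is the ad itself: 7 - 7 = 0, so it is not flagged
any((7 - ad_rows) %in% -3:-1)  # FALSE

# The same check applied to every row position
flag <- vapply(seq_along(ad),
               function(i) any((i - ad_rows) %in% -3:-1),
               logical(1))
as.integer(flag)  # 0 0 0 1 1 1 0 0
```

If a group has no ads at all, which(ad == 1) is empty and any() returns FALSE for every row, so nothing is flagged.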
An option with slider
library(slider)
library(dplyr)
df %>%
  group_by(creator_id) %>%
  mutate(goal_variable2 = lead(+(slide_int(ad, \(x) 1 %in% x, .after = 2)),
                               default = 0)) %>%
  ungroup()
-output
# A tibble: 17 × 5
   creator_id posting_count    ad goal_variable goal_variable2
   <chr>              <dbl> <dbl>         <dbl>          <dbl>
 1 aaa                  143     0             0              0
 2 aaa                  144     0             0              0
 3 aaa                  145     0             0              0
 4 aaa                  146     0             1              1
 5 aaa                  147     0             1              1
 6 aaa                  148     0             1              1
 7 aaa                  149     1             0              0
 8 aaa                  150     0             0              0
 9 bbb                   90     0             0              0
10 bbb                   91     0             0              0
11 bbb                   92     0             0              0
12 bbb                   93     0             1              1
13 bbb                   94     0             1              1
14 bbb                   95     0             1              1
15 bbb                   96     1             1              1
16 bbb                   97     0             1              1
17 bbb                   98     1             0              0
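The slider version works in two steps, which may be easier to follow when run separately on one group. This sketch uses the ad values of creator "aaa" and slide_lgl rather than slide_int (a swap made here so the logical result needs no coercion); in_window is an illustrative name.

```r
library(slider)
library(dplyr)

# ad values for creator "aaa" from the question
ad <- c(0, 0, 0, 0, 0, 0, 1, 0)

# Step 1: for each row, is there an ad in the current row or the next two?
in_window <- slide_lgl(ad, function(x) 1 %in% x, .after = 2)
as.integer(in_window)  # 0 0 0 0 1 1 1 0

# Step 2: shift the result up by one row, so the window effectively
# covers the next three rows and excludes the current one
lead(as.integer(in_window), default = 0L)  # 0 0 0 1 1 1 0 0
```

Changing .after = 2 to .after = X - 1 should give the X previous postings, under the same logic.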