How to fill different values in a new column based on different values of another column using dplyr?
Here is my data:
a <- data.frame(x=c('A','A','A','B','B','B'),
                y=c('Yes','No','No','Yes','No','No'),
                z=c(1,2,3,4,5,6))
I want to generate a new column this way: group by x, so all the A's are in one group and all the B's in another. For each group, if y=Yes, keep the z value in the new column; if y=No, use the z value from the row where y=Yes. So the new data should look like this:
x y z z1
A Yes 1 1
A No 2 1
A No 3 1
B Yes 4 4
B No 5 4
B No 6 4
I can do it this way:
a1 <- a %>%
  filter(y=='Yes') %>%
  distinct(x,y,z)
a2 <- a %>%
  left_join(a1,by='x') %>%...
But this way I have to create a1 as an intermediate. How can I do this in just one pipeline, without generating a new variable like a1 in my example?
You could combine both pipelines and do the same thing in one shot, i.e.:
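library(dplyr)
a <- data.frame(x=c('A','A','A','B','B','B'),
                y=c('Yes','No','No','Yes','No','No'),
                z=c(1,2,3,4,5,6))
# join the 'Yes' rows back onto a inside a single pipeline, then drop the duplicate y column
a %>%
  left_join(a %>% filter(y=='Yes') %>% distinct(x,y,z), by='x') %>%
  select(-y.y)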
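Because both sides of the join contain y and z columns, the result has duplicate columns suffixed with .x and .y: after dropping y.y you are left with x, y.x, z.x and z.y, where z.y holds the z value from the 'Yes' row.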
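Two further one-pipeline variants are sketched below. These are not part of the original answer, and they assume each x group contains exactly one row with y == 'Yes'. The first avoids the join entirely by grouping on x and indexing z; the second keeps the join but selects and renames the lookup column before joining, so no .x/.y suffixes appear. The names b1 and b2 are only illustrative.

```r
library(dplyr)

a <- data.frame(x = c('A', 'A', 'A', 'B', 'B', 'B'),
                y = c('Yes', 'No', 'No', 'Yes', 'No', 'No'),
                z = c(1, 2, 3, 4, 5, 6))

# Variant 1: no join. Within each x group, take the z value from the
# row where y == 'Yes' (a length-1 result is recycled to the whole group).
# Assumes exactly one 'Yes' row per group.
b1 <- a %>%
  group_by(x) %>%
  mutate(z1 = z[y == 'Yes']) %>%
  ungroup()

# Variant 2: keep the join, but reduce the lookup table to x plus a
# renamed z column first, so the joined result has no .x/.y suffixes.
b2 <- a %>%
  left_join(a %>% filter(y == 'Yes') %>% select(x, z1 = z), by = 'x')

print(b1)
print(b2)
```

Both give the columns x, y, z, z1 with z1 equal to 1, 1, 1, 4, 4, 4. The difference shows up when the assumption fails: the group_by() + mutate() version raises an error if a group has no 'Yes' row, while the join version fills z1 with NA for that group.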