I am trying to create the variable 'active' in a df, conditional on whether values appear in a character vector in R (act_users). If the names in the variables screen_name and rt_name in the df are in the vector, I would like the variable to take the value 1, if not 0.
df <- data.frame("screen_name" = c("august", "berit", "christopher", "david", "erica", "frank"), "rt_name" = c("berit", "august", "david", "erica", "frank", "christopher"))
act_users <- c("david", "august", "berit")
I have tried the following if else statements, but none of them work:
'%!in%' <- function(x,y)!('%in%'(x,y))#create a function
df$active <- ifelse((df$screen_name %in% act_users) & (df$rt_name %in% act_users), 1,
ifelse((df$screen_name %!in% act_users) & (df$rt_name %!in% act_users), 2))
#attempts only with screenname
df$active <- ifelse(df$screen_name %in% act_users, "1", ifelse(df$screen_name %!in% act_users, "0"))
df$active <- if(df$screen_name %in% act_users){
df$active == 1
} else {
df$active == 0}
My last resort would be to turn the active-user vector into a df, merge the results, and match the columns inside the data frame, but my data is quite big, so a more efficient solution would be nice.
Thanks in advance!
If it is an exact match you can use:
df$active = apply(df,1,function(i)as.numeric(all(i %in% act_users)))
You take every row and check, for each column, whether its value is an element of act_users (TRUE / FALSE). all() returns TRUE only if all of those booleans are true, and as.numeric() turns that into 1 (otherwise 0).
screen_name rt_name active
1 august berit 1
2 berit august 1
3 christopher david 0
4 david erica 0
5 erica frank 0
6 frank christopher 0
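To make the row-wise logic easier to follow, here is a small sketch that walks through one row by hand before applying it to every row. The name_cols helper is my own addition, not part of the answer above: apply(df, 1, ...) looks at every column, so if df already holds an 'active' column from an earlier attempt, that column would be tested against act_users too. Restricting to the two name columns avoids that.

```r
df <- data.frame(
  screen_name = c("august", "berit", "christopher", "david", "erica", "frank"),
  rt_name     = c("berit", "august", "david", "erica", "frank", "christopher")
)
act_users <- c("david", "august", "berit")

# Hypothetical helper: only these columns are checked against act_users
name_cols <- c("screen_name", "rt_name")

# One row by hand: row 3 is (christopher, david)
row3 <- unlist(df[3, name_cols])
row3 %in% act_users        # FALSE TRUE
all(row3 %in% act_users)   # FALSE, so this row gets 0

# The same check for every row
df$active <- apply(df[name_cols], 1,
                   function(i) as.numeric(all(i %in% act_users)))
```

Note that apply() converts the data frame to a matrix and calls the function once per row, which is usually slower on large data than a vectorised comparison of whole columns.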
You could also use the code below, which could be faster than apply(df,1,...) when you have many rows:
df$active <- Reduce("*",lapply(df, function(x) ifelse(x %in% act_users,1,0)))
df <- within(df, active <- ifelse(screen_name%in%act_users & rt_name%in%act_users,1,0))
Output
> df
screen_name rt_name active
1 august berit 1
2 berit august 1
3 christopher david 0
4 david erica 0
5 erica frank 0
6 frank christopher 0
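For the Reduce("*", lapply(...)) variant, it may help to see the intermediate step. This is only a sketch: lapply() builds one 0/1 vector per column, and multiplying them element-wise acts like a logical AND, since the product is 1 only where every factor is 1. As before, name_cols is a name I introduced so that only the two name columns are included.

```r
df <- data.frame(
  screen_name = c("august", "berit", "christopher", "david", "erica", "frank"),
  rt_name     = c("berit", "august", "david", "erica", "frank", "christopher")
)
act_users <- c("david", "august", "berit")

# Hypothetical helper: only these columns are checked against act_users
name_cols <- c("screen_name", "rt_name")

# One 0/1 vector per column
flags <- lapply(df[name_cols], function(x) as.integer(x %in% act_users))
flags$screen_name   # 1 1 0 1 0 0
flags$rt_name       # 1 1 1 0 0 0

# Element-wise product across columns: 1 only where both are 1
df$active <- Reduce(`*`, flags)
df$active           # 1 1 0 0 0 0
```

Because each column is processed as a whole vector, this scales to more than two columns without writing out every %in% test by hand.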
If you want to check only two columns, you can use %in% on both columns and combine the results.
df$active <- +(df$screen_name %in% act_users & df$rt_name %in% act_users)
df
# screen_name rt_name active
#1 august berit 1
#2 berit august 1
#3 christopher david 0
#4 david erica 0
#5 erica frank 0
#6 frank christopher 0
The + at the beginning of the expression converts the logical values to integer values.
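As a quick illustration of what the leading + does, here is a minimal sketch using a made-up two-element input (the names "august" and "erica" are taken from the example data). Unary plus on a logical vector gives the same result as as.integer(), which some people find more readable.

```r
act_users <- c("david", "august", "berit")

flags <- c("august", "erica") %in% act_users
flags                                  # TRUE FALSE
class(flags)                           # "logical"

+flags                                 # 1 0
class(+flags)                          # "integer"

# The more explicit spelling gives the same result
identical(+flags, as.integer(flags))   # TRUE
```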