
How can I add a new column and use an existing column in a data frame in R?

I am trying to add a column called "Visited" based on an existing column called "Visits": if "Visits" is NA, I want "Visited" to be 0, but if "Visits" > 0, "Visited" should be 1. I am getting an error that states "Error in mutate(Visited = if (Visits == "NA") {: object 'Visits' not found". Thanks for any advice! Here is my code:

mutate(Visited =
  if (Visits == "NA") {
    replace("NA", 0)
  } else {
    replace(1)
  }
)

Some issues:

  1. You cannot use if in a mutate like this. I'm inferring that your data has more than one row, in which case Visits == "NA" will be a logical vector of length greater than 1, but the if condition must have length 1 (since R 4.2.0, a longer condition is an error). What you likely need is a vectorized conditional such as ifelse or replace.

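    There are a few things to realize: vectorized conditionals do not short-circuit (&& and || do short-circuit, & and | do not, and you cannot just interchange them); and ifelse has issues with classes other than logical, integer, numeric, and character, because it drops attributes such as the class of its yes/no values, so Date or factor results come back as their underlying numbers.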

  2. Your use of replace is incorrect: it requires three arguments, replace(x, list, values), and it infers nothing. You cannot call just replace(1) hoping that it will know to look for a conditional outside of its call.

  3. There is a big difference between the R symbol NA (which can be numeric, logical, character, etc.) and the string "NA". Mis-read data sometimes gives you strings of "NA", but typically that is not the case. Note that NA == anything is going to be NA (not TRUE/FALSE), since NA can be interpreted as "can be anything" as well as "not applicable". Because of this, if you have NAs in your data, then Visits == "NA" is going to first internally coerce the data to strings, which does not convert NA to "NA", and then look for the literal "NA": not what you want or need. I hope that makes sense.

  4. The error message suggests that you are not passing in the data. mutate(Visited = ...) works fine if the call to mutate is in a dplyr/magrittr "pipe" (%>%), but by itself mutate requires its first argument to be a data.frame, as in mutate(mydata, Visited = ...).
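The pitfalls in points 1 to 3 can be reproduced in base R with a small sketch. The visits vector is hypothetical sample data, and because the exact error text differs between R versions, the sketch only records whether an error occurred rather than its message.

```r
# Hypothetical sample data standing in for the asker's Visits column
visits <- c(NA, 2, 1, NA, 5)

# Point 1: `if` wants one TRUE/FALSE. A length-5 condition is an error in
# R >= 4.2.0 (older versions warned, then failed on the leading NA).
if_result <- tryCatch(
  if (visits > 0) "visited" else "not visited",
  error = function(e) "error"
)
print(if_result)                    # "error"

# Vectorized conditionals work element by element instead
print(ifelse(is.na(visits), 0, 1))  # 0 1 1 0 1

# `&&` short-circuits, so stop() on the right is never evaluated...
print(FALSE && stop("right-hand side evaluated"))  # FALSE
# ...while vectorized `&` evaluates both sides
amp_result <- tryCatch(
  FALSE & stop("right-hand side evaluated"),
  error = function(e) "error"
)
print(amp_result)                   # "error"

# ifelse() drops the Date class of its yes/no values
d <- ifelse(c(TRUE, FALSE), as.Date("2024-01-01"), as.Date("2024-12-31"))
print(class(d))                     # "numeric"

# Point 2: replace() needs all three arguments: replace(x, list, values)
replaced <- replace(visits, is.na(visits), 0)
print(replaced)                     # 0 2 1 0 5

# Point 3: comparing with the string "NA" never matches a real NA
print(visits == "NA")               # NA FALSE FALSE NA FALSE
print(is.na(visits))                # TRUE FALSE FALSE TRUE FALSE
```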

Here are some equivalent alternatives that should work for you:

library(dplyr)

mydata %>%
  mutate(
    Visited1 = ifelse(!is.na(Visits) & Visits > 0, 1, 0),
    Visited2 = replace(rep(1, n()), is.na(Visits) | Visits <= 0, 0),
    Visited3 = +(!is.na(Visits) & Visits > 0)
  )

The third takes advantage of R's coercion from logical to integer with the +(.) shortcut.
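A quick base R check that the three variants agree, using a hypothetical vector that also contains a 0. Outside of mutate there is no n(), so this sketch uses length() in its place. The values match, but the third variant is integer while the first two are double.

```r
# Hypothetical data; the trailing 0 exercises the Visits > 0 condition
visits <- c(NA, 2, 1, NA, 5, 0)

v1 <- ifelse(!is.na(visits) & visits > 0, 1, 0)
v2 <- replace(rep(1, length(visits)), is.na(visits) | visits <= 0, 0)
v3 <- +(!is.na(visits) & visits > 0)  # unary + turns TRUE/FALSE into 1L/0L

print(v1)                 # 0 1 1 0 1 0
print(identical(v1, v2))  # TRUE
print(all(v1 == v3))      # TRUE
print(typeof(v1))         # "double"
print(typeof(v3))         # "integer"
```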

You pick which you prefer.

ifelse should do the trick. Note: replace df with your data frame's name:

df$Visited = ifelse(is.na(df$Visits), 0, 1)
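This version only tests for NA, so any non-missing value, including 0, becomes 1. If Visits can also be 0 in your data (an assumption; the sample data below has no zeros), a sketch of the difference on a hypothetical data frame:

```r
# Hypothetical data frame that includes a zero count
df0 <- data.frame(Visits = c(NA, 0, 3))

# Tests only for NA: the 0 row is marked as visited
df0$Visited <- ifelse(is.na(df0$Visits), 0, 1)

# Tests for "not NA and greater than zero", matching the question's rule
df0$VisitedStrict <- ifelse(!is.na(df0$Visits) & df0$Visits > 0, 1, 0)

print(df0)
```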

If you prefer dplyr:

library(dplyr)
df = df %>%
  mutate(Visited = ifelse(is.na(Visits), 0, 1))
Another option is dplyr's if_else, a type-stable alternative to base ifelse:

library(dplyr)
df %>%
  mutate(Visited = if_else(is.na(Visits), 0, 1))

Output:

  Visits Visited
1     NA       0
2      2       1
3      1       1
4     NA       0
5      5       1

Data:

df <- data.frame(
  Visits = c(NA, 2, 1, NA, 5)
)
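dplyr's if_else also has a missing argument that supplies the value used wherever the condition itself is NA, so the is.na() test can be folded into the comparison. A sketch on the same sample data, assuming dplyr is installed:

```r
library(dplyr)

df <- data.frame(
  Visits = c(NA, 2, 1, NA, 5)
)

# Visits > 0 is NA for missing visits; `missing = 0` maps those rows to 0
df <- df %>%
  mutate(Visited = if_else(Visits > 0, 1, 0, missing = 0))

print(df)
```

Unlike the is.na()-only version, this also maps a Visits value of 0 to 0.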
