R. How to create a new column, returning i based on another column in R

I have a dataframe:

employee <- c('John Doe','Peter Gynn','Jolie Hope')
salary <- c(21000, NA, 26800)
startdate <- as.Date(c('2010-11-1', NA,'2007-3-14'))

employ.data <- data.frame(employee, salary, startdate)

I want a new column, employ.data$NA, that returns the value of employ.data$employee for row i if any other column in that row is NA.

I have tried this for one column, but I am getting errors:

employ.data$NA = NA 
{for (i in 1:nrow(Eurostat)) 
  {
  if (startdate[i] = "NA")  employ.data$employee[i]
}

Any help would be appreciated.

You need complete.cases() from base R:

employ.data$missingFlag <- !complete.cases(employ.data)

    employee salary  startdate missingFlag
1   John Doe  21000 2010-11-01       FALSE
2 Peter Gynn     NA       <NA>        TRUE
3 Jolie Hope  26800 2007-03-14       FALSE

Try to vectorize it and use an ifelse statement:

employ.data <- data.frame(employee, salary, startdate, stringsAsFactors = FALSE)
employ.data["missing"] <- with(employ.data, ifelse(is.na(startdate), employee, NA))
employ.data
    employee salary  startdate    missing
1   John Doe  21000 2010-11-01       <NA>
2 Peter Gynn     NA       <NA> Peter Gynn
3 Jolie Hope  26800 2007-03-14       <NA>

Alternatively, to check all columns, use any:

employ.data <- data.frame(employee, salary, startdate, stringsAsFactors = FALSE)
employ.data["something_missing"] <- apply(employ.data, 1, function(x) any(is.na(x)))
employ.data
    employee salary  startdate something_missing
1   John Doe  21000 2010-11-01             FALSE
2 Peter Gynn     NA       <NA>              TRUE
3 Jolie Hope  26800 2007-03-14             FALSE

The construct above will give you booleans. If you want a column of the names instead, you can combine it with ifelse.
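As a rough sketch of that combination, the row-wise any() check can be fed straight into ifelse. The names row_has_na and missing_name are made up for this illustration and are not part of the question.

```r
employee <- c("John Doe", "Peter Gynn", "Jolie Hope")
salary <- c(21000, NA, 26800)
startdate <- as.Date(c("2010-11-1", NA, "2007-3-14"))
employ.data <- data.frame(employee, salary, startdate, stringsAsFactors = FALSE)

# TRUE for every row that has at least one NA in any column
# (row_has_na is a hypothetical helper name)
row_has_na <- apply(employ.data, 1, function(x) any(is.na(x)))

# Keep the employee name where something is missing, NA otherwise
employ.data$missing_name <- ifelse(row_has_na, employ.data$employee, NA)
employ.data
```

Only Peter Gynn's row has missing values here, so that is the only row where missing_name holds a name.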

On a more general note, instantiating a column first and then looping through the dataframe to populate it is not particularly "Rtistic", and I would suggest avoiding this strategy whenever possible. The apply family of functions is very powerful, and so is ifelse. dplyr's mutate combined with case_when statements can also be used if you want something more SQL-like.
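For the SQL-like route, something along these lines should work, assuming dplyr is installed. The column name missing is chosen only for this sketch, and the TRUE ~ NA_character_ branch plays the role of an ELSE clause.

```r
library(dplyr)

employ.data <- data.frame(
  employee  = c("John Doe", "Peter Gynn", "Jolie Hope"),
  salary    = c(21000, NA, 26800),
  startdate = as.Date(c("2010-11-1", NA, "2007-3-14")),
  stringsAsFactors = FALSE
)

employ.data <- employ.data %>%
  mutate(missing = case_when(
    # first matching condition wins, as in SQL's CASE WHEN
    is.na(salary) | is.na(startdate) ~ employee,
    # fallback for rows without missing values
    TRUE ~ NA_character_
  ))
employ.data
```

The conditions are listed per column here; checking many columns this way gets verbose, so the apply/any approach may be the better fit for wide data.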

Just for pedagogical reasons, here is a working version of your code. Please don't use it; just try to understand the differences.

employ.data$missing <- NA
for (i in seq_len(nrow(employ.data))) {
  # is.na() is the correct test; comparing with "NA" checks against a string
  if (is.na(employ.data$startdate[i])) {
    employ.data$missing[i] <- employ.data$employee[i]
  }
}

Importantly, note that "NA" is interpreted as a string. To test whether a value is NA, you need to use, for example, is.na. After all, testing whether 42 == NA is ambiguous: the value is missing, so it may or may not be equal to 42, and the test therefore returns NA.
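A few lines at the console make the difference visible. The vector x below is just an illustrative value, not something from the question. Note also that if() raises an error when its condition evaluates to NA, which is one reason the loop needs is.na().

```r
x <- c(42, NA)

# Comparing anything with NA yields NA, never TRUE or FALSE
cmp <- x == NA
cmp

# is.na() is the dedicated test for missing values
is.na(x)

# "NA" in quotes is an ordinary character string, not a missing value
is.na("NA")
```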

It can be done quite easily with dplyr:

library(dplyr)

employee <- c('John Doe','Peter Gynn','Jolie Hope')
salary <- c(21000, NA, 26800)
startdate <- as.Date(c('2010-11-1', NA,'2007-3-14'))

employ.data <- data.frame(employee, salary, startdate)

employ.data <- employ.data %>% 
  rowwise() %>% 
  mutate(missing = any(is.na(c(salary, startdate)))) %>% 
  ungroup()  # drop the rowwise grouping so later verbs work on whole columns
