R: How to create a new column that returns i based on another column

Question

I have a data frame:

employee <- c('John Doe','Peter Gynn','Jolie Hope')
salary <- c(21000, NA, 26800)
startdate <- as.Date(c('2010-11-1', NA,'2007-3-14'))

employ.data <- data.frame(employee, salary, startdate)

I want a new column, employ.data$NA, that returns the value of employ.data$employee for row i if any other column in row i is NA.

I have tried this for one column, but I am getting errors:

employ.data$NA = NA 
{for (i in 1:nrow(Eurostat)) 
  {
  if (startdate[i] = "NA")  employ.data$employee[i]
}

Any help would be appreciated.

Answer 1

You need complete.cases() from base R:

employ.data$missingFlag <- !complete.cases(employ.data)

    employee salary  startdate missingFlag
1   John Doe  21000 2010-11-01       FALSE
2 Peter Gynn     NA       <NA>        TRUE
3 Jolie Hope  26800 2007-03-14       FALSE
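
If only some columns should count toward the flag, complete.cases() can be given a subset of the data frame instead of the whole thing. This is a minimal sketch, assuming that only salary and startdate matter; the names check.cols and missingFlag2 are made up for illustration.

```r
employee <- c('John Doe', 'Peter Gynn', 'Jolie Hope')
salary <- c(21000, NA, 26800)
startdate <- as.Date(c('2010-11-1', NA, '2007-3-14'))
employ.data <- data.frame(employee, salary, startdate)

# Hypothetical choice: only these columns are checked for NA
check.cols <- c("salary", "startdate")

# TRUE for rows with at least one NA among the checked columns
employ.data$missingFlag2 <- !complete.cases(employ.data[, check.cols])
employ.data
```

With the sample data the result matches missingFlag, because employee has no missing values.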

Answer 2

Try to vectorize it and use ifelse:

employ.data <- data.frame(employee, salary, startdate, stringsAsFactors = FALSE)
employ.data["missing"] <- with(employ.data, ifelse(is.na(startdate), employee, NA))
employ.data
    employee salary  startdate    missing
1   John Doe  21000 2010-11-01       <NA>
2 Peter Gynn     NA       <NA> Peter Gynn
3 Jolie Hope  26800 2007-03-14       <NA>

Alternatively, to check all columns, use any:

employ.data <- data.frame(employee, salary, startdate, stringsAsFactors = FALSE)
employ.data["something_missing"] <- apply(employ.data, 1, function(x) any(is.na(x)))
employ.data
    employee salary  startdate something_missing
1   John Doe  21000 2010-11-01             FALSE
2 Peter Gynn     NA       <NA>              TRUE
3 Jolie Hope  26800 2007-03-14             FALSE

The construct above gives you booleans. If you want a column of names instead, you can combine it with ifelse.
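
One way the any() construct could be combined with ifelse is sketched below. It assumes the same sample data as before; has.na and missing_name are illustrative names, not part of the original answer.

```r
employee <- c('John Doe', 'Peter Gynn', 'Jolie Hope')
salary <- c(21000, NA, 26800)
startdate <- as.Date(c('2010-11-1', NA, '2007-3-14'))
employ.data <- data.frame(employee, salary, startdate, stringsAsFactors = FALSE)

# Row-wise check: TRUE if the row contains at least one NA
has.na <- apply(employ.data, 1, function(x) any(is.na(x)))

# Keep the employee name where something is missing, NA otherwise
employ.data$missing_name <- ifelse(has.na, employ.data$employee, NA)
employ.data
```

Only the row for Peter Gynn gets a name in missing_name; the other rows stay NA.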

On a more general note, instantiating a column first and then looping through the data frame to populate it is not particularly "Rtistic", and I would suggest avoiding this strategy whenever possible. The apply family of functions is very powerful, and so is ifelse. dplyr's mutate combined with case_when statements can also be used if you want something more SQL-like.

Just for pedagogical reasons, here is a working version of your code. Please don't use it; just try to understand the differences.

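# Start with a column that is NA everywhere
employ.data$missing <- NA
for (i in seq_len(nrow(employ.data))) {
  # is.na() tests for a missing value; comparing with "NA" does not
  if (is.na(employ.data$startdate[i])) {
    employ.data$missing[i] <- employ.data$employee[i]
  }
}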

Importantly, note that "NA" is interpreted as a string. To test whether a value is NA, you need to use e.g. is.na. After all, testing whether 42 == NA is ambiguous: the value is missing, so it may or may not be equal to 42, and the test therefore returns NA.
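
A few lines at the console illustrate the difference between comparing with NA and testing with is.na. The vector x is only an example value.

```r
x <- c(42, NA)

x == 42       # TRUE NA: comparing a missing value yields NA
x == NA       # NA NA: comparing with NA never gives TRUE or FALSE
is.na(x)      # FALSE TRUE: the reliable test for missing values
is.na("NA")   # FALSE: "NA" in quotes is an ordinary string
```

Since if() needs a single TRUE or FALSE, a condition that evaluates to NA stops with an error, which is why the loop has to use is.na().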

Answer 3

It can be done quite easily with dplyr:

library(dplyr)

employee <- c('John Doe','Peter Gynn','Jolie Hope')
salary <- c(21000, NA, 26800)
startdate <- as.Date(c('2010-11-1', NA,'2007-3-14'))

employ.data <- data.frame(employee, salary, startdate)

employ.data <- employ.data %>% 
  rowwise() %>% 
  mutate(missing = any(is.na(c(salary, startdate))))

Answer 1: score 4, accepted, 2018-10-06 13:46:53

Answer 2: score 2, 2018-10-06 13:33:10

Answer 3: score 2, 2018-10-06 17:04:45
