How to create a new column, returning i based on another column, in R
I have a data frame:
employee <- c('John Doe','Peter Gynn','Jolie Hope')
salary <- c(21000, NA, 26800)
startdate <- as.Date(c('2010-11-1', NA,'2007-3-14'))
employ.data <- data.frame(employee, salary, startdate)
I want a new column, employ.data$NA, that returns element i of employ.data$employee if [i] in any other column is NA.
I have tried this for one column, but I am getting errors:
employ.data$NA = NA
{for (i in 1:nrow(Eurostat))
{
if (startdate[i] = "NA") employ.data$employee[i]
}
Any help would be appreciated.
You need complete.cases() from base R:
employ.data$missingFlag <- !complete.cases(employ.data)
    employee salary  startdate missingFlag
1   John Doe  21000 2010-11-01       FALSE
2 Peter Gynn     NA       <NA>        TRUE
3 Jolie Hope  26800 2007-03-14       FALSE
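If only some columns should count toward the flag, complete.cases() can also be given a subset of the data frame. A minimal sketch, assuming you only care about salary and startdate (the column name missingSubset is made up for illustration):

```r
employee <- c("John Doe", "Peter Gynn", "Jolie Hope")
salary <- c(21000, NA, 26800)
startdate <- as.Date(c("2010-11-1", NA, "2007-3-14"))
employ.data <- data.frame(employee, salary, startdate)

# Columns to inspect (an assumption for this example)
cols <- c("salary", "startdate")

# TRUE for every row that has an NA in at least one of the selected columns
employ.data$missingSubset <- !complete.cases(employ.data[, cols])
```

With the sample data the two flags happen to agree, since the only NAs sit in row 2.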
Try to vectorize it and use an ifelse statement:
employ.data <- data.frame(employee, salary, startdate, stringsAsFactors = FALSE)
employ.data["missing"] <- with(employ.data, ifelse(is.na(startdate), employee, NA))
employ.data
    employee salary  startdate    missing
1   John Doe  21000 2010-11-01       <NA>
2 Peter Gynn     NA       <NA> Peter Gynn
3 Jolie Hope  26800 2007-03-14       <NA>
Alternatively, to check all columns, use any:
employ.data <- data.frame(employee, salary, startdate, stringsAsFactors = FALSE)
employ.data["something_missing"] <- apply(employ.data, 1, function(x) any(is.na(x)))
employ.data
    employee salary  startdate something_missing
1   John Doe  21000 2010-11-01             FALSE
2 Peter Gynn     NA       <NA>              TRUE
3 Jolie Hope  26800 2007-03-14             FALSE
The construct above will give you booleans. If you want to get a column of the names, you can combine it with ifelse.
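Combining the two could look roughly like this. It is only a sketch: the names row_has_na and missing_name are made up for the example, and the data frame is rebuilt so the snippet runs on its own.

```r
employee <- c("John Doe", "Peter Gynn", "Jolie Hope")
salary <- c(21000, NA, 26800)
startdate <- as.Date(c("2010-11-1", NA, "2007-3-14"))
employ.data <- data.frame(employee, salary, startdate, stringsAsFactors = FALSE)

# One logical per row: does the row contain at least one NA?
row_has_na <- apply(employ.data, 1, function(x) any(is.na(x)))

# Keep the employee name where something is missing, NA otherwise
employ.data$missing_name <- ifelse(row_has_na, employ.data$employee, NA_character_)
```

The flag is computed before the new column is added, so the NAs that ifelse writes into missing_name do not feed back into the check.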
On a more general note, instantiating a column first and then looping through the data frame to populate it is not particularly Rtistic, and I would suggest avoiding this strategy whenever possible. The apply family of functions is very powerful, and so is ifelse. dplyr's mutate combined with case_when statements can also be used if you want something more SQL-like.
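For reference, the mutate plus case_when variant might look like the sketch below. It assumes dplyr is installed and reuses the column name missing from the ifelse example; the fallback branch is written as TRUE ~ NA_character_ so that both branches have the same type.

```r
library(dplyr)

employee <- c("John Doe", "Peter Gynn", "Jolie Hope")
salary <- c(21000, NA, 26800)
startdate <- as.Date(c("2010-11-1", NA, "2007-3-14"))
employ.data <- data.frame(employee, salary, startdate, stringsAsFactors = FALSE)

employ.data <- employ.data %>%
  mutate(missing = case_when(
    is.na(startdate) ~ employee,      # start date missing: return the name
    TRUE             ~ NA_character_  # otherwise leave the new column empty
  ))
```

Further conditions, such as a missing salary, can be added as extra lines inside case_when, much like branches of a SQL CASE expression.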
Just for pedagogical reasons, here is a working version of your code. Please don't use it; just try to understand the differences.
employ.data$missing <- NA
for (i in seq_len(nrow(employ.data))) {
  # is.na() tests for a missing value; comparing against the string "NA" does not
  if (is.na(employ.data$startdate[i])) {
    employ.data$missing[i] <- employ.data$employee[i]
  }
}
Importantly, note that "NA" is interpreted as a string. To test whether a value is NA, you need to use e.g. is.na. After all, testing whether 42 == NA is ambiguous: the value is missing, so it may or may not be equal to 42, and the test therefore returns NA.
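A few lines you can paste into the console to see this behaviour for yourself (the variable names are only there for illustration):

```r
# "NA" in quotes is an ordinary two-character string, not a missing value
is_missing_string <- is.na("NA")    # FALSE

# Comparing anything with a real NA yields NA, not TRUE or FALSE
comparison <- 42 == NA              # NA

# is.na() is the reliable test, and it is vectorized
startdate <- as.Date(c("2010-11-1", NA, "2007-3-14"))
missing_dates <- is.na(startdate)   # FALSE TRUE FALSE
```

Note also that = inside if () is assignment, not comparison; comparison is ==. An if whose condition evaluates to NA stops with an error, which is another reason to go through is.na.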
It can be done quite easily with dplyr:
library(dplyr)
employee <- c('John Doe','Peter Gynn','Jolie Hope')
salary <- c(21000, NA, 26800)
startdate <- as.Date(c('2010-11-1', NA,'2007-3-14'))
employ.data <- data.frame(employee, salary, startdate)
employ.data <- employ.data %>%
  rowwise() %>%
  mutate(missing = any(is.na(c(salary, startdate))))
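rowwise() evaluates the expression once per row, which can get slow on large data frames. The same flag can also be computed in one vectorized step; here is a base R sketch, assuming the same two columns are the ones to check (the helper name na_count is made up):

```r
employee <- c("John Doe", "Peter Gynn", "Jolie Hope")
salary <- c(21000, NA, 26800)
startdate <- as.Date(c("2010-11-1", NA, "2007-3-14"))
employ.data <- data.frame(employee, salary, startdate)

# is.na() on a data frame gives a logical matrix; rowSums() counts the NAs per row
na_count <- rowSums(is.na(employ.data[, c("salary", "startdate")]))

# A row is flagged as soon as it has at least one NA
employ.data$missing <- na_count > 0
```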