Loop through df column, comparing to list and creating new column
I have a column of numbers, social security numbers for example. I would like to compare this column to a list of unacceptable values (like 11111111 or 12345678, for example). There are also some grepl operations I would like to perform, for example the first 3 digits can't be 000. Below is a skeleton of what I think the code could look like; I would prefer for-loop logic.
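# Store the values as strings so that leading zeros are preserved
ssns <- data.frame(num = c("12343210", "23454321", "34565432", "11111111"),
                   stringsAsFactors = FALSE)
badssns <- c("11111111", "22222222")
ssns$newcolumn <- NA_character_
for (i in seq_len(nrow(ssns))) {
  if (ssns$num[i] %in% badssns) {
    # value is on the list of unacceptable values
    ssns$newcolumn[i] <- "BADSSN"
  } else if (grepl("^000", ssns$num[i])) {
    # first 3 digits are 000
    ssns$newcolumn[i] <- "BADSSN"
  } else {
    ssns$newcolumn[i] <- "GOODSSN"
  }
}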
Just using a nested ifelse should do the job, in my opinion:
ssns$newcolumn <- ifelse(ssns$num %in% badssns, 'BADSSN',
                         ifelse(substr(ssns$num, 1, 3) == '000', 'BADSSN', 'GOODSSN'))
Or, more concisely, with the OR operator (|):
ssns$newcolumn <- ifelse(ssns$num %in% badssns | substr(ssns$num, 1, 3) == '000', 'BADSSN', 'GOODSSN')
which gives:
> ssns
       num newcolumn
1 12343210   GOODSSN
2 23454321   GOODSSN
3 34565432   GOODSSN
4 11111111    BADSSN
5 00065432    BADSSN
Used data:
ssns <- data.frame(num = c('12343210','23454321','34565432','11111111','00065432'), stringsAsFactors = FALSE)
badssns <- c('11111111','22222222')
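A detail worth noting about the data above: the values are stored as strings. If the column were numeric, a value such as 00065432 would be stored as 65432, and substr() would see "654" rather than "000". The sketch below pads numeric input back to a fixed width before applying the same check. The vector nums and the 8-digit width are assumptions made for illustration, taken from the length of the example values.

```r
# Hypothetical numeric input; the last value was meant to be "00065432".
nums <- c(12343210, 11111111, 65432)

# Assumption: valid values are exactly 8 digits wide, as in the examples.
# formatC() left-pads with zeros and returns character strings.
padded <- formatC(nums, width = 8, format = "d", flag = "0")

ssns <- data.frame(num = padded, stringsAsFactors = FALSE)
badssns <- c("11111111", "22222222")

# Same vectorized check as above, now on the padded strings.
ssns$newcolumn <- ifelse(ssns$num %in% badssns | substr(ssns$num, 1, 3) == "000",
                         "BADSSN", "GOODSSN")
print(ssns)
```

If the real identifiers have a different length, the width argument would need to change accordingly.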
It seems like you have some experience with computer programming, but maybe are new to R. In most cases, the best R programs don't use for loops.
Here's a more R-ish way to accomplish what you've described. It will be much faster when ssns and badssns are long.
ssns<-c(12343210,23454321,34565432,11111111)
badssns<-c(11111111,22222222)
good.idxs <- is.na(match(ssns, badssns))
good.ssns <- ssns[good.idxs]
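To get the labelled column the question asks for rather than a filtered vector, the same lookup can feed ifelse(). The following is a minimal sketch; the names pos, is.bad and labels are introduced here for illustration, and the label strings are the ones used in the question.

```r
ssns <- c(12343210, 23454321, 34565432, 11111111)
badssns <- c(11111111, 22222222)

# match() returns the position of each element in badssns, or NA if absent.
pos <- match(ssns, badssns)

# %in% is the logical counterpart: TRUE wherever match() found something.
is.bad <- ssns %in% badssns

# Label every element in one vectorized step, with no explicit loop.
labels <- ifelse(is.bad, "BADSSN", "GOODSSN")
print(data.frame(num = ssns, newcolumn = labels))
```

Here is.na(pos) and !is.bad describe the same set of elements, so either form can be used to select the good values.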
You might want to work with strings rather than numbers; maybe you are concerned that the letter "oh" was used in place of the number "zero". This approach works in that case as well. Somewhat unexpectedly (for me, anyway), it even works when ssns is a vector of characters and badssns is a vector of numbers, or vice versa!
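The mixed-type behaviour comes from match() coercing both arguments to a common type before comparing, so the numbers are compared by their character representation. A small sketch with made-up values (ssns_chr and badssns_num are hypothetical names) shows both the convenience and one limitation: a numeric value cannot carry leading zeros, so it will not match a zero-padded string.

```r
# Hypothetical data: character SSNs checked against a numeric blacklist.
ssns_chr <- c("12343210", "11111111", "00023456")
badssns_num <- c(11111111, 22222222, 23456)

# The numeric blacklist is converted to character before matching.
pos <- match(ssns_chr, badssns_num)
print(pos)

# "00023456" is not matched by 23456, because 23456 becomes "23456".
print(ssns_chr[is.na(pos)])
```

For that reason it is probably safer to keep both vectors as character strings when leading zeros matter.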
If ssns and badssns are character vectors:
ssns<-c("12343210","23454321","34565432","11111111","00023456")
badssns<-c("11111111","22222222")
then you can use just one ifelse:
result <- ifelse(ssns %in% badssns | grepl("^0{3}",ssns), "BADSSNS", "GOODSSNS")
##[1] "GOODSSNS" "GOODSSNS" "GOODSSNS" "BADSSNS" "BADSSNS"
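In the pattern "^0{3}", the ^ anchors the match to the start of the string and 0{3} means exactly three zeros in a row. The sketch below, using made-up values, illustrates why the anchor matters and shows two base R alternatives that should give the same answer for a fixed prefix.

```r
# Hypothetical values: the third one has "000" in the middle, not at the start.
ssns <- c("12343210", "00023456", "12000345")

anchored   <- grepl("^0{3}", ssns)          # only a leading "000" counts
unanchored <- grepl("0{3}", ssns)           # "000" anywhere in the string
prefix     <- startsWith(ssns, "000")       # fixed-prefix test, no regex
by_substr  <- substr(ssns, 1, 3) == "000"   # the check from the first answer

print(data.frame(ssns, anchored, unanchored, prefix, by_substr))
```

Without the anchor, "12000345" would be flagged as bad even though its first three digits are 120.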