Loop through df column, comparing to list and creating new column

I have a column of numbers, such as social security numbers. I would like to compare this column to a list of unacceptable values (for example 11111111 or 12345678). There are also some grepl operations I would like to perform, such as checking that the first 3 digits are not 000. Below is a skeleton of what I think the code could look like; I would prefer for-loop logic.

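# Stored as strings so that leading zeros are preserved
ssns <- data.frame(num = c("12343210", "23454321", "34565432", "11111111"),
                   stringsAsFactors = FALSE)
badssns <- c("11111111", "22222222")

ssns$newcolumn <- NA_character_  # create the new column before filling it

for (i in seq_len(nrow(ssns))) {
    if (ssns$num[i] %in% badssns) {
        ssns$newcolumn[i] <- "BADSSN"   # value is on the unacceptable list
    } else if (grepl("^000", ssns$num[i])) {
        ssns$newcolumn[i] <- "BADSSN"   # first 3 digits are 000
    } else {
        ssns$newcolumn[i] <- "GOODSSN"
    }
}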

Just using a nested ifelse should do the job, in my opinion:

ssns$newcolumn <- ifelse(ssns$num %in% badssns, 'BADSSN', 
                         ifelse(substr(ssns$num,1,3)=='000', 'BADSSN', 'GOODSSN'))

Or, shorter, using an OR operator ( | ):

ssns$newcolumn <- ifelse(ssns$num %in% badssns| substr(ssns$num,1,3)=='000', 'BADSSN', 'GOODSSN')

which gives:

> ssns
       num newcolumn
1 12343210   GOODSSN
2 23454321   GOODSSN
3 34565432   GOODSSN
4 11111111    BADSSN
5 00065432    BADSSN

Used data:

ssns <- data.frame(num = c('12343210','23454321','34565432','11111111','00065432'), stringsAsFactors = FALSE)
badssns <- c('11111111','22222222')

It seems like you have some experience with computer programming, but maybe are new to R. In most cases, the best R programs don't use for loops.

Here's a more R-ish way to accomplish what you've described. It will be much faster when ssns and badssns are long.

ssns<-c(12343210,23454321,34565432,11111111)
badssns<-c(11111111,22222222)

good.idxs <- is.na(match(ssns, badssns))
good.ssns <- ssns[good.idxs]
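The snippet above keeps only the good values. If the goal is a new column of labels, as in the question, the same logical index can feed ifelse. This is a minimal sketch under that assumption; the names labels and result are introduced here for illustration, and it does not cover the 000 check.

```r
# Same data as above.
ssns <- c(12343210, 23454321, 34565432, 11111111)
badssns <- c(11111111, 22222222)

# match() returns NA for values that do not appear in badssns.
good.idxs <- is.na(match(ssns, badssns))

# Label every element instead of dropping the bad ones.
labels <- ifelse(good.idxs, "GOODSSN", "BADSSN")

# Pair the values with their labels, mirroring the new column asked for.
result <- data.frame(num = ssns, newcolumn = labels, stringsAsFactors = FALSE)
print(result)
```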

You might want to work with strings rather than numbers; maybe you are concerned that the letter "oh" was used in place of the number "zero". This approach works in that case as well. Somewhat unexpectedly (for me, anyway), it even works when ssns is a vector of characters and badssns is a vector of numbers, or vice versa, because match coerces both arguments to a common type before comparing them.
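A short sketch of the mixed-type case, and of one reason strings may be the safer choice: a number cannot carry leading zeros, so a prefix check such as "first 3 digits are 000" needs the value re-padded first. The variable names and the fixed width of 8 digits are assumptions taken from the sample data, not something the question specifies.

```r
ssns.chr <- c("12343210", "23454321", "34565432", "11111111")
badssns.num <- c(11111111, 22222222)

# match() coerces both sides to a common type (character here),
# so a character vector can be matched against a numeric one.
good.mixed <- is.na(match(ssns.chr, badssns.num))

# A numeric SSN loses its leading zeros when converted to text ...
lost <- as.character(65432)

# ... so pad it back to the assumed width of 8 before testing the prefix.
padded <- formatC(65432, width = 8, format = "d", flag = "0")
starts.with.000 <- grepl("^000", padded)
```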

If ssns and badssns are character vectors:

ssns<-c("12343210","23454321","34565432","11111111","00023456")
badssns<-c("11111111","22222222")

then you can use just one ifelse:

result <- ifelse(ssns %in% badssns | grepl("^0{3}",ssns), "BADSSNS", "GOODSSNS")
##[1] "GOODSSNS" "GOODSSNS" "GOODSSNS" "BADSSNS"  "BADSSNS"
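If the check is needed in more than one place, it could be wrapped in a small function and applied to a data frame column. This is only a sketch: flag_ssns is a hypothetical helper name, the labels follow the question's BADSSN/GOODSSN wording, and startsWith (base R 3.3.0 or later) stands in for the grepl pattern.

```r
# flag_ssns() is a hypothetical helper, not part of base R.
flag_ssns <- function(x, bad) {
  x <- as.character(x)  # compare as text so leading zeros are kept
  is_bad <- x %in% as.character(bad) | startsWith(x, "000")
  ifelse(is_bad, "BADSSN", "GOODSSN")
}

ssns <- data.frame(
  num = c("12343210", "23454321", "34565432", "11111111", "00023456"),
  stringsAsFactors = FALSE
)
badssns <- c("11111111", "22222222")

# The whole column is labelled in one vectorised call, with no explicit loop.
ssns$newcolumn <- flag_ssns(ssns$num, badssns)
print(ssns)
```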
