R sapply loop to replace for loop

I have successfully replaced for loops with sapply() before, and in those cases I know from timing them with system.time() that they were faster.

But my mind still thinks in for loops...

Please help me convert this for loop into an sapply() call:

names.list <- c("Anna", "Ana", "Albert", "Albort", "Rob", "Robb", "Tommy", "Tommie")
misspell.list <- c("Anna", "Albort", "Robb", "Tommie")
fix.list <- c("Ana", "Albert", "Rob", "Tommy")

for (i in seq_along(fix.list)) {
        names.list[which(names.list == misspell.list[i])] <- fix.list[i]
}

names.list

So far, I have:

sapply(seq_along(fix.list), function(x)
        names.list[which(names.list == misspell.list[x])]  <- fix.list[x]
)

But names.list comes back unchanged.

Thanks!

EDIT 1:

The misspell.list and fix.list were created automatically by the adist() code below, and the original names.list has 665 elements. After my for() loop, length(unique(names.list)) is 653.

# will do another sapply() substitution here soon
distancias <- numeric(length(names.list) - 1)  # must exist before the loop fills it
for (i in seq_len(length(names.list) - 1)) {
        distancias[i] <- adist(names.list[i], names.list[i + 1])
}

# fix list
misspell.list <- names.list[which(distancias < 2)]
fix.list <- names.list[which(distancias < 2) +1]
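The loop above (the one marked for a future sapply() substitution) can be sketched with vapply(), which, unlike sapply(), guarantees exactly one number per call. This is a minimal sketch assuming names.list is already sorted so that near-duplicates sit next to each other; the eight-name vector below is a hand-ordered stand-in for the real 665-element data, not the asker's actual input.

```r
# Hand-ordered stand-in for a sorted names.list (assumption, not the real data).
names.list <- c("Albert", "Albort", "Ana", "Anna", "Rob", "Robb", "Tommie", "Tommy")

# Distance between each name and the next one: one number per adjacent pair.
# adist() returns a 1 x 1 matrix for two single strings, so [1, 1] extracts it.
distancias <- vapply(
  seq_len(length(names.list) - 1),
  function(i) adist(names.list[i], names.list[i + 1])[1, 1],
  numeric(1)
)

# Same convention as the loop: the first name of a close pair is treated as
# the misspelling and the following name as its fix.
close <- which(distancias < 2)
misspell.list <- names.list[close]
fix.list <- names.list[close + 1]
```

Here distancias is c(1, 5, 1, 4, 1, 5, 2). "Tommie" and "Tommy" are two edits apart (i to y, plus a deleted e), so the `< 2` threshold misses them, and which name of a pair lands in misspell.list depends only on the sort order ("Albert" ends up as the "misspelling" of "Albort").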

EDIT 2: Thanks to you, I'm now an sapply overlord, and I'm back just to show my other for-to-sapply substitution, used with adist():

nomes <- sort(unique(names.list))
distancias <- rep(10, length(nomes))

# adist() for finding misspellings
sapply(seq_along(nomes), function(x) {
        if (x < length(nomes)) {
                distancias[x] <<- adist(nomes[x], nomes[x + 1])
        }
})
# fix list (the indices refer to nomes, not names.list)
misspell.list <- nomes[which(distancias < 2)]
fix.list <- nomes[which(distancias < 2) + 1]
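As an alternative to the sapply() with `<<-`, a single adist() call can do the work: adist(x, y) returns the matrix of distances between every element of x and every element of y, so comparing nomes without its last element against nomes without its first puts the neighbour distances on the diagonal. This is a sketch under the assumption that an (n-1) by (n-1) matrix is affordable (well under half a million entries for about 650 names); the hand-ordered vector stands in for sort(unique(names.list)), because sort() order depends on the locale.

```r
# Hand-ordered stand-in for sort(unique(names.list)) (assumption).
nomes <- c("Albert", "Albort", "Ana", "Anna", "Rob", "Robb", "Tommie", "Tommy")
n <- length(nomes)

# Entry [i, j] is the distance between nomes[i] and nomes[j + 1], so the
# diagonal holds the distance from each name to its neighbour.
# The trailing 10 pads the last position, like rep(10, length(nomes)) did.
distancias <- c(diag(adist(nomes[-n], nomes[-1])), 10)

misspell.list <- nomes[which(distancias < 2)]
fix.list <- nomes[which(distancias < 2) + 1]
```

The trade-off is memory: the matrix holds all pairwise distances even though only the diagonal is used.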

You already know the rest. Thanks again!

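If there is a one-to-one correspondence between misspell.list and fix.list, you can do away with loops entirely by using the match() function. Note that match() returns only the first position of each misspelling in names.list, so this replaces one occurrence per misspelling, which is enough for the example vector: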

names.list[match(misspell.list,names.list)] <- fix.list

names.list
#[1] "Ana"    "Ana"    "Albert" "Albert" "Rob"    "Rob"    "Tommy"  "Tommy"
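When a misspelling can occur more than once (likely in a 665-element vector with only 653 unique values), matching in the other direction replaces every occurrence. This is a minimal sketch; the extra trailing "Anna" is added purely to illustrate the difference.

```r
names.list <- c("Anna", "Ana", "Albert", "Albort", "Rob", "Robb",
                "Tommy", "Tommie", "Anna")   # illustrative second "Anna"
misspell.list <- c("Anna", "Albort", "Robb", "Tommie")
fix.list <- c("Ana", "Albert", "Rob", "Tommy")

# match(misspell.list, names.list) finds only the FIRST position of each
# misspelling, so the trailing "Anna" survives this version.
first.only <- names.list
first.only[match(misspell.list, first.only)] <- fix.list

# Reverse direction: idx[i] is the position of names.list[i] in
# misspell.list, or NA when names.list[i] is not a known misspelling.
idx <- match(names.list, misspell.list)
hit <- !is.na(idx)
names.list[hit] <- fix.list[idx[hit]]
names.list
```

A side benefit of the reverse direction is that a misspelling which never occurs in names.list simply produces no hits, instead of an NA subscript in the assignment.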

The solution using match() is much better, but in terms of what you were trying to do, this will work. First, you don't need the which(). Second, you need the <<- operator so that the anonymous function assigns to names.list in the enclosing (here, global) environment rather than in its own local one; otherwise it changes only a local copy and names.list stays the same.

sapply(seq_along(fix.list), function(x)
  names.list[names.list == misspell.list[x]]  <<- fix.list[x]
)

names.list
[1] "Ana"    "Ana"    "Albert" "Albert" "Rob"    "Rob"    "Tommy"  "Tommy" 
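To make the difference concrete, here is a small sketch that runs the original attempt and the `<<-` version side by side. It relies only on the fact that the value of an R assignment is its right-hand side, which is why the first sapply() returns the replacement strings rather than a modified vector.

```r
names.list <- c("Anna", "Ana", "Albert", "Albort", "Rob", "Robb", "Tommy", "Tommie")
misspell.list <- c("Anna", "Albort", "Robb", "Tommie")
fix.list <- c("Ana", "Albert", "Rob", "Tommy")

# With <-, each call modifies a local copy of names.list that is discarded;
# sapply() just collects the value of each assignment, fix.list[x].
returned <- sapply(seq_along(fix.list), function(x)
  names.list[names.list == misspell.list[x]] <- fix.list[x]
)
unchanged <- names.list   # still the original spellings

# With <<-, the assignment targets the names.list in the enclosing
# environment, so the changes persist after sapply() returns.
invisible(sapply(seq_along(fix.list), function(x)
  names.list[names.list == misspell.list[x]] <<- fix.list[x]
))
names.list
```

Wrapping the second call in invisible() only suppresses the printed return value; the side effect through `<<-` is what matters.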

I would propose a small change to your whole setup. When you pair two vectors by index like this, you need to be sure that their order always matches. If you add or remove a name in one vector but not the other, the whole thing falls apart.

Using a named list with lapply() or sapply() keeps your code dynamic, and you can map several misspellings to the same name.

misspell.list  <-  list(
  'Anna' = 'Ana',
  'Albort' = 'Albert',
  'Robb' = 'Rob',
  'Tommie' = 'Tommy'
)

names.list <- c("Anna", "Ana", "Albert", "Albort", "Rob", "Robb", "Tommy", "Tommie")


> sapply(names.list,function(x) ifelse(x %in% names(misspell.list),misspell.list[[x]],x))
    Anna      Ana   Albert   Albort      Rob     Robb    Tommy   Tommie 
   "Ana"    "Ana" "Albert" "Albert"    "Rob"    "Rob"  "Tommy"  "Tommy" 
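The same lookup idea also works without any apply function if the named list is swapped for a named character vector (my substitution, not part of the answer above), because indexing by name is vectorized. The extra misspelling "Annna" is hypothetical and only shows two misspellings mapping to one name.

```r
# Lookup table: names are misspellings, values are fixes.
# "Annna" is a hypothetical second misspelling of "Ana".
fixes <- c(Anna = "Ana", Annna = "Ana", Albort = "Albert", Robb = "Rob", Tommie = "Tommy")

names.list <- c("Anna", "Ana", "Albert", "Albort", "Rob", "Robb", "Tommy", "Tommie", "Annna")

# Indexing by a name that is not in the table gives NA,
# so fall back to the original name wherever there is no fix.
looked.up <- unname(fixes[names.list])
fixed <- ifelse(is.na(looked.up), names.list, looked.up)
fixed
```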

To illustrate what I mean, I use sample() to draw 20 names at random (with replacement) from your names.list. This shows that order and length have no influence; your output will differ, since no seed is set.

sapply(sample(names.list, 20, replace = TRUE), function(x) ifelse(x %in% names(misspell.list), misspell.list[[x]], x))
  Albert   Tommie      Rob   Tommie      Rob    Tommy      Ana     Robb   Tommie      Ana   Tommie   Albort      Ana   Albert   Albert   Albort 
"Albert"  "Tommy"    "Rob"  "Tommy"    "Rob"  "Tommy"    "Ana"    "Rob"  "Tommy"    "Ana"  "Tommy" "Albert"    "Ana" "Albert" "Albert" "Albert" 
   Tommy    Tommy    Tommy      Ana 
 "Tommy"  "Tommy"  "Tommy"    "Ana" 
