How to apply a function that works with data.frame cells to data.frame columns

This question is a reworking of an earlier question that I asked unclearly. I am checking, row by row, whether columns V1 and V2 share any codes. Codes are separated by a forward slash "/". The function below should take one cell from V1 and one cell from V2 on the same row and turn each into a vector in which every element is one code. It should then check whether the two vectors have elements in common. The elements start out as 4-digit codes. If any 4-digit code matches between the two vectors, the function should return 4. If there are no elements in common, the function should reduce the number of digits of each code and check again. Each time the function reduces the number of digits, it also reduces the score it returns at the end. I would like the value returned by the function to be written to a column of my choice.

This is my starting data:

structure(list(ID = c(2630611040, 2696102020, 2696526020), V1 = c("7371/3728", 
"2834/2833/2836/5122/8731", "3533/3541/3545/5084"), V2 = c("7379", 
"3841", "3533/3532/3531/1389/8711")), .Names = c("ID", "V1", 
"V2"), class = "data.frame", row.names = c(NA, 3L))

         ID                       V1                       V2
1 2630611040                7371/3728                     7379
2 2696102020 2834/2833/2836/5122/8731                     3841
3 2696526020      3533/3541/3545/5084 3533/3532/3531/1389/8711

And I would like to get this:

          ID                       V1                       V2   V3
1 2630611040                7371/3728                     7379   3
2 2696102020 2834/2833/2836/5122/8731                     3841   0
3 2696526020      3533/3541/3545/5084 3533/3532/3531/1389/8711   4
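To make the scoring rule concrete, the score of 3 for row 1 can be checked by hand. This is only a sketch for a single pair of cells, and it uses base R's strsplit() rather than the stringr call in the function below:

```r
# Row 1 of the example data: one cell from V1, one from V2
a <- strsplit("7371/3728", "/", fixed = TRUE)[[1]]  # "7371" "3728"
b <- strsplit("7379", "/", fixed = TRUE)[[1]]       # "7379"

intersect(a, b)                              # character(0): no 4-digit match
intersect(substr(a, 1, 3), substr(b, 1, 3))  # "737": a 3-digit match, so the score is 3
```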

This is my function:

library(stringr)  # provides str_split()

coderelat <- function(a, b) {
  # Turn each cell into a vector of unique integer codes
  a <- unique(as.integer(unlist(str_split(a, "/"))))
  b <- unique(as.integer(unlist(str_split(b, "/"))))

  a <- a[!is.na(a)]
  b <- b[!is.na(b)]

  # Return NA if either cell is empty
  if (length(a) == 0 || length(b) == 0) {
    return(NA)
  }

  for (i in 3:1) {
    common <- intersect(a, b)  # codes the two cells share at the current length

    if (length(common) != 0) {
      # A match was found: return the score for the current code length
      return(i + 1)
    } else if (i == 1) {
      # Last cycle and still no match: the score is 0
      return(0)
    } else {
      # No match yet: truncate the codes to i digits and check again
      a <- unique(as.integer(substr(as.character(a), 1, i)))
      b <- unique(as.integer(substr(as.character(b), 1, i)))
    }
  }
}

The function works when I manually supply single cells, but it doesn't work when I pass whole columns like this:

df$V4<-coderelat(df$V1, df$V2)

I would really appreciate any help, because I no longer know how to make this work.

Many thanks in advance. Riccardo
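A likely reason the column call misbehaves is that the function is not vectorized: unlist() pools the codes of every row into one vector, so a single score is computed for the whole column and then recycled across all rows. The sketch below reproduces the pooling with the example data, using base R's strsplit() as a stand-in for str_split():

```r
V1 <- c("7371/3728", "2834/2833/2836/5122/8731", "3533/3541/3545/5084")
V2 <- c("7379", "3841", "3533/3532/3531/1389/8711")

# Passing whole columns: unlist() pools the codes of all rows into one vector
a <- unique(as.integer(unlist(strsplit(V1, "/", fixed = TRUE))))
b <- unique(as.integer(unlist(strsplit(V2, "/", fixed = TRUE))))

length(a)        # 11 codes drawn from all three rows
intersect(a, b)  # 3533: a match that comes from row 3 only
```

Because the pooled vectors already share a 4-digit code, the function would return a single 4, which the assignment then repeats for every row.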

Here's a solution using data.table.

get.match <-function(a,b) {
  A <- unique(strsplit(a,"/",fixed=TRUE)[[1]])
  B <- unique(strsplit(b,"/",fixed=TRUE)[[1]])
  for (i in 4:1) if(length(intersect(substr(A,1,i),substr(B,1,i)))>0) return(i)
  return(0L)
}
library(data.table)
setDT(df)[,V3:=get.match(V1,V2),by=ID]
df
#            ID                       V1                       V2 V3
# 1: 2630611040                7371/3728                     7379  3
# 2: 2696102020 2834/2833/2836/5122/8731                     3841  0
# 3: 2696526020      3533/3541/3545/5084 3533/3532/3531/1389/8711  4
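If data.table is not available, one possible base R alternative is to call the function once per row with mapply(). This is a sketch rather than part of the answer above: get.match is redefined with the same logic and the example data is rebuilt so that the snippet runs on its own. Unlike grouping with by=ID, it does not depend on the ID values being unique.

```r
# Same scoring rule as get.match above, redefined so the snippet is self-contained
get.match <- function(a, b) {
  A <- unique(strsplit(a, "/", fixed = TRUE)[[1]])
  B <- unique(strsplit(b, "/", fixed = TRUE)[[1]])
  for (i in 4:1) {
    if (length(intersect(substr(A, 1, i), substr(B, 1, i))) > 0) return(i)
  }
  0L
}

# Example data from the question
df <- data.frame(
  ID = c(2630611040, 2696102020, 2696526020),
  V1 = c("7371/3728", "2834/2833/2836/5122/8731", "3533/3541/3545/5084"),
  V2 = c("7379", "3841", "3533/3532/3531/1389/8711"),
  stringsAsFactors = FALSE
)

# mapply() calls get.match once per row, pairing V1[i] with V2[i]
df$V3 <- mapply(get.match, df$V1, df$V2, USE.NAMES = FALSE)
df$V3  # 3 0 4
```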
