Replace matches between files with a sequence of numbers

Question

I have two text files:

FileA has three columns:

Col1 Col2 Col3  
111111 111111 0  
222222 222222 0  
333333 333333 0  
444444 444444 0  
666666 666666 0

FileB has one column (no header):

I want to replace the content of columns 1 and 2 of FileA wherever it matches a value in FileB. The replacement should be a sequence of negative numbers, starting at -4.

Desired output:

Col1 Col2 Col3  
111111 111111 0  
-4 -4 0  
333333 333333 0  
-5 -5 0  
-6 -6 0

In reality FileA has about 500k lines and FileB has 80.

An R or bash solution would be much appreciated.
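FileB holds only 80 values while FileA has about 500k lines. A practical approach is to load FileB once into a lookup table and then stream FileA in a single pass, rather than comparing every row with every value (80 × 500k = 40 million comparisons). Below is a minimal pure-bash sketch of that idea. It assumes bash 4+ (for associative arrays), whitespace-separated files and, as in the sample data, that column 2 always equals column 1. The function name replace_matches_bash is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace_matches_bash FILEA FILEB
# Prints FileA with columns 1 and 2 replaced by -4, -5, -6, ... on every row
# whose column 1 appears in FileB. Needs bash 4+ for associative arrays.
replace_matches_bash() {
  local fileA=$1 fileB=$2 k c1 c2 rest
  local -A keys=()                     # lookup table of FileB values

  # Load FileB once; "|| [[ -n $k ]]" keeps a last line that lacks a newline.
  while read -r k _ || [[ -n $k ]]; do
    if [[ -n $k ]]; then keys[$k]=1; fi
  done < "$fileB"

  local c=-4                           # next replacement value
  while read -r c1 c2 rest || [[ -n $c1 ]]; do
    # Two separate tests so an empty c1 is never used as an array key.
    if [[ -n $c1 ]] && [[ -n ${keys[$c1]+x} ]]; then
      printf '%s %s %s\n' "$c" "$c" "$rest"
      c=$((c - 1))
    else
      printf '%s %s %s\n' "$c1" "$c2" "$rest"
    fi
  done < "$fileA"
}
```

Usage: replace_matches_bash FileA FileB > FileA.new. A bash read loop is typically much slower than awk, so at 500k lines the awk answer below is likely the better choice.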

Answer 1

With base R you can do it like this:

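FileA[1:2] <- lapply(FileA[1:2], function(x){   # only columns 1 and 2
  i <- sort(match(FileB$Col1, x))   # rows of x holding a FileB value; sort() drops the NAs
  x[i] <- -seq_along(i) - 3         # -4, -5, -6, ... in row order
  x
})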

FileA
#    Col1   Col2 Col3
#1 111111 111111    0
#2     -4     -4    0
#3 333333 333333    0
#4     -5     -5    0
#5     -6     -6    0
#6     -7     -7    0

Data:

FileA <- data.frame(Col1 = c(111111, 222222, 333333, 444444, 555555, 666666),
                    Col2 = c(111111, 222222, 333333, 444444, 555555, 666666),
                    Col3 = 0)
FileB <- data.frame(Col1 = c(222222, 444444, 555555, 666666))

Answer 2

This does the trick with a nested loop:

# unlist() lets fileb be either a plain vector or a one-column data frame
equalities <- apply(filea, 2, function(x) x %in% unlist(fileb))
result <- filea
replacement <- -4:-99

for (i in 1:2) {                     # only columns 1 and 2 are replaced
  nbmatches <- 0
  for (j in 1:nrow(result)) {
    if (equalities[j, i]) {
      nbmatches <- nbmatches + 1     # count first, so the first match gets -4
      result[j, i] <- replacement[nbmatches]
    }
  }
}
result
    Col1   Col2 Col3
1 111111 111111    0
2     -4     -4    0
3 333333 333333    0
4     -5     -5    0
5     -6     -6    0

Answer 3

This assumes the two columns hold the same values.

$ awk -v c=-4 'NR==FNR {a[$1]; next} 
               $1 in a {$1=$2=c--}1' fileB fileA 

Col1 Col2 Col3  
111111 111111 0  
-4 -4 0
333333 333333 0  
-5 -5 0
-6 -6 0

Explanation: while awk reads the first file (fileB, where NR==FNR), it stores each value as a key of array a. For each line of fileA, if the first field is a key of a, the first and second fields are set to the counter c, which is then decremented. The trailing 1 prints every line, updated or not.
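The one-liner above relies on columns 1 and 2 holding the same value. If they can differ, a small variation checks each column on its own and gives each column its own counter, which is also how the two R answers number the columns. This is a minimal sketch. It assumes whitespace-separated files and treats the first line of FileA as a header. The function name replace_cols is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace_cols FILEA FILEB
# Checks columns 1 and 2 of FileA independently against FileB; each column
# gets its own counter (-4, -5, ...). Line 1 of FileA is printed unchanged.
replace_cols() {
  awk -v c1=-4 -v c2=-4 '
    # First file (FileB): store values. Unlike NR==FNR, this test still
    # works when FileB is empty.
    FILENAME == ARGV[1] { if (NF) keys[$1]; next }
    FNR > 1 {                          # FileA data rows (header skipped)
      if ($1 in keys) $1 = c1--
      if ($2 in keys) $2 = c2--
    }
    1                                  # print every FileA line
  ' "$2" "$1"
}
```

Usage: replace_cols FileA FileB > FileA.new. As in the original one-liner, a modified line is rebuilt with single spaces, so any trailing whitespace on that line is dropped.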

3 answers

Answer 1
1 2018-10-17 16:59:45

Answer 2
0 2018-10-17 17:05:42

Answer 3
0 Accepted 2018-10-17 17:18:17
