How to merge files based on a common column?

Question

I have 2 files. file1 is the output of a telnet command, i.e.:

25-08-2019 : Port port1 of URL http://ip1:port1/ is [ NOT OPEN ]  
25-08-2019 : Port port2 of URL http://ip2:port2/ is [ NOT OPEN ]

The other one is file2, which looks like:

http://ip1:port1/, ZOOM1  
http://ip2:port2/, ZOOM2  
http://ip3:port3/, ZOOM3

I need to merge these two files based on the common IP and port. The output should be a third file, like:

25-08-2019 : Port port1 of URL http://ip1:port1/ is [ NOT OPEN ]  ZOOM1  
25-08-2019 : Port port2 of URL http://ip2:port2/ is [ NOT OPEN ]  ZOOM2

I tried join, but join gives an error in my shell. Any help without join would be much appreciated.

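I tried paste, as below. It works on the command line but fails in a shell script, whether in bash or sh. Also, it doesn't match lines by key; it just pastes them side by side.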

paste -d " : " file1 <(cut -s -d ',' -f2 file2)

I also tried an awk command, but it didn't process the files as expected:

awk 'NR==FNR {h[$2] = $3; next} {print $1,$2,$3,h[$2]}' file2 file1 > file3
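Why the awk attempt above finds nothing: with awk's default whitespace splitting, the URL is field $1 of file2 (still carrying its trailing comma) and field $7 of file1, while the command keys both files on $2. A small helper, hypothetically named show_fields, prints every field with its number so the right field indexes are easy to see:

```shell
#!/bin/sh
# show_fields FILE... (hypothetical helper name)
# Prints each field of each input line together with its awk field number,
# using awk's default whitespace splitting.
show_fields() {
  awk '{
    for (i = 1; i <= NF; i++)
      printf "line %d: $%d = %s\n", FNR, i, $i
  }' "$@"
}
```

For example, `show_fields file2` reports `$1 = http://ip1:port1/,` (comma included) and `$2 = ZOOM1`, and `show_fields file1` reports the URL as `$7`.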

Answer 1

Using join is a bit complicated because the two files use different delimiters, but:

$ join -17 -21 -o 1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,1.10,1.11,1.12,2.2 <(sort -k7,7 a.txt) <(sort -k1,1 -t, b.txt | tr -d ',')
25-08-2019 : Port port1 of URL http://ip1:port1/ is [ NOT OPEN ] ZOOM1
25-08-2019 : Port port2 of URL http://ip2:port2/ is [ NOT OPEN ] ZOOM2

If the files are already sorted by URL, you can drop the sort parts, but you still need to remove the commas from the second file.
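The command above relies on `<(...)` process substitution, which bash supports but a strictly POSIX sh such as dash does not; that may explain why join failed inside the asker's script. A minimal sketch of the same join for plain sh, assuming `mktemp` is available and using temporary files instead (the function name join_by_url is hypothetical):

```shell
#!/bin/sh
# join_by_url FILE1 FILE2 (hypothetical helper name)
# Same idea as the join command above, but portable to plain sh:
# temporary files replace bash's <( ... ) process substitution.
join_by_url() {
  tmpd=$(mktemp -d) || return 1
  # sort and join must agree on collation order, so force the C locale
  LC_ALL=C sort -b -k7,7 "$1" > "$tmpd/a"            # file1 sorted on its URL (field 7)
  tr -d ',' < "$2" | LC_ALL=C sort -k1,1 > "$tmpd/b"  # file2 without commas, sorted on field 1
  LC_ALL=C join -1 7 -2 1 \
    -o 1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,1.10,1.11,1.12,2.2 \
    "$tmpd/a" "$tmpd/b"
  rc=$?
  rm -rf "$tmpd"
  return "$rc"
}
```

Usage: `join_by_url file1 file2 > file3`. As with the original command, the output comes out sorted by URL, and URLs that appear in only one file are dropped.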

Answer 2

#!/usr/bin/perl
use strict;
use warnings;
my %z = split /[, \n]+/, qx(cat file2);  # read file2 into %z for lookups: URL => zoom
my @file1 = split /\n/, qx(cat file1);   # read the lines of file1 into @file1
for (@file1) {                          # for each line of file1
  next unless /http\S+/;                # find the URL; \S+ matches non-space characters
  print "$_ ", $z{$&} // '', "\n";      # the URL is in $&: print the line plus its zoom from %z
}

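If you want to take the file names from the command line, replace file1 and file2 with $ARGV[0] and $ARGV[1]. I don't know whether /usr/bin/paste and awk can be made to work here the way you suggested; it would be interesting to see how. In most cases Perl beats awk.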

Answer 3

It can be done with awk, but it is easier if both files use the same delimiter. So first remove the comma from file2:

sed -i.old 's/,//' file2

Then you can process both files:

awk '{

    if(FILENAME=="file1"){
        m[$7]=$0
    }
    else {
        if(m[$1]!=""){
           print m[$1],$2
        }
    }
}' file1 file2

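It first stores the contents of file1 in a map, keyed on the http://... URL (field 7) with the whole line ($0) as the value. It then processes file2 and prints the expected output whenever the first column of file2 matches a key in the map.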

In your particular case, you can do it all in one step with:

awk -F'[ ,]+' '{

    if(FILENAME=="file1"){
        m[$7]=$0
    }
    else {
        if(m[$1]!=""){
           print m[$1],$2
        }
    }
}' file1 file2

Both spaces and commas are treated by awk as field separators.
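The `FILENAME=="file1"` test only matches when the file is passed literally as `file1` (not `./file1` or `/tmp/file1`). A sketch of the same map-based approach that instead tells the files apart with `FNR==NR`, treats any run of spaces and commas as one separator, and drops trailing blanks from file1 lines; merge_on_url is a hypothetical name:

```shell
#!/bin/sh
# merge_on_url FILE1 FILE2 (hypothetical helper name)
# Same map-based approach as above, but the files are told apart with
# FNR==NR instead of a hard-coded FILENAME, so any path works.
# Caveat: if FILE1 is empty, FNR==NR also holds for FILE2.
merge_on_url() {
  awk -F'[ ,]+' '
    FNR == NR {              # first file (FILE1): remember each line by its URL
      sub(/[ \t]+$/, "")     # drop trailing blanks; awk re-splits $0 afterwards
      m[$7] = $0
      next
    }
    ($1 in m) {              # second file (FILE2): URL is $1, zoom is $2
      print m[$1], $2
    }
  ' "$1" "$2"
}
```

Usage: `merge_on_url file1 file2 > file3`. The output follows the order of file2, as in the answer above.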

Answer 4

Read file2 into a hash, then process file1 one line at a time, extracting the key and looking it up in the hash. Something like this:

#!/usr/bin/perl

use strict;
use warnings;

open my $fh2, '<', 'file2' or die $!;

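# strip trailing whitespace (incl. the newline), then split on the comma and the blanks around it
my %data_hash = map { s/\s+\z//; split /\s*,\s*/, $_, 2 } <$fh2>;

close $fh2;

open my $fh1, '<', 'file1' or die $!;

while (<$fh1>) {
  if (my ($key) = /\b(http:\S+)/) {
    if (exists $data_hash{$key}) {
      chomp;
      print "$_ $data_hash{$key}\n";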
    } else {
      # Key doesn't exist in file2
      print;
    }
  } else {
    # No http key found on a line in file1
    print;
  }
}

Answer 5

Try the following (in bash, since it uses process substitution):

awk 'NR==FNR {h[$1]=$2; next} {print $0" "h[$7]}' <(sed "s/,//" file2) file1

Result:

25-08-2019 : Port port1 of URL http://ip1:port1/ is [ NOT OPEN ] ZOOM1
25-08-2019 : Port port2 of URL http://ip2:port2/ is [ NOT OPEN ] ZOOM2
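`<(...)` is bash process substitution and is not available in a strictly POSIX sh such as dash. A sketch of the same awk program for plain sh, assuming the cleaned file2 is piped in and read by awk as `-` (standard input); merge_zoom is a hypothetical name:

```shell
#!/bin/sh
# merge_zoom FILE1 FILE2 (hypothetical helper name)
merge_zoom() {
  # sed removes the comma from FILE2; awk reads that stream as "-" (stdin)
  # first, then FILE1, so no bash-only <( ... ) is needed.
  sed 's/,//' "$2" | awk '
    NR == FNR { h[$1] = $2; next }   # stdin (cleaned FILE2): URL -> ZOOM
    { print $0 " " h[$7] }           # FILE1: field 7 is the URL
  ' - "$1"
}
```

Usage: `merge_zoom file1 file2 > file3`. Like the one-liner, it prints every line of file1, appending an empty zoom when the URL is missing from file2.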

5 answers

Answer 1: score 3, 2019-09-04 09:05:58
Answer 2: score 2, 2019-09-04 08:47:21
Answer 3: score 1, accepted, 2019-09-04 08:58:45
Answer 4: score 1, 2019-09-04 09:05:08
Answer 5: score 1, 2019-09-04 11:13:08