
awk compare columns from two files and print the file 2 column if it is not seen in file1

For example, file1:

"ACCOUNT_ID","CTN","NAME","GATEWAY_GUID","DEVICE_GUID","CATALOG_ID","FW_VERSION","DATE_CREATED","STATUS_ID","LOCATION_CODE","BAN","Market_Area","State","IMEI","HW_MODEL"
"306875",="9404653975","14-052917 14-052917","313A0B72E3E440DD8687BD681E55FB03","0",="000010000010004","FW: 1.04.122, JVM: Oracle Corporation 1.7.0_72-ea, OS: Linux 2.6.33.5","06/24/2014 14:32:38","0",="0003013034",="177046772949","DLS","TX",="351612051721824","Cisco DLC-100"
"306875",="9404653975","14-052917 14-052917","7EED6EE61F0949EE99554D4D4F09E4FE","ACFF000001",="000010901000004","1.2.14","06/24/2014 21:28:17","0",="",="177046772949","DLS","TX",="351612051721824",""
"306875",="9404653975","14-052917 14-052917","D57DAE988A1C482EA3217312EDC7466E ","ACFF010904",="000010907000004","","12/16/2015 23:39:21","0",="",="177046772949","DLS","TX",="351612051721824",""

file2:

account,ban,ctn,first_name,last_name,device_gateway_guid,device_id,device_cat_id,IMEI,device_fw_vrsn,date_created,device_status,subscription_created,subscription_name,subscription_market,date
DL!813269 , 418069632891 , undefined , MUHAMMAD , ANJUM , 313A0B72E3E440DD8687BD681E55FB03, ACFF010904 , 00010907000004 , 351612054025777 ,  , 2015-12-18 19:45:31 , 0 , undefined , [object Object] , WAS , undefined
DL!782477 , 523266997720 , undefined , SAM , MAURER , 7EED6EE61F0949EE99554D4D4F09E4FE , 0 , 00010000010004 , 351612053801194 , FW: 1.04.122, JVM: Oracle Corporation 1.7.0_72-ea, OS: Linux 2.6.33.5 , 2015-12-18 19:02:27 , 0 , undefined , [object Object] , FLP , 2015-07-29 09:07:22
DL!926875 , 578172109430 , undefined , TRACY , BUSH , D57DAE988A1C482EA3217312EDC7466E , 0 , 00010000010004 , 351612054481798 , FW: 1.04.122, JVM: Oracle Corporation 1.7.0_72-ea, OS: Linux 2.6.33.5 , 2016-01-23 16:09:21 , 0 , undefined , [object Object] , GLF , 2015-11-06 02:26:31
"306875",="9404653975","14-052917 14-052917","313A0B72E3E440DD8687BD681E55FB03","0",="313A0B72E3E440DD8687BD681E55FB03","FW: 1.04.122, JVM: Oracle Corporation 1.7.0_72-ea, OS: Linux 2.6.33.5","06/24/2014 14:32:38","0",="0003013034",="177046772949","DLS","TX",="351612051721824","Cisco DLC-100"

I want to compare the "GATEWAY_GUID" column of file1 with the device_gateway_guid column of file2. The output should be all records in file2 that are not seen in file1.

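For example: if file1 has 10 records and file2 has 5000 records, 5 of which are the same as in file1, then my output file should show the file2 column values that are not seen in file1.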

So far I have tried the following script, but it did not work. Any help is appreciated.

awk 'NR==FNR{c[$6]++;next};c[$4] == 0' s2_1.csv s1_1.csv > compares1s2.csv 

Step 1: parse file1 so that you only have the relevant information:

grep -v "GATEWAY_GUID" file1 | cut -d'"' -f8 
# or more difficult to read
sed -n '2,$ s/\([^,]*,\)\{3\}"\([^"]*\).*/\2/p' file1

Depending on the formats your input can take, you may need to adjust this.
Output of the first step:

313A0B72E3E440DD8687BD681E55FB03
7EED6EE61F0949EE99554D4D4F09E4FE
D57DAE988A1C482EA3217312EDC7466E

Now you want to do something like

grep -Ev "313A0B72E3E440DD8687BD681E55FB03|7EED6EE61F0949EE99554D4D4F09E4FE|D57DAE988A1C482EA3217312EDC7466E" file2

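grep can read its search strings from a file (-f), and -F makes it treat them as fixed strings instead of regular expressions, so this can be changed to

grep -vFf tempfile_with_search_keys file2

With "process substitution" you can keep the temporary file "in memory":

grep -vFf <(some_command) file2

For some_command, use the first command that parses file1:

grep -vFf <(grep -v "GATEWAY_GUID" file1 | cut -d'"' -f8) file2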
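The sample file1 has one GUID with a trailing space inside the quotes, so it may help to normalise the keys before handing them to grep. A minimal sketch, assuming shortened copies of the sample rows plus one invented row (JANE DOE, with a made-up GUID) that has no match in file1. The `keys` helper and the temporary directory are illustrative names, and a key file is used instead of process substitution so that it also runs in plain sh:

```shell
#!/bin/sh
# Hypothetical, shortened sample data; the JANE DOE row is invented.
dir=$(mktemp -d)
cat > "$dir/file1" <<'EOF'
"ACCOUNT_ID","CTN","NAME","GATEWAY_GUID","DEVICE_GUID"
"306875",="9404653975","14-052917 14-052917","313A0B72E3E440DD8687BD681E55FB03","0"
"306875",="9404653975","14-052917 14-052917","D57DAE988A1C482EA3217312EDC7466E ","ACFF010904"
EOF
cat > "$dir/file2" <<'EOF'
account,ban,ctn,first_name,last_name,device_gateway_guid,device_id
DL!813269 , 418069632891 , undefined , MUHAMMAD , ANJUM , 313A0B72E3E440DD8687BD681E55FB03, ACFF010904
DL!926875 , 578172109430 , undefined , TRACY , BUSH , D57DAE988A1C482EA3217312EDC7466E , 0
DL!000001 , 000000000000 , undefined , JANE , DOE , 0123456789ABCDEF0123456789ABCDEF , 0
EOF

# Extract GATEWAY_GUID (the 4th quoted value), strip spaces, drop empty keys.
# An empty key would match every line, so it must not reach grep -f.
keys() {
    grep -v "GATEWAY_GUID" "$1" | cut -d'"' -f8 | tr -d ' ' | grep .
}

keys "$dir/file1" > "$dir/keys"

# Keep the file2 lines that contain none of the keys (the header survives too).
result=$(grep -vFf "$dir/keys" "$dir/file2")
printf '%s\n' "$result"

rm -r "$dir"
```

Note that this matches a key anywhere on the line, not only in the device_gateway_guid column, which is fine for 32-character GUIDs but could over-match with short keys.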

You need something like this:

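awk '
    BEGIN   { FS="[,\"]" }    # split on commas and double quotes
    FNR==NR {                 # first file: remember every GATEWAY_GUID
      # with this FS the 4th CSV column is field 11; drop stray spaces in it
      a[ gensub(/[ ]+/, "", "g", $11) ] ++;
      next;
    }

    FNR > 1 {                 # second file, header line skipped
      tmp = gensub(/[ ]+/, "", "g", $6);   # device_gateway_guid without padding
      if( !( tmp in a)  ) print tmp;
    }
' ff1.csv ff2.csv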
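  • Assumes the file names are ff1.csv and ff2.csv
  • The sample in the question has spaces in the second file, and some lines are quoted while others are not
  • Quotes are handled in the assignment to FS
  • Spaces in the values of the second file are removed by gensub
  • Tested on Linux with GNU awk 4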
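gensub is a GNU awk extension, so on systems where awk is not gawk the same idea can be written with gsub on a copy of the field. A minimal sketch, assuming shortened copies of the sample rows plus one invented row (JANE DOE, with a made-up GUID); the `seen` array and `key` variable are illustrative names:

```shell
#!/bin/sh
# Hypothetical, shortened sample data; the JANE DOE row is invented.
dir=$(mktemp -d)
cat > "$dir/ff1.csv" <<'EOF'
"ACCOUNT_ID","CTN","NAME","GATEWAY_GUID","DEVICE_GUID"
"306875",="9404653975","14-052917 14-052917","313A0B72E3E440DD8687BD681E55FB03","0"
"306875",="9404653975","14-052917 14-052917","D57DAE988A1C482EA3217312EDC7466E ","ACFF010904"
EOF
cat > "$dir/ff2.csv" <<'EOF'
account,ban,ctn,first_name,last_name,device_gateway_guid,device_id
DL!813269 , 418069632891 , undefined , MUHAMMAD , ANJUM , 313A0B72E3E440DD8687BD681E55FB03, ACFF010904
DL!926875 , 578172109430 , undefined , TRACY , BUSH , D57DAE988A1C482EA3217312EDC7466E , 0
DL!000001 , 000000000000 , undefined , JANE , DOE , 0123456789ABCDEF0123456789ABCDEF , 0
EOF

missing=$(awk '
    BEGIN   { FS = "[,\"]" }                 # split on commas and double quotes
    FNR==NR {                                # first file
        key = $11; gsub(/ /, "", key)        # GATEWAY_GUID without spaces
        if (key != "") seen[key]             # referencing the element creates it
        next
    }
    FNR > 1 {                                # second file, header skipped
        key = $6; gsub(/ /, "", key)         # device_gateway_guid without spaces
        if (key != "" && !(key in seen)) print key
    }
' "$dir/ff1.csv" "$dir/ff2.csv")
printf '%s\n' "$missing"

rm -r "$dir"
```

Because gsub works on the copy in `key`, the original fields are left untouched. Rows of the second file that are quoted like the first file put an empty string in field 6 with this FS, so the sketch skips them rather than reporting them.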

Quick and dirty, but it works on your sample data:

 awk -F, 'NR==FNR {
              s=$4; 
              gsub("\"", "", s); 
              gsub(/[[:space:]]/,"", s); 
              arr[s]++} 
          NR>FNR && FNR>1 {
              s=$6; 
              gsub("\"", "", s); 
              gsub(/[[:space:]]/,"", s); 
              sub(/=/,"",s); 
              if (!(s in arr))
                  print s;
              }' file1 file2
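The clean-up steps are the same for both files, so they can be moved into a small awk function. A minimal sketch, assuming shortened copies of the sample rows plus one invented row (JANE DOE, with a made-up GUID); `clean` is an illustrative helper name, and unlike the sub call above it removes every "=" in the field, which is harmless for hexadecimal GUIDs:

```shell
#!/bin/sh
# Hypothetical, shortened sample data; the JANE DOE row is invented.
dir=$(mktemp -d)
cat > "$dir/file1" <<'EOF'
"ACCOUNT_ID","CTN","NAME","GATEWAY_GUID","DEVICE_GUID"
"306875",="9404653975","14-052917 14-052917","313A0B72E3E440DD8687BD681E55FB03","0"
"306875",="9404653975","14-052917 14-052917","D57DAE988A1C482EA3217312EDC7466E ","ACFF010904"
EOF
cat > "$dir/file2" <<'EOF'
account,ban,ctn,first_name,last_name,device_gateway_guid,device_id
DL!813269 , 418069632891 , undefined , MUHAMMAD , ANJUM , 313A0B72E3E440DD8687BD681E55FB03, ACFF010904
DL!926875 , 578172109430 , undefined , TRACY , BUSH , D57DAE988A1C482EA3217312EDC7466E , 0
DL!000001 , 000000000000 , undefined , JANE , DOE , 0123456789ABCDEF0123456789ABCDEF , 0
EOF

missing=$(awk -F, '
    # Remove double quotes, "=" signs, spaces and tabs from one raw field.
    function clean(s) { gsub(/[=" \t]/, "", s); return s }

    NR==FNR { arr[clean($4)]; next }                     # file1: GATEWAY_GUID
    FNR>1 && !(clean($6) in arr) { print clean($6) }     # file2: skip header
' "$dir/file1" "$dir/file2")
printf '%s\n' "$missing"

rm -r "$dir"
```

Splitting on a bare comma only works here because the compared columns come before any field that contains a comma of its own, such as the firmware string.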

You didn't post the expected output, so idk if this is what you want or not, but here it is anyway:

$ awk -F'[ "]*,[ "]*' 'NR==FNR{a[$4];next} (FNR==1) || ($6 in a)' file1 file2
account,ban,ctn,first_name,last_name,device_gateway_guid,device_id,device_cat_id,IMEI,device_fw_vrsn,date_created,device_status,subscription_created,subscription_name,subscription_market,date
DL!813269 , 418069632891 , undefined , MUHAMMAD , ANJUM , 313A0B72E3E440DD8687BD681E55FB03, ACFF010904 , 00010907000004 , 351612054025777 ,  , 2015-12-18 19:45:31 , 0 , undefined , [object Object] , WAS , undefined
DL!782477 , 523266997720 , undefined , SAM , MAURER , 7EED6EE61F0949EE99554D4D4F09E4FE , 0 , 00010000010004 , 351612053801194 , FW: 1.04.122, JVM: Oracle Corporation 1.7.0_72-ea, OS: Linux 2.6.33.5 , 2015-12-18 19:02:27 , 0 , undefined , [object Object] , FLP , 2015-07-29 09:07:22
DL!926875 , 578172109430 , undefined , TRACY , BUSH , D57DAE988A1C482EA3217312EDC7466E , 0 , 00010000010004 , 351612054481798 , FW: 1.04.122, JVM: Oracle Corporation 1.7.0_72-ea, OS: Linux 2.6.33.5 , 2016-01-23 16:09:21 , 0 , undefined , [object Object] , GLF , 2015-11-06 02:26:31
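The command above keeps the file2 records whose GUID is found in file1. If the goal is the opposite, as the question describes, negating the test is one option. A minimal sketch, assuming shortened copies of the sample rows plus one invented row (JANE DOE, with a made-up GUID) that has no match in file1:

```shell
#!/bin/sh
# Hypothetical, shortened sample data; the JANE DOE row is invented.
dir=$(mktemp -d)
cat > "$dir/file1" <<'EOF'
"ACCOUNT_ID","CTN","NAME","GATEWAY_GUID","DEVICE_GUID"
"306875",="9404653975","14-052917 14-052917","313A0B72E3E440DD8687BD681E55FB03","0"
"306875",="9404653975","14-052917 14-052917","D57DAE988A1C482EA3217312EDC7466E ","ACFF010904"
EOF
cat > "$dir/file2" <<'EOF'
account,ban,ctn,first_name,last_name,device_gateway_guid,device_id
DL!813269 , 418069632891 , undefined , MUHAMMAD , ANJUM , 313A0B72E3E440DD8687BD681E55FB03, ACFF010904
DL!926875 , 578172109430 , undefined , TRACY , BUSH , D57DAE988A1C482EA3217312EDC7466E , 0
DL!000001 , 000000000000 , undefined , JANE , DOE , 0123456789ABCDEF0123456789ABCDEF , 0
EOF

# Same field separator as above; print the header plus every file2 record
# whose 6th field is NOT among the 4th fields collected from file1.
result=$(awk -F'[ "]*,[ "]*' 'NR==FNR{a[$4];next} (FNR==1) || !($6 in a)' \
    "$dir/file1" "$dir/file2")
printf '%s\n' "$result"

rm -r "$dir"
```

With this separator a file2 record quoted as `="..."` keeps the `="` prefix in its 6th field, so such rows would be reported as not seen even when the GUID itself is present in file1.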
