
PowerShell: Comparing 2 files based on 2 columns together

Good morning from Germany, and sorry for my bad English.

I hope that someone can help me.

We have to compare two .xls or .csv documents with over 4000 lines. Both documents have a column E that holds the delivery note number. This number is not unique: it can be used multiple times in column E. Each delivery note number also has a number of pieces (the quantity) in column D.

If the delivery note number and the quantity match in both files, we can ignore and delete the line. Comparing two files with over 4000 lines is very costly, so I hope the comparison is possible with PowerShell and regular expressions.

My idea: convert the .xls files to .csv and do the following. Read the lines and use the entries in column E and column D. For each entry in column E, check whether it exists in the second file. If it does, check whether column D is the same as in file 1. If both entries match, remove or copy both lines in both files.

In the end we would have two documents containing only the entries that have no match.

Is this possible?

I can handle PowerShell quite well, but regular expressions... :/

Thanks in advance, Daniel

If you think of your two values as a composite primary key, it seems to work out. You said the value in column E isn't necessarily unique. Can you tell me whether it is always unique when combined with its quantity?

Regardless, to process this I would recommend building a unique list of (column E, column D) combinations. You could even just use an "E,D" formatted string, as long as neither column contains commas. Then put each combination in a hashtable, with the formatted string as the key and an array of the files that contain that key as the value.

Now you have an efficient way to look up which files contain which (column E, column D) combination, so you should be able to handle your specific use cases as needed.
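
The hashtable described above could be built roughly like this. It is only a minimal sketch: it assumes headerless CSV files with six columns, and the function name Get-KeyIndex and the column names A to F are made up for illustration.

```powershell
# Hypothetical helper: maps each "E,D" key to the list of files it occurs in.
# Assumption: the CSV files have no header row and six columns (named A to F here).
function Get-KeyIndex {
    param([string[]]$Path)

    $index = @{}
    foreach ($file in $Path) {
        foreach ($row in (Import-Csv -Path $file -Header A,B,C,D,E,F)) {
            # Composite key: delivery note number (E) plus quantity (D).
            $key = '{0},{1}' -f $row.E, $row.D
            if (-not $index.ContainsKey($key)) {
                $index[$key] = [System.Collections.Generic.List[string]]::new()
            }
            # One entry per occurrence, so a file can appear more than once.
            $index[$key].Add($file)
        }
    }
    return $index
}
```

With two placeholder files, `$index = Get-KeyIndex -Path 'file1.csv', 'file2.csv'` gives a lookup where `$index['4711,5']` lists every file that contains delivery note 4711 with quantity 5. Keys whose list names only one file are the unmatched ones. Note that PowerShell hashtable literals compare string keys case-insensitively.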

4000 lines doesn't sound like a lot. Try this, assuming the CSV files are called "1.csv" and "2.csv":

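# Combine both files. Set-Content overwrites any 3.csv left over from an earlier run.
Set-Content 3.csv (Get-Content 1.csv, 2.csv)

# -Header names the columns A to F, which assumes the files have no header row.
# Group on the composite key (E, D) and keep only the keys that occur exactly once.
Import-Csv -Header A,B,C,D,E,F 3.csv |
    Group-Object E, D |
    Where-Object { $_.Count -eq 1 } |
    ForEach-Object { $_.Group } |
    Export-Csv 3.diff.csv -NoTypeInformation

"3.diff.csv" will contain only the records whose combination of column E and column D occurs exactly once across both files.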

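One caveat with the grouping approach: a key that occurs twice in one file and never in the other has a count of 2, so it is dropped even though nothing matches it. If repeated delivery notes should cancel one-to-one instead, something like the following might work. It is a sketch only; the function name Get-UnmatchedRow is hypothetical, and it assumes the rows come from Import-Csv with columns named D and E.

```powershell
# Hypothetical helper: returns the rows of $Rows that have no partner in $Other.
# Each row in $Other can cancel at most one row in $Rows with the same (E, D) key.
function Get-UnmatchedRow {
    param(
        [object[]]$Rows,
        [object[]]$Other
    )

    # Count how often each composite key occurs on the other side.
    $budget = @{}
    foreach ($row in $Other) {
        $key = '{0},{1}' -f $row.E, $row.D
        $budget[$key] = 1 + [int]$budget[$key]
    }

    foreach ($row in $Rows) {
        $key = '{0},{1}' -f $row.E, $row.D
        if ([int]$budget[$key] -gt 0) {
            # A partner exists: use it up and drop this row.
            $budget[$key] = $budget[$key] - 1
        } else {
            # No partner left: emit the row unchanged.
            $row
        }
    }
}

# Usage (file names are placeholders), producing one leftover file per input:
# $a = Import-Csv 1.csv -Header A,B,C,D,E,F
# $b = Import-Csv 2.csv -Header A,B,C,D,E,F
# Get-UnmatchedRow -Rows $a -Other $b | Export-Csv 1.rest.csv -NoTypeInformation
# Get-UnmatchedRow -Rows $b -Other $a | Export-Csv 2.rest.csv -NoTypeInformation
```

Calling it once in each direction leaves two result sets with only the unmatched entries, in their original order. The keys are compared as strings, so "5" and "5.0" would count as different quantities.
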

 