Compare a txt file against the third column of a csv in bash

Question

I am very new to programming and decided to learn bash, since we deal with some Linux/Unix-based log servers and scripting makes that a bit easier.

I have a csv file that is laid out as follows:

PC,user,file,path - all comma separated.

I have a whitelist of file names, one per line. Some include spaces.

My goal is to compare the whitelist to column 3 of the csv file and output all lines that don't match. I have tried a while read loop with an if statement but cannot seem to get it to work. I have tried a few awk one-liners, and actually got one from a past Stack Overflow post that outputs the lines that match the whitelist, but I cannot figure out how to reverse the logic to get it to work. Code is below.

awk     'BEGIN{i=0}
       FNR==NR { a[i++]=$1; next }
        { for(j=0; j<i; j++)
            if(index($0,a[j]))
                {print $0;break}
        }' $whitelist $exestartup

I would like to stick to basic bash with no add-ons, and I am not opposed to using a loop/if statement instead of an awk one-liner.

Sample input/output:

whitelist.txt

program.exe
super program.exe
possible-program.exe

exestartup.csv

Asset1,user1,potato.exe,c:\users\user1
Asset2,user2,program.exe,c:\users\user2
Asset3,user3,possible-program.exe,c:\users\user3
Asset4,user4,super program.exe,c:\users\user4

Output

Asset1,user1,potato.exe,c:\users\user1

Answer 1

awk to the rescue!

awk -F, 'FNR==NR{a[$1]; next} !($3 in a)' whitelist exestartup

Set the field delimiter to comma. Load all whitelist names as array keys, then compare them against the $3 field of each line in the csv file; if there is no match, print the line.
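For readers new to awk, the one-liner above can be spelled out with comments. This is only a sketch of the same logic: the wrapper name `awk_unlisted` and the array name `allowed` are made up for illustration and are not part of the original answer.

```shell
#!/usr/bin/env bash
# awk_unlisted WHITELIST CSV
# Hypothetical wrapper around the one-liner above, expanded for readability.
awk_unlisted() {
    awk -F, '
        # FNR (per-file line number) equals NR (global line number)
        # only while the first file, the whitelist, is being read.
        # Referencing allowed[$1] creates the key; no value is needed.
        # With -F, a whitelist line without commas is entirely in $1.
        FNR == NR { allowed[$1]; next }

        # Every later line comes from the csv file. A pattern with no
        # action prints the line, here when field 3 is not a key.
        !($3 in allowed)
    ' "$1" "$2"
}
```

Called as `awk_unlisted whitelist.txt exestartup.csv`, it should behave the same as the one-liner.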

If you post sample input and expected output you'll get more answers and perhaps better suggestions.

Using your input files:

$ awk -F, 'FNR==NR{a[$1]; next} !($3 in a)' whitelist.txt exestartup.csv

Asset1,user1,potato.exe,c:\users\user1

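If your awk is broken and the field values are disjoint (no whitelist name can show up in any other column), you can fall back to grep. Keep in mind that grep matches each whitelist entry anywhere on the line and, without -F, treats it as a regular expression.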

$ grep -vf whitelist.txt exestartup.csv

Asset1,user1,potato.exe,c:\users\user1
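A slightly more defensive variant of the grep fallback, as a sketch only: adding -F makes grep treat the whitelist entries as literal strings, so the `.` in `program.exe` no longer matches any character. The helper name `grep_unlisted` is hypothetical.

```shell
#!/usr/bin/env bash
# grep_unlisted WHITELIST CSV
# Hypothetical helper name, used only for this sketch.
grep_unlisted() {
    # -F: treat each whitelist line as a literal string, not a regex
    # -f: read the patterns from the whitelist file
    # -v: print only the lines that match none of the patterns
    grep -vFf "$1" "$2"
}
```

This still matches anywhere on the line rather than only in column 3, so a csv line for a file such as `notprogram.exe` would also be dropped because it contains `program.exe`. When that matters, the awk version is the safer choice.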

Answer 2

Using join:

$ join -v 1 -t, -1 3 -2 1 -o 1.1,1.2,1.3,1.4 <(sort -t, -k3,3 exestartup.csv) <(sort whitelist.txt)
Asset1,user1,potato.exe,c:\users\user1

If the input files are already sorted on the matching key (they don't appear to be in your example), that could simply be:

$ join -v 1 -t, -1 3 -2 1 -o 1.1,1.2,1.3,1.4 exestartup.csv whitelist.txt

Answer 3

This solution uses only Bash 3 builtins:

IFS=$'\n' read -d '' -r -a whitefiles < whitelist.txt

while IFS= read -r csvline || [[ -n $csvline ]] ; do
    IFS=, read -r pc user file path <<< "$csvline"
    for wfile in "${whitefiles[@]}" ; do
        [[ $wfile == "$file" ]] && continue 2
    done
    printf '%s\n' "$csvline"
done < exestartup.csv

A much faster and cleaner solution can be implemented in Bash 4 because it has associative arrays.
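A rough sketch of what that Bash 4 approach might look like follows. It is not part of the original answer: the function name `filter_unlisted` and the array name `allowed` are made up for illustration, and the code assumes Bash 4.0 or later and Unix line endings in both files.

```shell
#!/usr/bin/env bash
# filter_unlisted WHITELIST CSV
# Hypothetical helper: prints csv lines whose 3rd column is not whitelisted.
# Needs Bash 4+ for associative arrays (local -A / declare -A).
filter_unlisted() {
    local whitelist_file=$1 csv_file=$2
    local -A allowed=()
    local name csvline pc user file path

    # Store every non-empty whitelist line as a key.
    # IFS= and -r keep spaces and backslashes in the names intact.
    while IFS= read -r name || [[ -n $name ]]; do
        [[ -n $name ]] && allowed["$name"]=1
    done < "$whitelist_file"

    # One hash lookup per csv line replaces the inner for loop.
    while IFS= read -r csvline || [[ -n $csvline ]]; do
        IFS=, read -r pc user file path <<< "$csvline"
        # An empty key is not a valid subscript, so test $file first.
        if [[ -n $file ]] && [[ -n ${allowed[$file]:-} ]]; then
            continue
        fi
        printf '%s\n' "$csvline"
    done < "$csv_file"
}
```

With the sample files from the question, `filter_unlisted whitelist.txt exestartup.csv` should print only the Asset1 line. The lookup cost per csv line no longer grows with the size of the whitelist, which is where the speed-up over the Bash 3 loop comes from.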

3 answers

Answer 1: score 5, accepted, 2016-04-04 18:56:14

Answer 2: score 0, 2016-04-04 19:57:42

Answer 3: score 0, 2016-04-04 20:01:54
