
Linux command to get fields from CSV files

In CSV files on a Linux server, I have thousands of rows in the following format:

0,20221208195546466,9,200,Above as:2|RAN34f2fb:HAER:0|RAND8365b2bca763:FON:0|RANDa7a5f964900b:ION:0|

I need to get output from all the files in the format below: the 2nd field (i.e. 20221208195546466) and, from the 5th field, the value after Above as: and before the first | (i.e. 2 in the example above).

Output:

20221208195546466 , 2

Can anyone help me with a Linux command?

Edit:

My attempts

I tried the following, but it only gives the value from the 5th field. How do I add field 2 as well?

cat *.csv | cut -d, -f5 | cut -d'|' -f1 | cut -d':' -f2

EDIT: sorted result

Now I am using this command (based on Dave Pritlove's answer): awk -F'[,|:]' '{print $2", "$6}' file.csv. However, I have one more question: if I have to sort the output based on $6 (the value 2 in your example), how can I do it? I want the result to be displayed in sorted order based on the 2nd output field. For example:

20221208195546366, 20
20221208195546436, 16
20221208195546466, 5
2022120819536466, 2

GNU awk allows the field separator to be a regular expression, so you can split each record at ",", "|" and ":" at the same time. Thus, the following will fish out the required fields from file.csv:

awk -F'[,|:]' '{print $2", "$6}' file.csv

Tested on the single-record example:

echo "0,20221208195546466,9,200,Above as:2|RAN34f2fb:HAER:0|RAND8365b2bca763:FON:0|RANDa7a5f964900b:ION:0|" | awk -F'[,|:]' '{print $2", "$6}'

Output:

20221208195546466, 2
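
For the follow-up about sorting on the second output column, one option is to pipe the awk output into sort. This is only a minimal sketch, not part of what was tested above: the sample rows are made up for illustration (only their shape comes from the question), the function name extract_sorted is hypothetical, and it assumes descending numeric order is wanted, as in the example in the question.

```shell
# Hypothetical sample data, written to a temporary file for the demo.
sample=$(mktemp)
cat > "$sample" <<'EOF'
0,20221208195546466,9,200,Above as:5|RAN34f2fb:HAER:0|
0,20221208195546366,9,200,Above as:20|RAN34f2fb:HAER:0|
0,20221208195536466,9,200,Above as:2|RAN34f2fb:HAER:0|
0,20221208195546436,9,200,Above as:16|RAN34f2fb:HAER:0|
EOF

# Extract fields 2 and 6, then sort on the 2nd comma-separated column:
#   -t,       use "," as the column separator
#   -k2,2nr   the key is column 2 only, compared numerically (n), reversed (r)
extract_sorted() {
    awk -F'[,|:]' '{print $2", "$6}' "$@" | sort -t, -k2,2nr
}

extract_sorted "$sample"
# 20221208195546366, 20
# 20221208195546436, 16
# 20221208195546466, 5
# 20221208195536466, 2
```

Dropping the r from the key gives ascending order instead. The same function accepts several files at once, e.g. extract_sorted *.csv.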

Assumptions:

  • the starting string of the 5th comma-delimited field can vary from line to line (i.e., it is not known beforehand)
  • the item of interest in the 5th comma-delimited field occurs between the first : and the first |

Sample data:

$ cat test.csv
0,20221208195546466,9,200,Above as:2|RAN34f2fb:HAER:0|RAND8365b2bca763:FON:0|RANDa7a5f964900b:ION:0|
1,20230124123456789,10,1730,Total ts:7|stuff:HAER:0|morestuff:FON:0|yetmorestuff:ION:0|

One awk approach:

awk '
BEGIN { FS=OFS="," }                    # define input/output field delimiter as ","
      { split($5,a,"[:|]")              # split 5th field on dual delimiters ":" and "|", store results in array a[]
        print $2,a[2]                   # print desired items to stdout
      }
' test.csv

This generates:

20221208195546466,2
20230124123456789,7
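
The question asks for every CSV file to be processed and shows " , " between the two output values. As a hedged sketch of how the same split() idea could cover that, the variant below sets a different output separator and takes any number of files. The temporary directory, the file names a.csv and b.csv, and the function name extract_all are assumptions made for the demo.

```shell
# Hypothetical demo directory with two small CSV files.
dir=$(mktemp -d)
printf '%s\n' '0,20221208195546466,9,200,Above as:2|RAN34f2fb:HAER:0|' > "$dir/a.csv"
printf '%s\n' '1,20230124123456789,10,1730,Total ts:7|stuff:HAER:0|' > "$dir/b.csv"

# Read "," on input, write " , " on output; awk reads the files in the order given.
extract_all() {
    awk '
    BEGIN { FS = ","; OFS = " , " }
          { split($5, a, "[:|]")   # a[2] is the text between the first ":" and the first "|"
            print $2, a[2]
          }
    ' "$@"
}

extract_all "$dir"/*.csv
# 20221208195546466 , 2
# 20230124123456789 , 7
```

Passing the files straight to awk also avoids the extra cat process used in the original attempt.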

You can use awk for this:

awk -F',' '{gsub(/Above as:/,""); gsub(/\|.*/, ""); print($2, $5)}'

You will probably need to adapt the regexp a bit.
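
One way the regexp might be adapted, as a sketch only: instead of matching the literal text Above as:, strip everything up to the first ":" in the 5th field, so that rows with a different prefix also work. The function name extract_generic is hypothetical, and the second input row is the sample row with a different prefix that appears elsewhere on this page.

```shell
# Work on field 5 only, so the other fields are never touched by the substitutions.
extract_generic() {
    awk -F',' '{
        sub(/^[^:]*:/, "", $5)   # drop everything up to and including the first ":"
        sub(/\|.*/, "", $5)      # drop everything from the first "|" onwards
        print $2, $5
    }' "$@"
}

printf '%s\n' \
  '0,20221208195546466,9,200,Above as:2|RAN34f2fb:HAER:0|' \
  '1,20230124123456789,10,1730,Total ts:7|stuff:HAER:0|' | extract_generic
# 20221208195546466 2
# 20230124123456789 7
```

The two values are separated by a single space because awk's default output separator is used; set OFS in a BEGIN block if a different separator is needed.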

You could change ":" to "," and "|" to ",", then extract the 2nd and 6th fields using cut as follows. Let the content of file.txt be

0,20221208195546466,9,200,Above as:2|RAN34f2fb:HAER:0|RAND8365b2bca763:FON:0|RANDa7a5f964900b:ION:0|

then

tr ':|' ',,' < file.txt | cut --delimiter=',' --output-delimiter=' , ' --fields=2,6

gives the output

20221208195546466 , 2

Explanation: tr translates, i.e. it replaces ":" with "," and "|" with ",". I then tell cut that the input delimiter is ",", that the output delimiter is "," surrounded by spaces (as stipulated by your desired output), and that I want the 2nd and 6th columns (not the 5th, as that is now Above as).

(tested using GNU coreutils 8.30)
