Linux command to get fields from CSV files

Question

In CSV files on a Linux server, I have thousands of rows in the CSV format below:

0,20221208195546466,9,200,Above as:2|RAN34f2fb:HAER:0|RAND8365b2bca763:FON:0|RANDa7a5f964900b:ION:0|

I need output from all the files in the format below: the 2nd field (i.e. 20221208195546466) and, from the 5th field, the value after Above as: and before the first | (i.e. 2 in the example above).

Output:

20221208195546466 , 2

Can anyone help me with a Linux command?

Edit:

My attempt:

I tried the following, but it gives only the 5th field's value. How do I add field 2 as well?

cat *.csv | cut -d, -f5 | cut -d'|' -f1 | cut -d':' -f2

EDIT: sorted result

Now I am using this command (based on Dave Pritlove's answer): awk -F'[,|:]' '{print $2", "$6}' file.csv. However, I have one more question: if I have to sort the output based on $6 (the value 2 in your example), how can I do it? I want the result displayed in sorted order based on the 2nd output field, for example:

20221208195546366, 20
20221208195546436, 16
20221208195546466, 5
2022120819536466, 2

Answer 1

GNU awk allows multiple field separators to be set, so you can split each record at commas (,), pipes (|) and colons (:) at the same time. Thus, the following will fish out the required fields from file.csv:

awk -F'[,|:]' '{print $2", "$6}' file.csv

Tested on the single-record example:

echo "0,20221208195546466,9,200,Above as:2|RAN34f2fb:HAER:0|RAND8365b2bca763:FON:0|RANDa7a5f964900b:ION:0|" | awk -F'[,|:]' '{print $2", "$6}'

Output:

20221208195546466, 2
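
For the sorting follow-up in the question, one possible sketch is to pipe the awk output into sort. The sample rows and the temporary file below are made up for illustration (only the timestamps and values come from the question), and the descending numeric order is an assumption based on the example output shown there.

```shell
# Hypothetical sample file, for illustration only.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
0,20221208195546466,9,200,Above as:5|RAN34f2fb:HAER:0|
0,20221208195546366,9,200,Above as:20|RAN34f2fb:HAER:0|
0,2022120819536466,9,200,Above as:2|RAN34f2fb:HAER:0|
0,20221208195546436,9,200,Above as:16|RAN34f2fb:HAER:0|
EOF

# Extract fields 2 and 6, then sort on the second output column:
#   -t,       use "," as the column separator for sort
#   -k2,2nr   sort key is column 2 only, numeric (n), descending (r)
sorted=$(awk -F'[,|:]' '{print $2", "$6}' "$tmp" | sort -t, -k2,2nr)
printf '%s\n' "$sorted"

rm -f "$tmp"
```

Dropping the r from -k2,2nr should give ascending order instead. The leading space in the second column is skipped by the numeric comparison.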

Answer 2

Assumptions:

the starting string of the 5th comma-delimited field can vary from line to line (i.e., it is not known beforehand)
the item of interest in the 5th comma-delimited field occurs between the first : and the first |

Sample data:

$ cat test.csv
0,20221208195546466,9,200,Above as:2|RAN34f2fb:HAER:0|RAND8365b2bca763:FON:0|RANDa7a5f964900b:ION:0|
1,20230124123456789,10,1730,Total ts:7|stuff:HAER:0|morestuff:FON:0|yetmorestuff:ION:0|

One awk approach:

awk '
BEGIN { FS=OFS="," }                    # define input/output field delimiter as ","
      { split($5,a,"[:|]")              # split 5th field on dual delimiters ":" and "|", store results in array a[]
        print $2,a[2]                   # print desired items to stdout
      }
' test.csv

This generates:

20221208195546466,2
20230124123456789,7
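
Since the question asks for output from all the files, the same awk program can be handed several files at once through a shell glob. The following is only a sketch: the temporary directory and the file names a.csv and b.csv are hypothetical, and the rows are shortened copies of the sample data above.

```shell
# Hypothetical directory holding two small CSV files, for illustration only.
dir=$(mktemp -d)
printf '%s\n' '0,20221208195546466,9,200,Above as:2|RAN34f2fb:HAER:0|' > "$dir/a.csv"
printf '%s\n' '1,20230124123456789,10,1730,Total ts:7|stuff:HAER:0|' > "$dir/b.csv"

# awk reads every file named on the command line, one after the other,
# so the glob replaces the need for "cat *.csv |".
all=$(awk 'BEGIN { FS=OFS="," } { split($5, a, "[:|]"); print $2, a[2] }' "$dir"/*.csv)
printf '%s\n' "$all"

rm -rf "$dir"
```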

Answer 3

You can use awk for this:

awk -F',' '{gsub(/Above as:/,""); gsub(/\|.*/, ""); print($2, $5)}'

You will probably need to adapt the regexp a bit.
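
As one example of such an adaptation, the sketch below does not hard-code the Above as: label. It removes everything up to the first : and everything from the first | onward, and it restricts both substitutions to field 5. It assumes that field 5 contains no commas; the sample line is taken from Answer 2's data and shortened.

```shell
# Sample line for illustration (shortened from the second row of Answer 2's data).
line='1,20230124123456789,10,1730,Total ts:7|stuff:HAER:0|morestuff:FON:0|'

result=$(printf '%s\n' "$line" | awk -F',' '{
    sub(/^[^:]*:/, "", $5)   # drop the label: everything up to and including the first colon
    sub(/\|.*/, "", $5)      # drop everything from the first pipe onward
    print $2 " , " $5
}')
printf '%s\n' "$result"
```

Limiting sub to $5 avoids accidentally touching a : or | that might appear in one of the earlier fields.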

Answer 4

You might change each : and each | to a comma, and then extract the 2nd and 6th fields with cut, as follows. Let the content of file.txt be

0,20221208195546466,9,200,Above as:2|RAN34f2fb:HAER:0|RAND8365b2bca763:FON:0|RANDa7a5f964900b:ION:0|

then

tr ':|' ',,' < file.txt | cut --delimiter=',' --output-delimiter=' , ' --fields=2,6

gives the output

20221208195546466 , 2

Explanation: tr translates characters, i.e. it replaces each : and each | with a comma. I then tell cut that the input delimiter is a comma, that the output delimiter is a comma surrounded by spaces (as stipulated by your desired output), and that I want the 2nd and 6th columns (not the 5th, as that is now Above as).

(tested using GNU coreutils 8.30)
