Extract lines from a fixed-format file without spaces based on a column and a list of lookup IDs

Question

I have a fairly large fixed-format file without spaces (file1):

file1:

0808563800555550000367120000500000
0005555566369330000078020000500000
01066666780000000008933600009000005635
0904251263088000000786590056500000
0000469011009904440425120444444440

I want to extract fields (character positions) 4-8, 11-15 and 20-24 from the lines where field 4-8 (and only that field) appears in a list of IDs in file2:

file2:

55555
42512

The desired output is:

55555 36933 07802
42512 08800 78659

I have tried the following combination of cut | grep commands:

cut -c 4-8,11-15,20-24 file1 --output-delimiter=' ' | grep -w -F -f file2

It works and the speed is very good, but I also get lines where the lookup ID is not in the first column of the cut output (fields 4-8). That happens because grep checks all three columns produced by cut, not only the first one.

Here is the output of the command above:

85638 55555 36712
55555 36933 07802
66666 00000 89336
42512 08800 78659
04690 00990 42512

I know one could write the output to a file and then post-process it with, for example, awk, but I thought there might be a much simpler approach that avoids the extra processing time (for example, making grep match only against a specific column of the cut output).

Any help would be much appreciated. Many thanks!

Answer 1

Would you please try the following:

cut -c 4-8,11-15,20-24 file1 --output-delimiter=' ' | grep -wf <(sed 's/^/^/' file2)

Each line in file2 is prefixed with a caret (^) so that the pattern is anchored to the start of each line output by cut.
It may be a bit slower than before because the -F option can no longer be used: the patterns are now regular expressions, not fixed strings.
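
To make the anchoring concrete, here is a minimal sketch that recreates the sample data from the question and runs the same pipeline. The temporary directory and the intermediate `patterns` file are my own additions for illustration; writing the patterns to a file instead of using `<(...)` is only an assumption that you may want to run this in a shell without process substitution. It still relies on GNU cut for `--output-delimiter`.

```shell
# Recreate the sample inputs from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' \
  0808563800555550000367120000500000 \
  0005555566369330000078020000500000 \
  01066666780000000008933600009000005635 \
  0904251263088000000786590056500000 \
  0000469011009904440425120444444440 > "$tmp/file1"
printf '%s\n' 55555 42512 > "$tmp/file2"

# Turn every ID into a regex anchored at the start of the line: 55555 -> ^55555
sed 's/^/^/' "$tmp/file2" > "$tmp/patterns"

# cut emits the three columns; grep now only accepts an ID in the first column.
out=$(cut -c 4-8,11-15,20-24 --output-delimiter=' ' "$tmp/file1" |
      grep -w -f "$tmp/patterns")
printf '%s\n' "$out"

rm -rf "$tmp"
```

On the sample data this prints only the two desired lines, because lines such as `04690 00990 42512` no longer match once the ID has to sit at the start of the line.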

Answer 2

With GNU awk for FIELDWIDTHS:

$ awk -v FIELDWIDTHS='3 5 2 5 4 5 *' 'NR==FNR{a[$0]; next} $2 in a{ print $2, $4, $6 }' file2 file1
55555 36933 07802
42512 08800 78659
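
FIELDWIDTHS is a gawk extension. If only a POSIX awk is available (this is an assumption about your environment, not something the question states), the same idea can be sketched with substr(), which takes a 1-based start position and a length. The array name `ids` and the scratch files are hypothetical names chosen for this example.

```shell
# Recreate the sample inputs from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' \
  0808563800555550000367120000500000 \
  0005555566369330000078020000500000 \
  01066666780000000008933600009000005635 \
  0904251263088000000786590056500000 \
  0000469011009904440425120444444440 > "$tmp/file1"
printf '%s\n' 55555 42512 > "$tmp/file2"

# First file (file2): store every ID as an array key.
# Second file (file1): look up characters 4-8 only, then print the three slices.
out=$(awk '
  NR == FNR { ids[$0]; next }
  { id = substr($0, 4, 5) }
  id in ids { print id, substr($0, 11, 5), substr($0, 20, 5) }
' "$tmp/file2" "$tmp/file1")
printf '%s\n' "$out"

rm -rf "$tmp"
```

Like the FIELDWIDTHS version, this reads file1 once and tests only the ID column, so no intermediate file or second filtering step is needed.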
