Printing numerous specific lines from file using awk or sed command loop

I've got this big txt file with ID names. It has 2500 lines and one column. Let's call it file.txt:

H3430
H3467
H9805

I've also got another file, index.txt, which has 390 numbers:

1
4
9
13
15

Those numbers are the line numbers (of IDs) I have to extract from file.txt. I need to generate another file, let's call it newfile.txt, containing only the 390 IDs on the specific lines that index.txt lists (the first ID of the list, the fourth, the ninth, and so on).

So I tried the following loop, but it didn't work:

num=$'index.txt'
for i in num
do
awk 'NR==i' "file.txt" > newfile.txt
done

I'm a noob regarding these matters, so I need some help, whether it's a fix for my loop or a new solution you suggest. Thank you :)
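For reference, the loop in the question fails for several reasons: num=$'index.txt' stores the literal string index.txt rather than the file's contents; for i in num loops once over the literal word num; inside the single-quoted awk program, i is an unset awk variable rather than the shell's i, so NR==i never matches; and the > redirection inside the loop truncates newfile.txt on every pass. A minimal corrected loop might look like the sketch below (extract_lines_loop is a hypothetical name chosen for the example). It rereads file.txt once per index number, which is acceptable at this size, but the single-pass answers below avoid that.

```shell
# Sketch: the question's loop, corrected.
# extract_lines_loop is a hypothetical helper name.
# Usage: extract_lines_loop INDEX_FILE DATA_FILE > OUTPUT_FILE
extract_lines_loop() {
    # Read one line number per iteration; "|| [ -n "$i" ]" also handles
    # a last line that has no trailing newline.
    while IFS= read -r i || [ -n "$i" ]; do
        # -v passes the shell variable into awk as n.
        # exit stops reading the data file once the line is printed.
        awk -v n="$i" 'NR == n { print; exit }' "$2"
    done < "$1"
}
```

Usage: extract_lines_loop index.txt file.txt > newfile.txt. Because the redirection is outside the loop, the file is written once, and the IDs come out in the order of index.txt.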

Let's create an example file that simulates your 2500-line file, using seq:

$ seq 2500 > /tmp/2500

And put the line numbers from your example into a file called 390:

$ echo "1
4
9
13
15" > /tmp/390

You can print the selected lines of /tmp/2500 by reading the line numbers into an array and then printing each line whose number is in that array:

$ awk 'NR==FNR{ a[$1]++; next} a[FNR]' /tmp/390 /tmp/2500
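This one-liner walks the data file, so matching lines come out in the order they appear there, and a number repeated in the index is printed only once. If the output should instead follow the order of index.txt (repeating an ID when a number repeats), one option is to swap the roles of the two files: load the data file into an array keyed by line number, then look up each index. A minimal sketch, assuming the whole data file fits in memory (2500 lines easily does); print_in_index_order is a hypothetical helper name:

```shell
# Sketch: print lines in index order rather than data-file order.
# print_in_index_order is a hypothetical helper name.
# Usage: print_in_index_order INDEX_FILE DATA_FILE > OUTPUT_FILE
print_in_index_order() {
    awk '
        NR == FNR { line[FNR] = $0; next }   # first file (data): store each line by its number
        ($1 in line) { print line[$1] }      # second file (index): look each number up
    ' "$2" "$1"
}
```

Note that the data file is passed first here, the reverse of the one-liner above.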

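You can also use a sed command file. Appending p to each line number turns 4 into the command 4p (print line 4), and sed -n then prints only those lines: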

$ sed 's/$/p/' /tmp/390 > /tmp/sed_cmd
$ sed -n -f /tmp/sed_cmd /tmp/2500

With GNU sed, you can skip the temporary file and pipe the commands in instead:

$ sed 's/$/p/' /tmp/390 | sed -n -f - /tmp/2500

but that does not work on OS X :-(

You can do this, though (it uses process substitution, so it needs bash, ksh or zsh rather than plain sh):

$ sed -n -f <(sed 's/$/p/' /tmp/390) /tmp/2500
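If neither GNU sed's -f - nor process substitution is available, another option is to build the sed script in a shell variable and pass it as the script argument, since sed accepts several newline-separated commands in one script. A sketch with sed_pick_lines as a hypothetical helper name; it relies only on POSIX sh and POSIX sed behavior, so it is expected to work with the BSD sed on OS X as well, but treat that as an assumption:

```shell
# Sketch: no temporary file, no "-f -", no process substitution.
# sed_pick_lines is a hypothetical helper name.
# Usage: sed_pick_lines INDEX_FILE DATA_FILE > OUTPUT_FILE
sed_pick_lines() {
    # Turn each line number N into the command "Np"; the commands end up
    # newline-separated in a single string.
    script=$(sed 's/$/p/' "$1")
    # -n suppresses automatic printing, so only the "Np" lines are output.
    sed -n "$script" "$2"
}
```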

You can read index.txt into an array keyed by line number and then check each line number of file.txt against it, redirecting the output to another file:

awk 'NR==FNR{line[$1]; next}(FNR in line){print $1}' index.txt file.txt > newfile.txt

When you work with two files, you need FNR, because it resets to 1 each time a new file starts, whereas NR keeps incrementing. That is why NR==FNR is true only while awk reads the first file (index.txt), and next skips the rest of the program for those lines.
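To see the NR/FNR difference directly, this small sketch prints NR, FNR and the value of NR == FNR for every line of the files it is given (show_record_numbers is a hypothetical helper name):

```shell
# Sketch: show how NR and FNR evolve across several input files.
# show_record_numbers is a hypothetical helper name.
show_record_numbers() {
    # (NR == FNR) evaluates to 1 or 0
    awk '{ print NR, FNR, (NR == FNR) }' "$@"
}
```

For two files of two lines each it prints 1 1 1, 2 2 1, 3 1 0 and 4 2 0, one per line: the condition is 1 only for the lines of the first file.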

As Ed Morton suggested in the comments, the index.txt/file.txt command can be shortened further by dropping {print $1}, since a pattern with no action prints the whole current line when it is true (the same as print $1 here, because file.txt has a single column):

awk 'NR==FNR{line[$1]; next} FNR in line' index.txt file.txt > newfile.txt
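The one-liner still reads file.txt to the end after the last wanted line. A possible variation, sketched under the assumption that index.txt holds positive integers, records the largest wanted line number and exits once FNR passes it. Unlike the sorted walk, it does not need index.txt to be sorted (pick_lines_early_exit is a hypothetical name):

```shell
# Sketch: array lookup plus an early exit past the largest wanted line.
# pick_lines_early_exit is a hypothetical helper name.
# Usage: pick_lines_early_exit INDEX_FILE DATA_FILE > OUTPUT_FILE
pick_lines_early_exit() {
    awk '
        BEGIN { max = 0 }
        NR == FNR {                           # first file: the index
            line[$1]
            if ($1 + 0 > max) max = $1 + 0    # remember the largest line number
            next
        }
        FNR > max { exit }                    # past the last wanted line: stop reading
        FNR in line                           # default action: print the line
    ' "$1" "$2"
}
```

Usage: pick_lines_early_exit index.txt file.txt > newfile.txt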

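If index.txt is sorted, we can walk through file.txt in order, reading the next wanted line number from index.txt only after the current one has been reached. That keeps the work to a minimum (no array is built, and awk stops as soon as the last index has been handled), so the script is faster: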

awk 'BEGIN {
         indexfile = "index.txt"
         if ((getline ind < indexfile) <= 0) {
             printf("Empty %s; exiting\n", indexfile) > "/dev/stderr"
             exit
         }
     }
     {
         if (FNR < ind) next                          # not yet at the next wanted line
         if (FNR == ind) print                        # wanted line: print its ID
         if ((getline ind < indexfile) <= 0) exit     # no more line numbers: stop
     }' file.txt > newfile.txt

If index.txt is not actually sorted, sort it numerically first (-u also removes duplicate line numbers, which the walk above does not handle):

sort -nu index.txt > temp.index.txt && mv temp.index.txt index.txt
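To sort only when needed, sort -c can check the order without rewriting the file; combined with -u it also reports duplicate keys as out of order. A sketch, with ensure_sorted_index as a hypothetical helper name:

```shell
# Sketch: sort the index file only if it is not already in strictly
# increasing numeric order.
# ensure_sorted_index is a hypothetical helper name.
ensure_sorted_index() {
    # sort -c exits non-zero when the file is out of order (or, with -u,
    # has duplicates); its diagnostic message is discarded.
    if ! sort -c -n -u "$1" 2>/dev/null; then
        sort -n -u "$1" > "$1.tmp" && mv "$1.tmp" "$1"
    fi
}
```

Usage: ensure_sorted_index index.txt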
