Printing numerous specific lines from file using awk or sed command loop
I've got this big txt file with ID names. It has 2500 lines, one column. Let's call it file.txt:
H3430
H3467
H9805
Also, I've got another file, index.txt, which has 390 numbers:
1
4
9
13
15
Those numbers are the line numbers (of the IDs) I have to extract from file.txt. I need to generate another file, let's call it newfile.txt, with only the 390 IDs that are on the specific lines index.txt lists (the first ID of the list, the fourth, the ninth, and so on).
So, I tried the following loop, but it didn't work.
num=$'index.txt'
for i in num
do
awk 'NR==i' "file.txt" > newfile.txt
done
I'm a noob regarding these matters... so I need some help, whether it is with my loop or with a new solution suggested by you. Thank you :)
Let's create an example file that simulates your 2500-line file with seq:
$ seq 2500 > /tmp/2500
And put the example line numbers you gave into a file called 390:
$ echo "1
4
9
13
15" > /tmp/390
You can print the listed lines of the file 2500 by reading the line numbers into an array and printing each line whose number is in that array:
$ awk 'NR==FNR{ a[$1]++; next} a[FNR]' /tmp/390 /tmp/2500
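To make the two-file idiom above easier to follow, here is a minimal sketch on a five-line sample. The scratch directory, the variable names and the IDs H1200 and H7741 are made up for illustration; only the awk program itself comes from the answer.

```shell
# Hypothetical scratch files; a 5-line stand-in for the 2500-line file.
work=$(mktemp -d)
printf '%s\n' H3430 H3467 H9805 H1200 H7741 > "$work/file.txt"
printf '%s\n' 1 4 > "$work/index.txt"

# While reading the first file (NR==FNR), remember each wanted line number.
# While reading the second file, a[FNR] is non-zero only for wanted lines,
# and a true pattern with no action prints the line.
picked=$(awk 'NR==FNR { a[$1]++; next } a[FNR]' "$work/index.txt" "$work/file.txt")
printf '%s\n' "$picked"
```

This should print the first and fourth IDs of the sample. One thing to keep in mind: if the index file is empty, NR==FNR also holds for the second file, so nothing is printed.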
You can also use a sed command file:
$ sed 's/$/p/' /tmp/390 > /tmp/sed_cmd
$ sed -n -f /tmp/sed_cmd /tmp/2500
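It may help to look at what the generated command file actually contains. The sketch below repeats the two steps on small scratch files (the directory and the file name data are assumptions for the demo) and shows both the script and its output.

```shell
# Hypothetical scratch files standing in for /tmp/390 and /tmp/2500.
sed_dir=$(mktemp -d)
printf '%s\n' 1 4 9 > "$sed_dir/390"
seq 10 20 > "$sed_dir/data"          # 11 lines: 10, 11, ..., 20

# Append "p" to every line number, turning "4" into the sed command "4p".
sed 's/$/p/' "$sed_dir/390" > "$sed_dir/sed_cmd"
sed_script=$(cat "$sed_dir/sed_cmd")

# -n suppresses automatic printing, so only the addressed lines appear.
sed_out=$(sed -n -f "$sed_dir/sed_cmd" "$sed_dir/data")
printf 'script:\n%s\noutput:\n%s\n' "$sed_script" "$sed_out"
```

Note that a blank line in the number file would become a bare p command, which prints every line, so the number file should contain nothing but line numbers.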
With GNU sed, you can do:
$ sed 's/$/p/' /tmp/390 | sed -n -f - /tmp/2500
but that does not work on OS X :-(
You can do this, though:
$ sed -n -f <(sed 's/$/p/' /tmp/390) /tmp/2500
You can read the index.txt file into a map and then compare it with the line number of file.txt. Redirect the output to another file.
awk 'NR==FNR{line[$1]; next}(FNR in line){print $1}' index.txt file.txt > newfile.txt
When you work with two files, using FNR is necessary because it is reset to 1 when a new file starts (NR, by contrast, keeps incrementing).
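The difference between the two counters is easy to see by printing both for every line. This is only an illustrative sketch on made-up three-line files in a scratch directory.

```shell
# Hypothetical scratch files, much smaller than the real ones.
nr_dir=$(mktemp -d)
printf '%s\n' 1 3 > "$nr_dir/index.txt"
printf '%s\n' H3430 H3467 H9805 > "$nr_dir/file.txt"

# Print the running count (NR), the per-file count (FNR) and the line itself.
counters=$(awk '{ print NR, FNR, $0 }' "$nr_dir/index.txt" "$nr_dir/file.txt")
printf '%s\n' "$counters"
```

The two counters agree only while the first file is being read, which is why NR==FNR works as a test for "still in the first file".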
As Ed Morton suggests in the comments, the command that writes newfile.txt can be refined further by dropping {print $1}, since awk prints the line by default when a pattern is true.
awk 'NR==FNR{line[$1]; next} FNR in line' index.txt file.txt > newfile.txt
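For completeness, the loop from the question can also be made to work, although it rereads file.txt once per line number and so is much slower than the single-pass commands. In the original, `for i in num` iterates over the literal word num, the awk program inside single quotes cannot see the shell variable i, and `>` overwrites newfile.txt on every pass. A possible repair, sketched on hypothetical scratch files:

```shell
# Hypothetical scratch files; H1111 and H2222 are invented sample IDs.
loop_dir=$(mktemp -d)
printf '%s\n' H3430 H3467 H9805 H1111 H2222 > "$loop_dir/file.txt"
printf '%s\n' 1 4 > "$loop_dir/index.txt"

: > "$loop_dir/newfile.txt"                      # start with an empty output file
while IFS= read -r i; do                         # read index.txt line by line
    # Pass the shell value to awk with -v, and append (>>) instead of overwrite.
    awk -v n="$i" 'NR == n { print; exit }' "$loop_dir/file.txt" >> "$loop_dir/newfile.txt"
done < "$loop_dir/index.txt"
loop_out=$(cat "$loop_dir/newfile.txt")
printf '%s\n' "$loop_out"
```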
If index.txt is sorted, we could walk file.txt in order.
That reduces the number of actions to the very minimum (faster script):
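awk 'BEGIN { indexfile="index.txt"
             # read the first wanted line number; stop if there is none
             if ( (getline ind < indexfile) <= 0)
                { printf("Empty %s; exiting\n",indexfile); exit }
           }
           { if ( FNR < ind ) next                        # not a wanted line yet
             if ( FNR == ind ) printf("%s %s\n",ind,$0)   # print line number and ID
             if ( (getline ind < indexfile) <= 0) {exit}  # no more numbers: stop early
           }' file.txt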
If the file is not actually sorted, get it quickly sorted with sort:
sort -n index.txt > temp.index.txt
rm index.txt
mv temp.index.txt index.txt
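As a possible shortcut, sort can write its result back over its own input with -o, which avoids the temporary file. The sketch below also adds -u to drop repeated line numbers, on the assumption that each line should be extracted only once; the scratch directory and sample numbers are made up for the demo.

```shell
# Hypothetical scratch copy of index.txt, unsorted and with one duplicate.
sort_dir=$(mktemp -d)
printf '%s\n' 13 4 1 9 4 > "$sort_dir/index.txt"

# -n sorts numerically, -u keeps one copy of each number,
# -o names the output file, which may be the input file itself.
sort -n -u -o "$sort_dir/index.txt" "$sort_dir/index.txt"
sorted=$(cat "$sort_dir/index.txt")
printf '%s\n' "$sorted"
```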