Extract lines from a file in bash
I have a file like this
I would like to extract the lines consisting of 0s and 1s (all such lines in the file) into a separate file. The sequence does not have to start with a 0; it could also start with a 1. However, the line always comes directly after a line starting with SITE:. Moreover, I would like to extract the SITE: lines themselves into a separate file. Could somebody tell me how to do that in bash?
You could try something like:
$ grep -E -o "^(0|1)+$" test.txt > test2.txt
$ cat test2.txt
0000000000001010000000000000010000000000000000000100000000000010000000000000000000000000000000000000
0000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000
0011010000000000001010000000000000001000010001000000001001001000011000000000000000101000101010101000
$ grep "^SITE:" test.txt > test3.txt
$ cat test3.txt
SITE: 0 0.000340988542 0.0357651018
SITE: 1 0.000529755514 0.00324293642
SITE: 2 0.000577745511 0.052214098
Another solution, using only bash:
$ while read -r; do [[ $REPLY =~ ^(0|1)+$ ]] && echo "$REPLY"; done < test.txt > test2.txt
$ cat test2.txt
0000000000001010000000000000010000000000000000000100000000000010000000000000000000000000000000000000
0000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000
0011010000000000001010000000000000001000010001000000001001001000011000000000000000101000101010101000
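The loop above keeps every line made up of 0s and 1s, wherever it appears. Since the question says the wanted line always comes directly after a SITE: line, a variant could remember whether the previous line started with SITE:. The following is only a minimal sketch under that assumption: the sample data, the temporary directory and the output file names are made up for illustration, and it uses case patterns instead of [[ =~ ]] so that it should also run in a plain POSIX shell.

```shell
# Hypothetical sample input, loosely modelled on the output shown above.
tmp=$(mktemp -d)
printf '%s\n' \
  'SITE: 0 0.000340988542 0.0357651018' \
  '0010100' \
  'some other line' \
  '1111' \
  'SITE: 1 0.000529755514 0.00324293642' \
  '1000001' > "$tmp/test.txt"

after_site=0   # 1 while the previous line started with "SITE:"
while IFS= read -r line; do
  case $line in
    SITE:*)
      printf '%s\n' "$line" >> "$tmp/test3.txt"   # the SITE: line itself
      after_site=1
      continue
      ;;
  esac
  if [ "$after_site" -eq 1 ]; then
    case $line in
      ''|*[!01]*) ;;   # empty, or contains something other than 0/1: skip
      *) printf '%s\n' "$line" >> "$tmp/test2.txt" ;;
    esac
  fi
  after_site=0
done < "$tmp/test.txt"

cat "$tmp/test2.txt"
```

In this sample the line 1111 is skipped because it does not directly follow a SITE: line, which the regex-only loop would not notice.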
To remove the 0 characters at the beginning of each line:
$ grep -E "^(0|1)+$" test.txt | sed "s/^0\{1,\}//g" > test2.txt
$ cat test2.txt
1010000000000000010000000000000000000100000000000010000000000000000000000000000000000000
1000000000000000000000000000000000000000000000000000000000
11010000000000001010000000000000001000010001000000001001001000011000000000000000101000101010101000
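The same stripping can probably be done without sed by using shell parameter expansion, which fits the pure-bash loop shown earlier. This is only a sketch: strip_leading_zeros is a hypothetical helper name and the arguments are invented sample values. As with the sed command, a line consisting only of zeros becomes empty.

```shell
# Hypothetical helper: remove every leading 0 from its first argument.
strip_leading_zeros() {
  zeros=${1%%[!0]*}              # longest prefix made up only of 0s
  printf '%s\n' "${1#"$zeros"}"  # the argument with that prefix removed
}

strip_leading_zeros 0001010
strip_leading_zeros 1100
```

Inside a while read loop, the function would be called with the current line instead of a literal value.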
UPDATE: new file format provided in the comments:
$ grep -E "^SITE:" test.txt | grep -E -o "(0|1)+$" | sed "s/^0\{1,\}//g" > test2.txt
$ cat test2.txt
100000000000000000000001000001000000000000000000000000000000000000
1010010010000000000111101000010000001001010111111100000000000010010001101010100011101011110011100
10000000000
$ grep -E "^SITE:" test.txt | sed "s/[01\ ]\{1,\}$//g" > test3.txt
$ cat test3.txt
SITE: 967 0.189021866 0.0169990123
SITE: 968 0.189149593 0.246619149
SITE: 969 0.189172266 6.84752689e-05
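For the updated format, both output files could also be written in a single awk pass. The sketch below assumes that each SITE: line ends with the 0/1 string as its last whitespace-separated field; that layout, the sample lines, and the variable names bits and sites are assumptions made for illustration. One thing to be aware of: the character class [01\ ] in the sed command above would also remove trailing 0 or 1 digits from the last numeric field if that field happened to end in one, which the sketch avoids by requiring a separator before the bit string.

```shell
# Hypothetical sample in the assumed updated format: bit string as the last field.
tmp=$(mktemp -d)
printf '%s\n' \
  'SITE: 967 0.189021866 0.0169990123 000100' \
  'SITE: 968 0.189149593 0.10 0011' \
  'not a site line 0101' > "$tmp/test.txt"

awk -v bits="$tmp/test2.txt" -v sites="$tmp/test3.txt" '
  /^SITE:/ && $NF ~ /^[01]+$/ {
    b = $NF
    sub(/^0+/, "", b)            # drop leading zeros, as the sed step does
    print b > bits
    sub(/[ \t]+[01]+$/, "")      # cut the bit field and its separator off $0
    print > sites
  }' "$tmp/test.txt"

cat "$tmp/test2.txt" "$tmp/test3.txt"
```

In the second sample line, the value 0.10 is kept intact in the SITE: output.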
Moreover, I would like to extract the line SITE itself into a separate file.
That's the easy part:
grep '^SITE:' infile > outfile.site
Extracting the line after that is slightly harder:
grep --after-context=1 '^SITE:' infile \
| grep '^[01]*$' \
> outfile.nr
--after-context (or -A) specifies how many lines after the matching line to print as well. We then use the second grep to print only that following line, and not the line that actually matched (nor the delimiter that grep puts between groups of matches when an after-context is specified).
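To see what each stage of that pipeline does, it can be run on a small sample. The input below is invented for illustration (the file names under the temporary directory are assumptions too), and -A 1 is used as the short form of --after-context=1.

```shell
# Hypothetical sample: two SITE: records separated by an unrelated line.
tmp=$(mktemp -d)
printf '%s\n' \
  'SITE: 0 0.000340988542 0.0357651018' \
  '0010100' \
  'unrelated line' \
  'SITE: 1 0.000529755514 0.00324293642' \
  '1000001' > "$tmp/infile"

# Stage 1: each matching line plus the one line after it.
# Non-adjacent groups are separated by a "--" delimiter line.
grep -A 1 '^SITE:' "$tmp/infile"

# Stage 2: keep only the 0/1 lines, dropping SITE: lines and delimiters.
grep -A 1 '^SITE:' "$tmp/infile" | grep '^[01]*$' > "$tmp/outfile.nr"
cat "$tmp/outfile.nr"
```

Note that '^[01]*$' also matches an empty line; if empty lines can follow a SITE: line, a pattern such as '^[01][01]*$' would require at least one digit.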
Alternatively, you could use the following to match the numeric lines:
grep '^[01]*$' infile > outfile.nr
That's much easier, but it will find all lines consisting solely of 0s and 1s, regardless of whether they come after a line that starts with SITE:.
Here's a simple awk solution that matches all lines starting with SITE: and outputs the respective next line:
awk '/^SITE:/ { if (getline) print }' infile > outfile
Simply omit the { ... } block to extract all lines starting with SITE: themselves to a separate file:
awk '/^SITE:/' infile > outfile
If you wanted to combine both operations:
outfile1 and outfile2 are the names of the 2 output files, passed to awk as variables f1 and f2:
awk -v f1=outfile1 -v f2=outfile2 \
'/^SITE:/ { print > f1; if (getline) print > f2 }' infile
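One caveat with getline: the line it reads is not tested against /^SITE:/ again, so if a SITE: line were ever followed directly by another SITE: line, the second one would end up in the wrong file. A flag-based variant might be more robust in that case. The following is a sketch under that assumption; the sample input (including a SITE: line with no 0/1 line after it) and the variable name prev_site are invented for illustration.

```shell
# Hypothetical sample: the second SITE: line has no 0/1 line after it.
tmp=$(mktemp -d)
printf '%s\n' \
  'SITE: 0 0.000340988542 0.0357651018' \
  '0010100' \
  'SITE: 1 0.000529755514 0.00324293642' \
  'SITE: 2 0.000577745511 0.052214098' \
  '1000001' > "$tmp/infile"

awk -v f1="$tmp/outfile1" -v f2="$tmp/outfile2" '
  prev_site && /^[01]+$/ { print > f2 }    # 0/1 line directly after a SITE: line
  { prev_site = 0 }                        # reset the flag on every line
  /^SITE:/ { print > f1; prev_site = 1 }   # the SITE: line itself
' "$tmp/infile"

cat "$tmp/outfile1" "$tmp/outfile2"
```

Here all three SITE: lines go to outfile1 and only the two 0/1 lines go to outfile2.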