
Extract lines from a file in bash

I have a file like this:

I would like to extract the lines made up of 0s and 1s (all such lines in the file) into a separate file. The sequence does not have to start with a 0; it could also start with a 1. However, such a line always comes directly after a line starting with SITE:. Moreover, I would like to extract the SITE: lines themselves into a separate file. Could somebody tell me how that is doable in bash?

You could try something like this:

$ grep -E '^[01]+$' test.txt > test2.txt
$ cat test2.txt
0000000000001010000000000000010000000000000000000100000000000010000000000000000000000000000000000000
0000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000
0011010000000000001010000000000000001000010001000000001001001000011000000000000000101000101010101000
$ grep "^SITE:" test.txt > test3.txt
$ cat test3.txt
SITE:   0    0.000340988542    0.0357651018
SITE:   1    0.000529755514   0.00324293642
SITE:   2    0.000577745511     0.052214098

Another solution, using pure bash:

$ while read -r; do [[ $REPLY =~ ^[01]+$ ]] && printf '%s\n' "$REPLY"; done < test.txt > test2.txt
$ cat test2.txt
0000000000001010000000000000010000000000000000000100000000000010000000000000000000000000000000000000
0000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000
0011010000000000001010000000000000001000010001000000001001001000011000000000000000101000101010101000
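Neither command above checks that a 0/1 line really comes directly after a SITE: line. A minimal pure-bash sketch that does check it keeps a flag recording whether the previous line was a header. It assumes header lines start exactly with SITE: and that a "0/1 line" contains nothing but the characters 0 and 1; extract_after_site is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Sketch: print only the 0/1 lines that come directly after a "SITE:" line,
# using bash builtins only. extract_after_site is a hypothetical name.
extract_after_site() {
    local line prev_was_site=0
    # "|| [[ -n $line ]]" also processes a final line without a trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        # Emit the line only if the previous line started with "SITE:"
        # and this line consists solely of 0s and 1s.
        if (( prev_was_site )) && [[ $line =~ ^[01]+$ ]]; then
            printf '%s\n' "$line"
        fi
        # Remember whether this line is a SITE: header for the next iteration.
        if [[ $line == SITE:* ]]; then
            prev_was_site=1
        else
            prev_was_site=0
        fi
    done < "$1"
}
```

Usage: extract_after_site test.txt > test2.txt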

To remove the leading 0 characters from each line:

$ grep -E '^[01]+$' test.txt | sed 's/^0\{1,\}//' > test2.txt
$ cat test2.txt
1010000000000000010000000000000000000100000000000010000000000000000000000000000000000000
1000000000000000000000000000000000000000000000000000000000
11010000000000001010000000000000001000010001000000001001001000011000000000000000101000101010101000
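If you would rather stay in pure bash, parameter expansion with the extglob shell option can strip the leading zeros as well. A sketch, where strip_leading_zeros is a hypothetical helper name; as with the sed command, a line made only of 0s ends up as an empty line:

```bash
#!/usr/bin/env bash
# Sketch: keep only 0/1 lines and strip their leading 0s in pure bash.
# extglob is needed for the +(0) pattern ("one or more 0s").
shopt -s extglob

strip_leading_zeros() {
    local line
    while IFS= read -r line || [[ -n $line ]]; do
        [[ $line =~ ^[01]+$ ]] || continue   # skip anything that is not a 0/1 line
        # ${line##+(0)} removes the longest prefix made of 0s.
        # An all-zero line becomes an empty line, just like with sed.
        printf '%s\n' "${line##+(0)}"
    done < "$1"
}
```

Usage: strip_leading_zeros test.txt > test2.txt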

UPDATE: for the new file format provided in the comments, where the 0/1 sequence is at the end of each SITE: line:

$ grep '^SITE:' test.txt | grep -Eo '[01]+$' | sed 's/^0\{1,\}//' > test2.txt
$ cat test2.txt
100000000000000000000001000001000000000000000000000000000000000000
1010010010000000000111101000010000001001010111111100000000000010010001101010100011101011110011100
10000000000
$ grep '^SITE:' test.txt | sed 's/[[:space:]]\{1,\}[01]\{1,\}$//' > test3.txt
$ cat test3.txt
SITE:   967         0.189021866    0.0169990123
SITE:   968         0.189149593     0.246619149
SITE:   969         0.189172266  6.84752689e-05
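The two commands above read the file twice. A one-pass awk sketch for the new format follows; it assumes, as the commands above imply, that the 0/1 sequence is the last whitespace-separated field of each SITE: line. split_site_lines and its three arguments are hypothetical names. Because the whitespace is removed together with the trailing field, the preceding numeric column is left intact even if it ends in 0 or 1:

```bash
#!/usr/bin/env bash
# Sketch for the "new format": ASSUMES each SITE: line ends with a
# whitespace-separated field made only of 0s and 1s.
# $1 = input file, $2 = file for the header part, $3 = file for the bits.
split_site_lines() {
    local infile=$1 header_out=$2 bits_out=$3
    awk -v f1="$header_out" -v f2="$bits_out" '
        /^SITE:/ && $NF ~ /^[01]+$/ {
            bits = $NF
            sub(/^0+/, "", bits)              # drop leading zeros of the bit string
            line = $0
            sub(/[ \t]+[01]+$/, "", line)     # drop whitespace + trailing bit field
            print line > f1
            print bits > f2
        }
    ' "$infile"
}
```

Usage: split_site_lines test.txt test3.txt test2.txt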

Moreover, I would like to extract the line SITE itself into a separate file.

That's the easy part:

grep '^SITE:' infile > outfile.site

Extracting the line after it is slightly harder:

grep --after-context=1 '^SITE:' infile \
    | grep -E '^[01]+$' \
    > outfile.nr

--after-context (or -A) specifies how many lines after each matching line to print as well. We then use the second grep to keep only that following line, and neither the matching SITE: line itself nor the -- separator that grep prints between non-adjacent groups of matches when an after-context is given.
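To see what the second grep filters out, this small sketch runs the first stage alone on a made-up input (the sample lines are hypothetical and only mimic the shape described in the question), then wraps both stages in a helper; after_site_bits is a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical sample: two SITE: groups with an unrelated 0/1 line in between.
sample=$'SITE:   0    0.1    0.2\n0101\n1111\nSITE:   1    0.3    0.4\n0011'

# Stage 1 alone: each SITE: line plus the line after it. The two groups are
# not adjacent in the input, so grep prints a "--" separator line between them.
printf '%s\n' "$sample" | grep -A 1 '^SITE:'

# Both stages: keep only the lines made entirely of 0s and 1s, which drops
# the SITE: lines and the "--" separator.
after_site_bits() {
    grep -A 1 '^SITE:' "$1" | grep -E '^[01]+$'
}
```

Usage: after_site_bits infile > outfile.nr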

Alternatively, you could use the following to match the numeric lines:

grep -E '^[01]+$' infile > outfile.nr

That's much simpler, but it will find all lines consisting solely of 0s and 1s, regardless of whether they come directly after a line that starts with SITE:.
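A quick sketch contrasting the two approaches on a hypothetical input in which one 0/1 line ("1111") does not follow a SITE: line:

```bash
#!/usr/bin/env bash
# Hypothetical input: "1111" is a 0/1 line that does NOT follow a SITE: line.
tmp=$(mktemp)
printf '%s\n' 'SITE:   0    0.1    0.2' '0101' '1111' 'SITE:   1    0.3    0.4' '0011' > "$tmp"

# Only the 0/1 lines directly after SITE: (0101, 0011).
after_site=$(grep -A 1 '^SITE:' "$tmp" | grep -cE '^[01]+$')
# Every 0/1 line in the file (0101, 1111, 0011).
all_bits=$(grep -cE '^[01]+$' "$tmp")

echo "after SITE: $after_site, anywhere: $all_bits"   # after SITE: 2, anywhere: 3
rm -f "$tmp"
```

If your file never contains 0/1 lines outside the SITE: blocks, both counts agree and the simpler command is enough.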

Here's a simple awk solution that matches all lines starting with SITE: and outputs the line that follows each of them:

awk '/^SITE:/ { if (getline) print }'  infile > outfile
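The same task can also be sketched without getline, using a common awk idiom: a variable remembers whether the previous line matched. This is shown as an alternative, not a correction; after_site is a hypothetical helper name. One behavioural difference: getline consumes the next line, so if two SITE: lines are adjacent, the second one is never tested as a header, whereas the flag version tests every line:

```bash
#!/usr/bin/env bash
# Sketch: print the line after each SITE: line without using getline.
after_site() {
    # Rule 1: print the current line if the previous line started with SITE:.
    # Rule 2: record whether the current line starts with SITE: (1 or 0).
    awk 'prev { print } { prev = /^SITE:/ }' "$1"
}
```

Usage: after_site infile > outfile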

Simply omit the { ... } block to write the lines starting with SITE: themselves to a separate file:

awk '/^SITE:/' infile > outfile

If you want to combine both operations:

outfile1 and outfile2 are the names of the two output files, passed to awk as the variables f1 and f2:

awk -v f1=outfile1 -v f2=outfile2 \
  '/^SITE:/ { print > f1; if (getline) print > f2 }'  infile
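A slightly more defensive sketch of the combined command, using the same previous-line flag instead of getline: the line after a SITE: header goes to f2 only if it consists solely of 0s and 1s, so a header followed by another header is not mistaken for data. split_sites is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Sketch: one pass, two output files.
# $1 = input file, $2 = file for SITE: lines, $3 = file for the 0/1 lines.
split_sites() {
    local infile=$1 site_out=$2 bits_out=$3
    awk -v f1="$site_out" -v f2="$bits_out" '
        prev && /^[01]+$/ { print > f2 }   # 0/1 line directly after a SITE: line
        /^SITE:/          { print > f1 }   # the SITE: header itself
                          { prev = /^SITE:/ }
    ' "$infile"
}
```

Usage: split_sites infile outfile1 outfile2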

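Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.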

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM