Extracting lines from a file in bash

Question

I have a file like this:

I would like to extract the lines consisting of 0s and 1s (all such lines in the file) into a separate file. The sequence does not have to start with a 0; it could also start with a 1. However, such a line always comes directly after a line starting with SITE:. Moreover, I would like to extract the SITE: lines themselves into a separate file. Could somebody tell me how to do that in bash?

Answer 1

You could try something like:

$ grep -E -o "^(0|1)+$" test.txt > test2.txt
$ cat test2.txt
0000000000001010000000000000010000000000000000000100000000000010000000000000000000000000000000000000
0000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000
0011010000000000001010000000000000001000010001000000001001001000011000000000000000101000101010101000
$ grep "^SITE:" test.txt > test3.txt
$ cat test3.txt
SITE:   0    0.000340988542    0.0357651018
SITE:   1    0.000529755514   0.00324293642
SITE:   2    0.000577745511     0.052214098

Another solution, using only bash:

$ while read -r; do [[ $REPLY =~ ^(0|1)+$ ]] && echo "$REPLY";  done < test.txt > test2.txt
$ cat test2.txt
0000000000001010000000000000010000000000000000000100000000000010000000000000000000000000000000000000
0000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000
0011010000000000001010000000000000001000010001000000001001001000011000000000000000101000101010101000
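The same filtering can also be done with shell builtins alone, writing both output files in a single pass. The following is a minimal sketch rather than the answerer's code: it swaps the `=~` regex for `case` patterns, and the sample input, the temporary directory and the output names sites.txt and bits.txt are assumptions made for illustration.

```shell
# Hypothetical sample input; the real file layout is assumed from the question.
tmp=$(mktemp -d)
cat > "$tmp/test.txt" <<'EOF'
# some header line
SITE:   0    0.000340988542    0.0357651018
0000101000
SITE:   1    0.000529755514   0.00324293642
1000000001
EOF

# One pass over the file: SITE: lines go to sites.txt,
# lines made up only of 0s and 1s go to bits.txt.
while IFS= read -r line; do
  case $line in
    SITE:*)       printf '%s\n' "$line" >> "$tmp/sites.txt" ;;
    ''|*[!01]*)   ;;  # empty, or contains a character other than 0/1: skip
    *)            printf '%s\n' "$line" >> "$tmp/bits.txt" ;;
  esac
done < "$tmp/test.txt"

cat "$tmp/bits.txt"
```

For large files the grep variants are likely to be faster, since a shell `while read` loop processes the input line by line in the shell itself.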

To remove the leading 0 characters from each line:

$ grep -E "^(0|1)+$" test.txt | sed "s/^0\{1,\}//" > test2.txt
$ cat test2.txt
1010000000000000010000000000000000000100000000000010000000000000000000000000000000000000
1000000000000000000000000000000000000000000000000000000000
11010000000000001010000000000000001000010001000000001001001000011000000000000000101000101010101000
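If the lines are already held in shell variables, the leading zeros can be stripped without sed by nesting two parameter expansions. This is a small sketch under that assumption; the function name strip_leading_zeros is hypothetical and not part of the answer.

```shell
# Hypothetical helper: print the argument with its leading 0s removed.
strip_leading_zeros() {
  # ${1%%[!0]*} keeps only the leading run of 0s (everything from the first
  # non-0 character onward is cut off); ${1#...} then removes that run.
  printf '%s\n' "${1#"${1%%[!0]*}"}"
}

strip_leading_zeros 0000000000001010000
strip_leading_zeros 1100
```

A line made up entirely of 0s becomes an empty line, which matches what the sed command above does.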

UPDATE: for the new file format provided in the comments:

$ grep "^SITE:" test.txt | grep -E -o "(0|1)+$" | sed "s/^0\{1,\}//" > test2.txt
$ cat test2.txt
100000000000000000000001000001000000000000000000000000000000000000
1010010010000000000111101000010000001001010111111100000000000010010001101010100011101011110011100
10000000000
$ grep "^SITE:" test.txt | sed "s/[[:space:]]*[01]\{1,\}$//" > test3.txt
$ cat test3.txt
SITE:   967         0.189021866    0.0169990123
SITE:   968         0.189149593     0.246619149
SITE:   969         0.189172266  6.84752689e-05
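Both files for the updated format can also be produced in one pass with awk. The sketch below assumes, based on the commands above, that the 0/1 sequence is the last whitespace-separated field of each SITE: line; the sample input and the variable names seqs and sites are made up for illustration.

```shell
# Hypothetical sample in the assumed new format (sequence at the end of the SITE: line).
tmp=$(mktemp -d)
cat > "$tmp/test.txt" <<'EOF'
SITE:   967         0.189021866    0.0169990123 0010000
SITE:   968         0.189149593     0.246619149 1010010
some other line
SITE:   969         0.189172266  6.84752689e-05 0000100
EOF

awk -v seqs="$tmp/test2.txt" -v sites="$tmp/test3.txt" '
  /^SITE:/ && $NF ~ /^[01]+$/ {
    seq = $NF
    sub(/^0+/, "", seq)          # drop leading zeros, like the sed step
    print seq > seqs
    sub(/[ \t]+[01]+$/, "")      # cut the trailing sequence off the line
    print > sites
  }
' "$tmp/test.txt"

cat "$tmp/test2.txt" "$tmp/test3.txt"
```

Requiring whitespace directly before the trailing run of 0s and 1s keeps the last numeric column intact even if it happens to end in a 0 or a 1.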

Answer 2

"Moreover, I would like to extract the SITE: lines themselves into a separate file."

That's the easy part:

grep '^SITE:' infile > outfile.site

Extracting the line after that is slightly harder:

grep --after-context=1 '^SITE:' infile \
    | grep -E '^[01]+$' \
    > outfile.nr

--after-context (or -A) specifies how many lines following each matching line are printed as well. The second grep then keeps only those following lines and drops both the SITE: lines that actually matched and the delimiter that grep puts between groups of output when an after-context is specified.

Alternatively, you could use the following to match the numeric lines:

grep -E '^[01]+$' infile > outfile.nr

That's much easier, but it will find all lines consisting solely of 0s and 1s, regardless of whether they come directly after a line starting with SITE:.
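The difference between the two approaches shows up as soon as the file contains a 0/1 line that does not directly follow a SITE: line. The sketch below uses a made-up sample file to illustrate this; the file contents and the variable names after_site and all_binary are assumptions.

```shell
# Hypothetical sample: "0110" is a stray 0/1 line that does not follow a SITE: line.
tmp=$(mktemp -d)
cat > "$tmp/infile" <<'EOF'
SITE:   0    0.1    0.2
0001
0110
SITE:   1    0.3    0.4
1000
EOF

# Only the line directly after each SITE: line.
after_site=$(grep -A 1 '^SITE:' "$tmp/infile" | grep -E '^[01]+$')

# Every line made up solely of 0s and 1s, wherever it occurs.
all_binary=$(grep -E '^[01]+$' "$tmp/infile")

printf 'after SITE:\n%s\n\nall 0/1 lines:\n%s\n' "$after_site" "$all_binary"
```

With this input the first command yields two lines and the second yields three, because the stray line is picked up only by the context-free match.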

Answer 3

Here's a simple awk solution that matches all lines starting with SITE: and outputs the respective next line:

awk '/^SITE:/ { if (getline) print }'  infile > outfile

Simply omit the { ... } block to write the lines starting with SITE: themselves to a separate file:

awk '/^SITE:/' infile > outfile

To combine both operations, pass the names of the two output files, outfile1 and outfile2, to awk as variables f1 and f2:

awk -v f1=outfile1 -v f2=outfile2 \
  '/^SITE:/ { print > f1; if (getline) print > f2 }'  infile
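The getline version consumes whatever line follows a SITE: line, so if two SITE: lines ever appear back to back, the second one lands in the sequence file. The question states that this does not happen, but if the input is not guaranteed to be well formed, a flag-based variant is one way to guard against it. This is an alternative sketch, not the answer's method; the sample data and the variable name want are assumptions.

```shell
# Hypothetical sample with two consecutive SITE: lines and a stray 0/1 line.
tmp=$(mktemp -d)
cat > "$tmp/infile" <<'EOF'
SITE:   0    0.1    0.2
0001
SITE:   1    0.3    0.4
SITE:   2    0.5    0.6
1000
0110
EOF

awk -v f1="$tmp/outfile1" -v f2="$tmp/outfile2" '
  /^SITE:/          { print > f1; want = 1; next }  # remember that a sequence may follow
  want && /^[01]+$/ { print > f2 }                  # accept only a 0/1 line right after SITE:
                    { want = 0 }                    # any other line clears the flag
' "$tmp/infile"

cat "$tmp/outfile2"
```

Here every SITE: line reaches outfile1, and outfile2 receives only 0/1 lines that immediately follow a SITE: line.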
