Finding the correct csplit regular expression

Question

I have a file that contains several lines like these:

1291126929200 started 88 videolist15.txt 4 Good 4
1291126929250 59.875 29.0 29.580243595150186 43.016096916037604
1291126929296 59.921 29.0 29.52749417740926 42.78632483544682
1291126929359 59.984 29.0 29.479540161281143 42.56031951027556
1291126929437 60.046 50.0 31.345036510255586 42.682281485516945
1291126932859 started 88 videolist15.txt 5 Good 4

I want to split the file at every line that contains started (or videolist; either works).

The following command produces only 2 output files:

$ csplit -k input.txt /started/

However, I expect many more, as this count shows:

$ grep -i started input.txt | wc -l
146

What would be the correct csplit command?

Thanks

Answer 1

Just add {*} at the end:

$ csplit -k input.txt /started/ {*}

The man page says:

{*}    repeat the previous pattern as many times as possible.

Demo:

$ cat file
1
foo
2
foo
3
foo
$ csplit -k file /foo/ {*}
2
6
6
4
$ ls -tr xx*             
xx03  xx02  xx01  xx00
$ csplit --version
csplit (GNU coreutils) 7.4
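
To try the same approach on the data from the question, here is a minimal sketch. It assumes GNU csplit ({*} and -z are GNU extensions as far as I know, not POSIX); the chunk_ prefix and the scratch directory are illustrative choices, not part of the original answer. Because the first input line itself contains started, csplit would otherwise write an empty first piece, so -z is used to drop it.

```shell
#!/bin/sh
# Sketch, assuming GNU csplit: split at every line containing "started".
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The six sample lines from the question.
cat > input.txt <<'EOF'
1291126929200 started 88 videolist15.txt 4 Good 4
1291126929250 59.875 29.0 29.580243595150186 43.016096916037604
1291126929296 59.921 29.0 29.52749417740926 42.78632483544682
1291126929359 59.984 29.0 29.479540161281143 42.56031951027556
1291126929437 60.046 50.0 31.345036510255586 42.682281485516945
1291126932859 started 88 videolist15.txt 5 Good 4
EOF

# -s: do not print piece sizes, -z: drop empty pieces,
# -k: keep pieces on error, -f: output file prefix (hypothetical name).
# The operands are quoted so the shell never tries to glob {*}.
csplit -s -z -k -f chunk_ input.txt '/started/' '{*}'

# Count the pieces that were written.
set -- chunk_*
chunk_count=$#
echo "pieces written: $chunk_count"
```

Each piece then starts with a started line and runs up to the line before the next one.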

Answer 2

According to the Open Group specifications, the csplit command accepts basic regular expressions.

Basic regular expressions (BREs) are a limited subset of full regex implementations. They support literal characters, the asterisk (*), the dot (.), character classes ([0-9]), and anchors (^, $). They don't support one-or-more (+) or alternation (a|b).
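
As an illustration of working within those limits, here is a small sketch that writes "one or more digits" as [0-9][0-9]* instead of [0-9]+ and anchors the match with ^. The log contents and the loose_/strict_ prefixes are made up for the example, and GNU csplit is assumed for {*} and -z.

```shell
#!/bin/sh
# Sketch, assuming GNU csplit: a BRE-only pattern that avoids "+".
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up sample: line 3 contains "started" only inside "restarted".
cat > log.txt <<'EOF'
100 started a.txt
101 1.5
102 restarted-note
200 started b.txt
201 2.5
EOF

# Loose pattern: also splits at the "restarted" line.
csplit -s -z -k -f loose_ log.txt '/started/' '{*}'

# Strict pattern: "[0-9][0-9]*" stands in for "[0-9]+", and "^" requires
# the timestamp field to be at the start of the line.
csplit -s -z -k -f strict_ log.txt '/^[0-9][0-9]* started /' '{*}'

# Count the pieces produced by each pattern.
set -- loose_*
loose_count=$#
set -- strict_*
strict_count=$#
echo "loose: $loose_count pieces, strict: $strict_count pieces"
```

The strict pattern splits only at the real header lines, so it yields fewer pieces than the loose one.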

Answer 1
11 accepted 2010-12-01 11:35:12

Answer 2
2 2015-08-12 19:26:11
