Looking for the correct regular expression for csplit

I have a file that contains several lines like these:

1291126929200 started 88 videolist15.txt 4 Good 4
1291126929250 59.875 29.0 29.580243595150186 43.016096916037604
1291126929296 59.921 29.0 29.52749417740926 42.78632483544682
1291126929359 59.984 29.0 29.479540161281143 42.56031951027556
1291126929437 60.046 50.0 31.345036510255586 42.682281485516945
1291126932859 started 88 videolist15.txt 5 Good 4

I want to split the file at every line that contains started (or videolist; either one works).

The following command produces only two output files:

$ csplit -k input.txt /started/

However, I expect many more, as the number of matching lines shows:

$ grep -i started input.txt |wc -l
146

What would be the correct csplit command?

Thanks

Just add {*} at the end:

$ csplit -k input.txt /started/ {*}

The man page says:

{*}    repeat the previous pattern as many times as possible.

Demo:

$ cat file
1
foo
2
foo
3
foo
$ csplit -k file /foo/ {*}
2
6
6
4
$ ls -tr xx*             
xx03  xx02  xx01  xx00
$ csplit --version
csplit (GNU coreutils) 7.4
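
One thing to watch for with the input from the question: its first line already matches the pattern, so the first piece comes out empty. A minimal sketch of one way to handle that, assuming GNU csplit (the -z, -s and -f options are GNU options; the part_ prefix and the shortened sample lines are made up for illustration):

```shell
# Work in a scratch directory so the pieces do not clutter anything else.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Small sample input; note that line 1 already contains "started".
printf '%s\n' \
  '1291126929200 started 88 videolist15.txt 4 Good 4' \
  '1291126929250 59.875 29.0' \
  '1291126932859 started 88 videolist15.txt 5 Good 4' \
  '1291126932900 60.046 50.0' > input.txt

# -z  do not write zero-length pieces (the empty piece before line 1)
# -s  do not print the byte count of each piece
# -k  keep the pieces already written if csplit stops on an error
# -f  use "part_" instead of the default "xx" prefix
# Quoting '{*}' keeps the shell from interpreting the braces or the star.
csplit -s -z -k -f part_ input.txt '/started/' '{*}'

ls part_*
```

With this sample, each piece should begin with one of the started lines. Without -z, the same command would also leave an empty first piece.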

According to the Open Group specification, the csplit command accepts basic regular expressions.

Basic regular expressions (BREs) are a limited subset of fuller regex implementations. They support literal characters, the asterisk (*), the dot (.), character classes ([0-9]), and anchors (^, $). They don't support the one-or-more operator (+) or alternation (a|b).
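
To see the difference in practice, here is a small sketch that uses grep rather than csplit, because grep can be switched between basic (default) and extended (-E) syntax. The variable names and sample lines are invented for the example:

```shell
# Two sample lines: one with a repeated 'a', one with a literal plus sign.
sample=$(printf 'aa\na+\n')

# BRE: '+' is an ordinary character, so only the line "a+" matches.
bre_plus=$(printf '%s\n' "$sample" | grep -c 'a+')

# BRE way to say "one or more a": write the atom once, then atom plus '*'.
bre_star=$(printf '%s\n' "$sample" | grep -c 'aa*')

# ERE (grep -E): '+' is the one-or-more operator, so both lines match.
ere_plus=$(printf '%s\n' "$sample" | grep -E -c 'a+')

echo "$bre_plus $bre_star $ere_plus"   # prints: 1 2 2
```

The same aa* spelling should carry over to a csplit pattern, for example [0-9][0-9]* for one or more digits.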
