Trimming a file with regular expressions / sed
I've got a file with several lines like this:
*wordX*-Sentence1.;Sentence2.;Sentence3.;Sentence4.
One of these sentences may or may not contain wordX. What I want is to trim the file to make it look like this:
*wordX*-Sentence1.;Sentence2.
Where Sentence3 was the first to contain wordX.
How can I do this with sed/awk?
Edit:
Here's a sample file:
*WordA*-This sentence does not contain what i want.%Neither does this one.;Not here either.;Not here.;Here is WordA.;But not here.
*WordB*-WordA here.;WordB here, time to delete everything.;Including this sentece.
*WordC*-WordA, WordB. %Sample sentence one.;Sample Sentence 2.;Sample sentence 3.;Sample sentence 4.;WordC.;Discard this.
And here is the desired output:
*WordA*-This sentence does not contain what i want.%Neither does this one.;Not here either.;Not here.
*WordB*-WordA here.
*WordC*-WordA, WordB. %Sample sentence one.;Sample Sentence 2.;Sample sentence 3.;Sample sentence 4.
This task is more suited to awk. Use the following awk command:
awk -F ";" '/^ *\*.*?\*/ {printf("%s;%s\n", $1, $2)}' inFile
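This assumes that the words you are trying to match are always wrapped in asterisks (*). Note that it prints only the first two ;-separated fields of each matching line, so it reproduces the first example but not the sample file from the edit.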
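For input like the sample file, where the cut point differs from line to line, a loop over the sentences may be closer to what was asked. The following is only a sketch and not the command above: the function name trim_sentences is made up for illustration, and it assumes every line starts with *word*- and that sentences are separated by ; only.

```shell
# Hypothetical helper (the name is an assumption, not from the thread):
# cut each line just before the first ;-separated sentence that contains
# the word wrapped in asterisks at the start of the line.
trim_sentences() {
  awk '
  {
    # Lines without a leading *word*- header are passed through unchanged.
    if (!match($0, /^\*[^*]+\*-/)) { print; next }
    word = substr($0, 2, RLENGTH - 3)   # text between the two asterisks
    head = substr($0, 1, RLENGTH)       # the "*word*-" prefix
    rest = substr($0, RLENGTH + 1)      # everything after the prefix
    n = split(rest, s, ";")
    out = head
    for (i = 1; i <= n; i++) {
      if (index(s[i], word)) break      # literal substring test, no regex
      out = out (i > 1 ? ";" : "") s[i]
    }
    print out
  }' "$@"
}
```

It can be called as trim_sentences inFile, or used as a filter on standard input. The word is compared with index() rather than as a regular expression, so characters such as . or + in the word are taken literally.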
This might work for you (GNU sed):
sed -r 's/-/;/;:a;s/^(\*([^*]+)\*.*);[^;]*\2.*/\1;/;ta;s/;/-/;s/;$//' file
Convert the - following wordX to a ;. Delete each sentence containing wordX, together with everything after it (working from the back to the front of the line). Restore the original -. Delete the trailing ;.
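To make the steps easier to follow, they can be run one at a time on a short line. This is a sketch that assumes GNU sed; the input line and the variable names step1 to step3 are made up for illustration.

```shell
# Made-up input: the third sentence is the first one containing "foo".
line='*foo*-a.;b.;has foo.;c.'

# Step 1: turn the - after the header into a ; so it acts as a separator.
step1=$(printf '%s\n' "$line" | sed -r 's/-/;/')

# Step 2: repeatedly cut from the last sentence containing the word
# to the end of the line, leaving a trailing ; behind.
step2=$(printf '%s\n' "$step1" | sed -r ':a;s/^(\*([^*]+)\*.*);[^;]+\2.*/\1;/;ta')

# Step 3: restore the - after the header and drop the trailing ;.
step3=$(printf '%s\n' "$step2" | sed -r 's/;/-/;s/;$//')

printf '%s\n' "$step1" "$step2" "$step3"
# *foo*;a.;b.;has foo.;c.
# *foo*;a.;b.;
# *foo*-a.;b.
```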
sed -r -e 's/\.;/\n/g' \
-e 's/-/\n/' \
-e ':a' \
-e 's/^(\*([^*]*).*\n)[^\n]*\2.*/\1/' \
-e 'ta' \
-e 's/\n/-/' \
-e 's/\n/.;/g' \
-e 's/;$//'
(Edit: added the swaps between - and \n to handle a match in the first sentence.)