Regex Must Match a Word (not to replace) AND a Pattern (to replace) in a Line

With regex (PCRE or sed, though Python is also fine; please specify which), I want to remove every single-letter, comma-separated item (roughly /,.,/g), but only on lines that contain the word "Labels:".

For example, these lines:

Labels: K,ltemittel,System,j,Vakuum,s
Another tags: a,b,xxx,c,yyy,z

should become:

Labels: ltemittel,System,Vakuum
Another tags: a,b,xxx,c,yyy,z

What I've tried:

  • non-capturing group ("Labels:" still gets replaced as well)
  • lookahead and lookbehind (cannot use greedy matching)
  • grouping /(Labels:)*(,.,)/ (also matches on lines without "Labels:")

Using sed

$ sed '/Labels:/{s/,[A-Za-z]\>//g;s/\<[A-Za-z],//}' input_file
Labels: ltemittel,System,Vakuum
Another tags: a,b,xxx,c,yyy,z

Explanation (Added By Tripleee)

On lines containing Labels:, the first substitution looks for a comma, followed by a single alphabetic character, followed by a word boundary, i.e. the label after the comma is a single letter. The second substitution then removes a remaining single-letter label immediately before a comma by similar logic.
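For readers who prefer Python, the same two-step logic could be sketched roughly as follows. This is an illustration, not part of the original answer: the function name clean_line is hypothetical, and the sketch assumes both substitutions are meant to apply only to lines containing Labels:. Python's \b stands in for sed's \< and \>.

```python
import re


def clean_line(line):
    """Rough Python counterpart of the two sed substitutions (hypothetical helper)."""
    # Assumption: lines without "Labels:" must be left untouched.
    if "Labels:" not in line:
        return line
    # Like s/,[A-Za-z]\>//g : drop every ",X" where X is a whole one-letter word.
    line = re.sub(r",[A-Za-z]\b", "", line)
    # Like s/\<[A-Za-z],// (no g flag): drop the first remaining "X," only.
    line = re.sub(r"\b[A-Za-z],", "", line, count=1)
    return line


if __name__ == "__main__":
    for text in ("Labels: K,ltemittel,System,j,Vakuum,s",
                 "Another tags: a,b,xxx,c,yyy,z"):
        print(clean_line(text))
```

The second substitution is limited to one replacement because the sed command has no g flag; after the first step only a leading single-letter label can remain.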

You could potentially use:

(?i)(^(?!Labels:).*)|\b[a-z],|,[a-z]\b

See an online demo.


  • (?i) - Turn on case-insensitive matching;
  • ( - Open 1st capture group;
    • ^ - Start-of-string anchor (start of each line when multiline mode is on);
    • (?!Labels:) - Negative lookahead: assert the position is not followed by 'Labels:';
    • .* - Match (greedily) 0+ characters other than newline;
    • ) - Close 1st capture group;
  • | - Or;
  • \b[a-z], - Match a word boundary followed by a single letter and a comma;
  • | - Or;
  • ,[a-z]\b - Match a comma followed by a single letter and a word boundary.

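Now replace each match with the contents of the 1st capture group. Lines without 'Labels:' are matched whole by group 1 and so are put back unchanged; for the two single-letter alternatives group 1 is empty, so the match is simply removed.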
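A minimal Python sketch of this "replace with group 1" idea is shown below. The function name strip_single_letter_labels is made up for the example, and the sketch assumes multiline mode so that ^ anchors at the start of every line. A callable replacement is used so that an unmatched group 1 becomes an empty string on any Python version.

```python
import re

# Same pattern as above; the (?i) flag is passed as re.IGNORECASE.
# Assumption: re.MULTILINE is wanted so that ^ matches at each line start.
PATTERN = re.compile(r"(^(?!Labels:).*)|\b[a-z],|,[a-z]\b",
                     re.IGNORECASE | re.MULTILINE)


def strip_single_letter_labels(text):
    """Replace every match with group 1 (empty when group 1 did not take part)."""
    return PATTERN.sub(lambda m: m.group(1) or "", text)


if __name__ == "__main__":
    sample = ("Labels: K,ltemittel,System,j,Vakuum,s\n"
              "Another tags: a,b,xxx,c,yyy,z")
    print(strip_single_letter_labels(sample))
```

On the sample input this prints the two lines from the expected result in the question.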

Another variation using gnu-awk.

For a line that starts with Labels:, replace either a comma followed by a single character a-z or A-Z and a word boundary, or a word boundary followed by such a character and a comma, with an empty string.

awk '/^Labels:/{gsub(/,[a-zA-Z]\y|\y[a-zA-Z],/, "")};1' file

Output

Labels: ltemittel,System,Vakuum
Another tags: a,b,xxx,c,yyy,z

As you have tagged Python and pcre, another option is to use the \G anchor, match Labels: at the start of the string, and capture in group 1 what you want to keep.

(?:^Labels:\h*|\G(?!^))\K(?:([^\s,]{2,}(?:,(?![a-z]$))?)|,?[a-z],?)

See a regex demo and a Python demo using the Python PyPi regex module.

Using Perl:

perl -lpe 's/(?:,[^,](?=,|$))+//g  if  s/^Labels:\s*\K(?:[^,](?:,|$))*//' file

After matching "Labels:" (which is kept, thanks to \K), remove any leading single-character items. If that substitution succeeded, remove all other single-character items. Because the leading group may match zero items, the first substitution succeeds on every line that starts with "Labels:", so the condition effectively selects those lines. This assumes that the "Labels:" part cannot contain single characters separated by commas.

$ cat file
Labels: K,ltemittel,a System z,j,Vakuum,s
Another tags: a,b,xxx,c,yyy,z
$ perl -lpe 's/(?:,[^,](?=,|$))+//g  if  s/^Labels:\s*\K(?:[^,](?:,|$))*//' file
Labels: ltemittel,a System z,Vakuum
Another tags: a,b,xxx,c,yyy,z

Note: System was changed to a System z in the above test. Solutions that rely on matching spaces or word boundaries may not deal with this input correctly.
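To illustrate the note, here is a small Python comparison. It is only a sketch: both function names are hypothetical, the first function mimics a word-boundary regex like the ones used in earlier answers, and the second is an alternative technique (splitting on commas) that does not appear in the answers above.

```python
import re

WORD_BOUNDARY = re.compile(r",[a-zA-Z]\b|\b[a-zA-Z],")


def by_word_boundary(line):
    """Word-boundary approach: treats 'a' and 'z' in 'a System z' as labels."""
    if not line.startswith("Labels:"):
        return line
    return WORD_BOUNDARY.sub("", line)


def by_splitting(line):
    """Split on commas and drop items that are exactly one character long.

    Expects a line without its trailing newline.
    """
    m = re.match(r"(Labels:\s*)(.*)", line)
    if not m:
        return line
    prefix, items = m.group(1), m.group(2).split(",")
    return prefix + ",".join(item for item in items if len(item) != 1)


if __name__ == "__main__":
    line = "Labels: K,ltemittel,a System z,j,Vakuum,s"
    print(by_word_boundary(line))  # Labels: ltemittel System Vakuum
    print(by_splitting(line))      # Labels: ltemittel,a System z,Vakuum
```

The word-boundary version damages the multi-word item, while the comma-delimited version keeps a System z intact.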

This might work for you (GNU sed):

sed -E '/Labels/{s/( )\S,|(,)\S,|,\S$/\1\2/g;s//\1\2/g}' file

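If a line contains Labels, try three alternatives: a single non-space character followed by a comma and preceded by a space, the same preceded by a comma, or a comma followed by a single non-space character at the end of the line. When the first or second alternative matches, the captured space or comma is put back via the matching back-reference; the third is removed entirely. The second substitution, whose empty pattern reuses the previous regex, repeats this to catch overlapping matches.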
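Why the repeat is needed can be seen in a short Python sketch of the same idea. The names one_pass and clean are invented for the example, and the input with two adjacent single-letter items is made up to show the overlap.

```python
import re

# The three alternatives from the sed command above.
PATTERN = re.compile(r"( )\S,|(,)\S,|,\S$")


def one_pass(line):
    """One global substitution, putting back the captured space or comma."""
    return PATTERN.sub(lambda m: (m.group(1) or "") + (m.group(2) or ""), line)


def clean(line):
    """Apply the substitution twice on lines containing 'Labels'."""
    if "Labels" not in line:
        return line
    return one_pass(one_pass(line))


if __name__ == "__main__":
    overlapping = "Labels: foo,a,b,bar"
    # The match for ",a," consumes the comma that ",b," would need,
    # so a single pass leaves "b" behind.
    print(one_pass(overlapping))  # Labels: foo,b,bar
    print(clean(overlapping))     # Labels: foo,bar
```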
