Regex Must Match a Word (not to replace) AND a Pattern (to replace) in a Line

Question

With regex (PCRE or sed, but Python is also fine [please specify]), I want to remove every single-letter item from a comma-separated list (roughly /,.,/g), but only on lines that contain the word "Labels:".

So for example in these lines:

Labels: K,ltemittel,System,j,Vakuum,s
Another tags: a,b,xxx,c,yyy,z

to

Labels: ltemittel,System,Vakuum
Another tags: a,b,xxx,c,yyy,z

What I've tried:

non-capturing group ("Labels:" still gets replaced too)
lookahead and lookbehind (cannot use greedy quantifiers)
grouping /(Labels:)*(,.,)/ (also matches on lines without "Labels:")

Answer 1

Using sed

$ sed '/Labels:/{s/,[A-Za-z]\>//g;s/\<[A-Za-z],//;}' input_file
Labels: ltemittel,System,Vakuum
Another tags: a,b,xxx,c,yyy,z

Explanation (Added By Tripleee)

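On lines containing "Labels:", it looks for a comma, followed by a single alphabetic character, followed by a word boundary, i.e. the label after the comma is a single letter, and removes it. Then it removes any remaining single-letter label immediately before a comma by similar logic; after the first pass, that can only be the first label in the list.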
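The same two-step logic can be sketched in Python. This is only an illustration, not part of the original answer: the function name `remove_single_letter_labels` is made up here, and `\b` is used as a stand-in for sed's `\<` and `\>`.

```python
import re

def remove_single_letter_labels(line):
    # Hypothetical helper: lines without the "Labels:" marker are left alone.
    if "Labels:" not in line:
        return line
    # Step 1: drop every ",X" where X is a single letter followed by a word boundary.
    line = re.sub(r",[A-Za-z]\b", "", line)
    # Step 2: drop one leftover "X," (only the first label can still be a single letter).
    return re.sub(r"\b[A-Za-z],", "", line, count=1)

print(remove_single_letter_labels("Labels: K,ltemittel,System,j,Vakuum,s"))
# Labels: ltemittel,System,Vakuum
print(remove_single_letter_labels("Another tags: a,b,xxx,c,yyy,z"))
# Another tags: a,b,xxx,c,yyy,z
```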

Answer 2

You could potentially use:

(?i)(^(?!Labels:).*)|\b[a-z],|,[a-z]\b

See an online demo

(?i) - Set case-insensitive matching 'on';
( - Open 1st capture group;
- ^ - Start string anchor (start of line in multiline mode);
- (?!Labels:) - Negative lookahead: assert position is not followed by 'Labels:';
- .* - Match (greedy) 0+ characters other than newline;
- ) - Close 1st capture group;
| - Or;
\b[a-z], - Match a word-boundary followed by a single letter and a comma;
| - Or;
,[a-z]\b - Match a comma followed by a single letter and a word-boundary.

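Now replace each match with your 1st capture group: a line that does not start with 'Labels:' is matched whole and put back unchanged, while a single-letter item leaves the group empty and is therefore removed.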
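As a rough sketch of how this pattern could be applied with Python's built-in `re` module: the names `PATTERN` and `clean` are assumptions for illustration, and `re.MULTILINE` is added so that `^` matches at the start of every line. It relies on Python 3.5+ behaviour, where an unmatched group in the replacement is substituted with an empty string.

```python
import re

# Pattern taken from the answer; re.MULTILINE lets ^ match at each line start.
PATTERN = re.compile(r"(?i)(^(?!Labels:).*)|\b[a-z],|,[a-z]\b", re.MULTILINE)

def clean(text):
    # Group 1 holds whole non-"Labels:" lines; for single-letter items it is
    # unmatched, so those matches are replaced with an empty string.
    return PATTERN.sub(r"\1", text)

sample = "Labels: K,ltemittel,System,j,Vakuum,s\nAnother tags: a,b,xxx,c,yyy,z"
print(clean(sample))
# Labels: ltemittel,System,Vakuum
# Another tags: a,b,xxx,c,yyy,z
```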

Answer 3

Another variation using GNU awk.

For a line that starts with Labels:, replace either a comma followed by a single character a-z or A-Z and a word boundary, or a word boundary followed by such a character and a comma, with an empty string.

awk '/^Labels:/{gsub(/,[a-zA-Z]\y|\y[a-zA-Z],/, "")};1' file

Output

Labels: ltemittel,System,Vakuum
Another tags: a,b,xxx,c,yyy,z

As you have tagged Python and pcre, another option is to use the \G anchor: match Labels: at the start of the string, and capture in group 1 what you want to keep.

(?:^Labels:\h*|\G(?!^))\K(?:([^\s,]{2,}(?:,(?![a-z]$))?)|,?[a-z],?)

See a regex demo and a Python demo using the PyPI regex module.

Answer 4

Using perl:

perl -lpe 's/(?:,[^,](?=,|$))+//g  if  s/^Labels:\s*\K(?:[^,](?:,|$))*//' file

After matching "Labels:" (which is kept, thanks to \K), remove any leading single-character items. If that substitution succeeded, remove all other single-character items. This assumes that the "Labels:" part itself cannot contain single characters separated by commas.

$ cat file
Labels: K,ltemittel,a System z,j,Vakuum,s
Another tags: a,b,xxx,c,yyy,z
$ perl -lpe 's/(?:,[^,](?=,|$))+//g  if  s/^Labels:\s*\K(?:[^,](?:,|$))*//' file
Labels: ltemittel,a System z,Vakuum
Another tags: a,b,xxx,c,yyy,z

Note: System was changed to a System z in the above test. Solutions that rely on matching spaces or word boundaries may not deal with this input correctly.
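One way to avoid word boundaries altogether is to split the list on commas and filter by item length. The sketch below is a different technique from the Perl substitution above, shown only for comparison; the function name `drop_single_char_items` and its `prefix` parameter are assumptions, and, like the Perl version, it treats any single non-comma character as a removable item.

```python
def drop_single_char_items(line, prefix="Labels:"):
    # Hypothetical helper: only lines starting with the prefix are processed.
    if not line.startswith(prefix):
        return line
    rest = line[len(prefix):]
    items_part = rest.lstrip()
    # Preserve whatever whitespace followed the prefix.
    leading_ws = rest[:len(rest) - len(items_part)]
    # Keep every item that is not exactly one character long.
    kept = [item for item in items_part.split(",") if len(item) != 1]
    return prefix + leading_ws + ",".join(kept)

print(drop_single_char_items("Labels: K,ltemittel,a System z,j,Vakuum,s"))
# Labels: ltemittel,a System z,Vakuum
print(drop_single_char_items("Another tags: a,b,xxx,c,yyy,z"))
# Another tags: a,b,xxx,c,yyy,z
```

Because items are compared as whole strings, "a System z" survives intact even though it contains single letters next to spaces.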

Answer 5

This might work for you (GNU sed):

sed -E '/Labels/{s/( )\S,|(,)\S,|,\S$/\1\2/g;s//\1\2/g}' file

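If a line contains Labels, try three alternative matches: a single character and comma preceded by a space, a single character and comma preceded by a comma, or a comma and single character at the end of the line. For the first two alternatives, the leading space or comma is put back via its back reference. The substitution is then repeated (the empty regex // reuses the previous one) to catch any overlapping matches.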
