
Regex match two characters following each other

I have strings in a pandas column that contain several spaces followed by commas. The strings look like this:

original_string = "okay, , , , humans"

I want to remove the spaces and the subsequent commas so that the string will be:

goodstring = "okay,humans"

But when I use the regex pattern [\s,]+ I get a different result:

badstring = "okayhumans"

It removes the comma after okay but I want it to be like in goodstring. How can I do that?

Replace:

[\s,]*,[\s,]*

With:

,

  • [\s,]* - 0+ leading whitespace characters or commas;
  • , - a literal comma (ensures we don't replace a lone space);
  • [\s,]* - 0+ trailing whitespace characters or commas.
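
The replacement described above can be checked quickly with Python's re module. This is only a minimal sketch: the helper name collapse_commas and the second sample string are made up for illustration.

```python
import re

# Pattern from the answer: optional whitespace/commas, one required comma,
# then optional whitespace/commas again.
PATTERN = re.compile(r'[\s,]*,[\s,]*')

def collapse_commas(text):
    """Replace each run of commas/whitespace that contains a comma with ','."""
    return PATTERN.sub(',', text)

print(collapse_commas("okay, , , , humans"))    # okay,humans
print(collapse_commas("hello world, , again"))  # hello world,again
```

The space in "hello world" is left alone because no comma sits next to it, which is the point of the required literal comma.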

In Pandas, this would translate to something like:

df[<YourColumn>].str.replace(r'[\s,]*,[\s,]*', ',', regex=True)
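
To see the call in context, here is a small self-contained sketch. The DataFrame contents and the column name "text" are hypothetical stand-ins for your own data. The pattern is written as a raw string so that \s reaches the regex engine unchanged.

```python
import pandas as pd

# Hypothetical data; "text" stands in for your real column name.
df = pd.DataFrame({"text": ["okay, , , , humans", "a ,b", "no commas here"]})

# str.replace returns a new Series, so assign it back to keep the result.
df["text"] = df["text"].str.replace(r"[\s,]*,[\s,]*", ",", regex=True)

print(df["text"].tolist())  # ['okay,humans', 'a,b', 'no commas here']
```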

You have two issues with your code:

  1. Since [\s,]+ matches any combination of spaces and commas (e.g. a single comma , ), you should not remove the match but replace it with ','.
  2. [\s,]+ also matches a run that contains no comma at all, e.g. just a space ' '. That is not what we are looking for: we must be sure that at least one comma is present in the match.
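
Both points can be reproduced in a few lines. This is a sketch with illustrative variable names. The first call is an assumption about how the pattern was originally used, since the question does not show the exact code.

```python
import re

original_string = "okay, , , , humans"

# Issue 1: replacing the match with an empty string drops the comma as well.
removed = re.sub(r'[\s,]+', '', original_string)
print(removed)  # okayhumans

# Replacing with ',' instead keeps exactly one comma for this input.
replaced = re.sub(r'[\s,]+', ',', original_string)
print(replaced)  # okay,humans

# Issue 2: the class also matches a lone space, so ordinary spaces
# between words are turned into commas too.
spaced = re.sub(r'[\s,]+', ',', "okay humans")
print(spaced)  # okay,humans
```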

Code:

import re

text = 'okay, ,  ,,,, humans! A,B,C'

result = re.sub(r'\s*,[\s,]*', ',', text)

Pattern:

\s*    - zero or more (leading) whitespace characters
,      - comma (we must be sure that we have at least one comma in a match)
[\s,]* - arbitrary combination of spaces and commas
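
Running the pattern on the sample text shows what each piece contributes. This is a minimal sketch, and the normalize helper name is hypothetical.

```python
import re

def normalize(text):
    # \s*     optional leading whitespace
    # ,       one required comma
    # [\s,]*  any further mix of whitespace and commas
    return re.sub(r'\s*,[\s,]*', ',', text)

text = 'okay, ,  ,,,, humans! A,B,C'
print(normalize(text))  # okay,humans! A,B,C
```

The space between "humans!" and "A" survives because it is not adjacent to a comma, while the already clean "A,B,C" is left unchanged.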

Please try this:

re.sub(r'[,\s]+', ',', original_string)

You want to replace ",[space]," with ",".

You could use substitution:

import re

pattern = r'[\s,]+'
original_string = "okay, , , , humans"
re.sub(pattern, ',', original_string)

