I have strings in a pandas column that contain several spaces followed by commas. The strings look like this:
original_string = "okay, , , , humans"
I want to remove the spaces and the subsequent commas so that the string will be:
goodstring = "okay,humans"
But when I use the regex pattern [\s,]+ and remove the matches, I get something different:
badstring = "okayhumans"
It removes the comma after okay, but I want the result to look like goodstring. How can I do that?
Replace:
[\s,]*,[\s,]*
With:
,
- [\s,]* - 0+ leading whitespace characters or commas;
- , - A literal comma (ensures we don't replace a single space);
- [\s,]* - 0+ trailing whitespace characters or commas.
In Pandas, this would translate to something like:
df[<YourColumn>].str.replace(r'[\s,]*,[\s,]*', ',', regex=True)
You have two issues with your code:
- [\s,]+ matches any combination of spaces and commas (e.g. a single comma ,); you should not remove the match but replace it with ','.
- [\s,]+ also matches a run with no comma in it at all, e.g. just a space ' '; that is not what we are looking for, so we must be sure that at least one comma is present in the match.
Code:
import re
text = 'okay, , ,,,, humans! A,B,C'
result = re.sub(r'\s*,[\s,]*', ',', text)  # 'okay,humans! A,B,C'
Pattern:
\s* - zero or more (leading) whitespace characters
, - comma (we must be sure that we have at least one comma in a match)
[\s,]* - arbitrary combination of spaces and commas
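To make the two issues concrete, here is a minimal sketch that contrasts deleting the [\s,]+ matches (the approach from the question) with replacing comma-containing runs by a single comma. The names COMMA_RUN and collapse_commas are made up for this illustration; only the pattern comes from the answer above.

```python
import re

# Hypothetical names for illustration; the pattern is the one explained above:
# optional leading whitespace, one required comma, then any spaces/commas.
COMMA_RUN = re.compile(r'\s*,[\s,]*')

def collapse_commas(text):
    """Replace each run of spaces/commas that contains a comma with one comma."""
    return COMMA_RUN.sub(',', text)

original_string = "okay, , , , humans"

# The question's approach: every run of spaces/commas is deleted outright.
removed = re.sub(r'[\s,]+', '', original_string)

# Replacing with ',' and requiring at least one comma in the match.
collapsed = collapse_commas(original_string)

print(removed)    # okayhumans
print(collapsed)  # okay,humans
print(collapse_commas('okay, , ,,,, humans! A,B,C'))  # okay,humans! A,B,C
```

Note that the space before A in the last example is left alone, because that run contains no comma.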
Please try this:
re.sub(r'[,\s]+', ',', original_string)
You want to replace ",[space]," with ",".
You could use substitution:
import re
pattern = r'[\s,]+'
original_string = "okay, , , , humans"
re.sub(pattern, ',', original_string)
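One caveat worth checking before using [\s,]+ on real data: it gives the desired result for the string in the question, but it also turns a lone space between words into a comma. The rough comparison below uses a second sample string that is invented for this illustration, and contrasts the pattern with a stricter one that requires a comma in the match.

```python
import re

# The second sample is a made-up string that contains an ordinary space.
samples = ["okay, , , , humans", "hello world, , again"]

# Loose pattern: any run of spaces and/or commas becomes a comma.
loose_results = [re.sub(r'[\s,]+', ',', s) for s in samples]

# Stricter pattern: the run must contain at least one comma.
strict_results = [re.sub(r'\s*,[\s,]*', ',', s) for s in samples]

for s, loose, strict in zip(samples, loose_results, strict_results):
    print(repr(s), '->', repr(loose), 'vs', repr(strict))
# 'okay, , , , humans' -> 'okay,humans' vs 'okay,humans'
# 'hello world, , again' -> 'hello,world,again' vs 'hello world,again'
```

If the column never contains spaces outside the comma runs, both patterns behave the same.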