I am trying to remove a string from a column using regular expressions and replace.
Name
"George @ ACkDk02gfe" sold
I want to remove " @ ACkDk02gfe"
I have tried several variations of the code below, but I can't seem to remove the part of the string I want.
df['Name'] = df['Name'].str.replace('(\@\D+\"$)','')
The output should be
George sold
The "ACkDk02gfe" portion of the string is entirely random.
Let's try this using regex with | ("OR") and regex group:
df['Name'].str.replace(r'"|(\s@\s\w+)', '', regex=True)
Output:
0 George sold
Name: Name, dtype: object
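Passing regex=True matters here. Since pandas 2.0, Series.str.replace treats the pattern as a literal string by default, so the same pattern silently changes nothing without it. A minimal check, assuming pandas is installed and using a one-row frame built from the sample value in the question:

```python
import pandas as pd

# One-row frame holding the sample value from the question
df = pd.DataFrame({'Name': ['"George @ ACkDk02gfe" sold']})
pattern = r'"|\s@\s\w+'

# regex=False: the pattern is searched for literally, so nothing is removed
literal = df['Name'].str.replace(pattern, '', regex=False)
# regex=True: the alternation removes every '"' and the " @ <word>" part
cleaned = df['Name'].str.replace(pattern, '', regex=True)

print(literal.tolist())  # ['"George @ ACkDk02gfe" sold']
print(cleaned.tolist())  # ['George sold']
```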
To also handle a hyphen in the random part, allow an optional "-" inside it:
df['Name'].str.replace(r'"|(\s@\s\w*[-]?\w+)', '', regex=True)
where df is:
Name
0 "George @ ACkDk02gfe" sold
1 "Mike @ AisBcIy-rW" sold
Output:
0 George sold
1 Mike sold
Name: Name, dtype: object
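If the random part might contain other punctuation besides a hyphen, a looser sketch is to match any run of characters that are neither whitespace nor a double quote. This assumes the random token never contains spaces or quotes itself:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['"George @ ACkDk02gfe" sold',
                            '"Mike @ AisBcIy-rW" sold']})

# [^"\s]+ consumes the whole random token, whatever punctuation it holds,
# and stops at the closing quote, which the '"' branch then removes
df['Name'] = df['Name'].str.replace(r'"|\s@\s[^"\s]+', '', regex=True)
print(df['Name'].tolist())  # ['George sold', 'Mike sold']
```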
Your pattern and syntax are wrong.
import pandas as pd
# set up the df
df = pd.DataFrame({'Name': ['"George @ ACkDk02gfe" sold']})
# use a raw string for the pattern; regex=True is required on pandas 2.0+
df['Name'] = df['Name'].str.replace(r'^"(\w+)\s@.*?"', r'\1', regex=True)
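Note that (\w+) captures only a single word, so a multi-word name (the hypothetical "Mary Ann" below) would not match at all. A looser sketch, assuming the name itself never contains "@", lazily captures everything before " @":

```python
import pandas as pd

df = pd.DataFrame({'Name': ['"George @ ACkDk02gfe" sold',
                            '"Mary Ann @ Xy-12" sold']})  # second row is hypothetical

# ([^@]+?) lazily captures the name; \s@[^"]*" consumes " @ <token>" and the
# closing quote; r'\1' puts only the captured name back
df['Name'] = df['Name'].str.replace(r'^"([^@]+?)\s@[^"]*"', r'\1', regex=True)
print(df['Name'].tolist())  # ['George sold', 'Mary Ann sold']
```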
I'll let someone else post a regex answer, but this could also be done with split. I don't know how consistent the data you are looking at is, but this would work for the provided string:
df['Name'] = df['Name'].str.split(' ').str[0].str[1:] + ' ' + df['Name'].str.split(' ').str[-1]
output:
George sold
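To illustrate the consistency caveat: the split version relies on the name being the first space-separated token and the verb being the last. A quick sketch with a hypothetical two-word name shows what happens when that assumption breaks:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['"George @ ACkDk02gfe" sold',
                            '"Mary Ann @ Xy-12" sold']})  # second row is hypothetical

parts = df['Name'].str.split(' ')
# first token without its leading quote, plus the last token
df['Name'] = parts.str[0].str[1:] + ' ' + parts.str[-1]
print(df['Name'].tolist())  # ['George sold', 'Mary sold']  ("Ann" is lost)
```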
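This should work for you. Split the string on the sequence: whitespace, @, whitespace, the word after @, and the whitespace after that word. This results in a list; then join its elements back together with a space using .str.join(' '). This assumes the cell holds George @ ACkDk02gfe sold without the double quotes; with the quotes, the word after @ is followed by " rather than whitespace, so nothing is split.
df['Name'] = df['Name'].str.split(r'\s@\s\w+\s').str.join(' ')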
0 George sold
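If the double quotes are part of the stored value, one option is to remove them before splitting. A sketch under that assumption (the regex= argument of str.split requires pandas 1.4 or later):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['"George @ ACkDk02gfe" sold']})

# Drop the double quotes first so the word after @ is followed by whitespace
no_quotes = df['Name'].str.replace('"', '', regex=False)
# Split on " @ <word> " and join the remaining pieces with a single space
df['Name'] = no_quotes.str.split(r'\s@\s\w+\s', regex=True).str.join(' ')
print(df['Name'].tolist())  # ['George sold']
```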
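Series.str.replace already accepts a regular expression (pass regex=True), but you can also use re.sub() from the re module. Note that re.sub() works on a single string, not on a Series, so apply it element by element:
import re
df['Name'] = df['Name'].apply(lambda s: re.sub(r'"|\s@\s\w+', '', s))
For the sample value this gives George sold.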
import re
ss = '"George @ ACkDk02gfe" sold'
ss = re.sub(r'"', '', ss)
ss = re.sub(r'\@\s*\w+', '', ss)
ss = re.sub(r'\s+', ' ', ss)
print(ss)  # George sold
Given that this is the general format of your data, here is what each step does:
(1) remove every literal ".
(2) remove matches of the regex \@\s*\w+, i.e. a literal @, optionally followed by whitespace, then a word of one or more word characters (letters, digits or underscore).
(3) replace each run of whitespace with a single space.
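A note on step (3): the pattern needs \s+ rather than \s*. Because \s* also matches the empty string, re.sub() finds a match between every pair of characters and inserts a space there. A minimal demonstration:

```python
import re

# \s* can match zero characters, so it "matches" before and after every character
print(repr(re.sub(r'\s*', ' ', 'ab')))            # ' a b '
# \s+ needs at least one whitespace character, so it only collapses real runs
print(repr(re.sub(r'\s+', ' ', 'George  sold')))  # 'George sold'
```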
You can wrap this process in a function and apply it to the column. Hope it helps!
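A sketch of such a wrapper, assuming the column is called Name as in the question. clean_name is a hypothetical helper name, and the final .strip() is an addition that trims any whitespace left at either end:

```python
import re

import pandas as pd


def clean_name(value: str) -> str:
    """Hypothetical helper applying the three substitutions to one string."""
    value = re.sub(r'"', '', value)         # (1) drop literal double quotes
    value = re.sub(r'\@\s*\w+', '', value)  # (2) drop "@" plus the word after it
    value = re.sub(r'\s+', ' ', value)      # (3) collapse whitespace runs
    return value.strip()                    # added: trim leftover edge spaces


df = pd.DataFrame({'Name': ['"George @ ACkDk02gfe" sold']})
df['Name'] = df['Name'].apply(clean_name)
print(df['Name'].tolist())  # ['George sold']
```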