
Using regular expressions to remove a string from a column

I am trying to remove a string from a column using regular expressions and replace.

                      Name

"George @ ACkDk02gfe" sold

I want to remove " @ ACkDk02gfe"

I have tried several different variations of the code below, but I can't seem to remove the string I want.

df['Name'] = df['Name'].str.replace('(\@\D+\"$)','')

The output should be

George sold

The "ACkDk02gfe" portion of the string is entirely random.

Let's try this using regex with | ("OR") and regex group:

df['Name'].str.replace(r'"|(\s@\s\w+)', '', regex=True)

Output:

0    George sold
Name: Name, dtype: object

Updated

df['Name'].str.replace(r'"|(\s@\s\w*[-]?\w+)', '', regex=True)

Where df,

                         Name
0  "George @ ACkDk02gfe" sold
1    "Mike @ AisBcIy-rW" sold

Output:

0    George sold
1      Mike sold
Name: Name, dtype: object
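
As a self-contained check of the approach above, here is a minimal sketch that rebuilds the two-row frame and applies a slightly simplified pattern. The character class [\w-]+ is my own assumption about what the random token can contain (word characters and hyphens); it is not taken from the original answer.

```python
import pandas as pd

# Rebuild the two sample rows shown above.
df = pd.DataFrame(
    {"Name": ['"George @ ACkDk02gfe" sold', '"Mike @ AisBcIy-rW" sold']}
)

# Remove either a double quote, or whitespace + "@" + whitespace + token.
# Assumption: the token contains only word characters and hyphens.
pattern = r'"|\s@\s[\w-]+'

# regex=True is passed explicitly so the pattern is not treated as a literal.
cleaned = df["Name"].str.replace(pattern, "", regex=True)
print(cleaned.tolist())
```

Passing regex=True explicitly keeps the behaviour the same across pandas versions, since newer releases treat the pattern as a literal string by default.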

Your pattern and syntax are wrong.

import pandas as pd

# set up the df
df = pd.DataFrame.from_dict(({'Name': '"George @ ACkDk02gfe" sold'},))

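# use a raw string for the pattern; the capture group (\w+) keeps the name,
# and regex=True is required on recent pandas versions
df['Name'] = df['Name'].str.replace(r'^"(\w+)\s@.*?"', r'\1', regex=True)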
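
To see what the capture group and the lazy .*? are doing, here is a small sketch using only Python's re module on plain strings. The second sample string is borrowed from another answer in this thread; the variable names are illustrative.

```python
import re

# ^"      opening quote at the start of the string
# (\w+)   the name, captured as group 1
# \s@     whitespace followed by a literal @
# .*?"    lazily consume everything up to the next closing quote
pattern = re.compile(r'^"(\w+)\s@.*?"')

samples = ['"George @ ACkDk02gfe" sold', '"Mike @ AisBcIy-rW" sold']

# Replace the whole quoted part with just the captured name.
results = [pattern.sub(r"\1", s) for s in samples]
print(results)
```

Because .*? stops at the first closing quote, the hyphen in the second token needs no special handling here.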

I'll let someone else post a regex answer, but this could also be done with split. I don't know how consistent the data you are looking at is, but this would work for the provided string:

df['Name'] = df['Name'].str.split(' ').str[0].str[1:] + ' ' + df['Name'].str.split(' ').str[-1]

output:

George sold

This should work for you. Split the string on this sequence: whitespace, @, whitespace, the text immediately after the @, and the whitespace after that text. This returns a list; join its elements with a space using .str.join(' '). Note that the pattern expects whitespace directly after the token, so it assumes the values are not wrapped in double quotes.

df.Name = df.Name.str.split(r'\s@\s\w+\s').str.join(' ')



 0    George sold

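You can also do the replacement with Python's re module. Note that re.sub() works on a single string, not on a Series, so it has to be applied to each value of the column.

import re
                      Name

"George @ ACkDk02gfe" sold

df['Name'] = df['Name'].apply(lambda s: re.sub(r'"|\s@\s[\w-]+', '', s))

should work.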

import re
ss = '"George @ ACkDk02gfe" sold'
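ss = re.sub(r'"', "", ss)           # (1) remove the literal double quotes
ss = re.sub(r"@\s*\w+", "", ss)     # (2) remove the @ and the token after it
ss = re.sub(r"\s+", " ", ss)        # (3) collapse runs of whitespace (\s+, not \s*)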

George sold

Assuming this is the general format of your data, here is the process I followed: (1) remove the literal "; (2) remove whatever matches the regex @\s*\w+ (a literal @, optionally followed by whitespace, then a word of one or more alphanumeric characters); (3) replace runs of whitespace with a single space.

You can wrap this process in a function and then apply it to the column. Hope it helps!
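
The three steps could be wrapped up roughly like this. The function name clean_name is hypothetical, and the [\w-]+ class is an assumption that extends step (2) so hyphenated tokens such as AisBcIy-rW are covered as well.

```python
import re

def clean_name(value: str) -> str:
    """Strip quotes and the '@ token' part, then normalise whitespace."""
    value = value.replace('"', "")              # (1) drop literal double quotes
    value = re.sub(r"@\s*[\w-]+", "", value)    # (2) drop '@' plus the token
    return re.sub(r"\s+", " ", value).strip()   # (3) collapse whitespace runs

names = ['"George @ ACkDk02gfe" sold', '"Mike @ AisBcIy-rW" sold']
cleaned = [clean_name(n) for n in names]
print(cleaned)
```

With a pandas column, the analogous call would be df['Name'] = df['Name'].apply(clean_name).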
