
Remove characters from a string in a dataframe

Python beginner here. I would like to change some characters in a dataframe column under certain conditions.

The dataframe looks like this:

import pandas as pd
import numpy as np
raw_data = {'name': ['Willard Morris', 'Al Jennings', 'Omar Mullins', 'Spencer McDaniel'],
                      'age': [20, 19, 22, 21],
                      'favorite_color': ['blue (VS)', 'red', 'yellow (AG)', "green"],
                      'grade': [88, 92, 95, 70]}
df = pd.DataFrame(raw_data, index = ['0', '1', '2', '3'])
df

My goal is to remove, in the favorite_color column, the space followed by the parentheses and the two letters inside them.

blue instead of blue (VS).

There are 26 letter variations that I have to remove, but only one format: the value followed by a space, an opening parenthesis, two letters, and a closing parenthesis. From what I understand, the regexp should be:

( \(..\))

I tried using str.replace, but it only works for an exact match and it replaces the whole value. I also tried this:

df.loc[df['favorite_color'].str.contains('VS'), 'favorite_color'] = 'random'

It also replaces the whole value.

I saw that I can only rewrite the value but I also saw that using this:

df[0].str.slice(0, -5)

I could remove the last 5 characters of a string containing my search.

In my mind, I should make a list of the 26 occurrences that I want removed and go through the column to remove them while keeping the text before. I searched for posts similar to my problem but could not find a solution. Do you have any idea for a direction?

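You can use str.replace with the regex pattern "(\\(.*?\\))". In recent pandas versions, pass regex=True, because str.replace treats the pattern as a literal string by default.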

Ex:

import pandas as pd

raw_data = {'name': ['Willard Morris', 'Al Jennings', 'Omar Mullins', 'Spencer McDaniel'],
                      'age': [20, 19, 22, 21],
                      'favorite_color': ['blue (VS)', 'red', 'yellow (AG)', "green"],
                      'grade': [88, 92, 95, 70]}
df = pd.DataFrame(raw_data, index = ['0', '1', '2', '3'])
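# Remove any parenthesized text, then strip the leftover trailing space.
# regex=True is needed in pandas >= 2.0, where the default is a literal match.
df["newCol"] = df["favorite_color"].str.replace(r"(\(.*?\))", "", regex=True).str.strip()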
print( df )

Output:

   age favorite_color  grade              name  newCol
0   20      blue (VS)     88    Willard Morris    blue
1   19            red     92       Al Jennings     red
2   22    yellow (AG)     95      Omar Mullins  yellow
3   21          green     70  Spencer McDaniel   green
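
Since the question describes a single fixed format (a space, then two letters in parentheses at the end of the value), a stricter pattern may be safer than removing every parenthesized group. The sketch below is only one possible variant: the column name clean_color, the restriction to uppercase letters, and the short codes list are assumptions made for illustration, not part of the original answer.

```python
import re

import pandas as pd

df = pd.DataFrame({"favorite_color": ["blue (VS)", "red", "yellow (AG)", "green"]})

# Assumed format: a space, "(", exactly two uppercase letters, ")", at the end of the value.
suffix = r" \([A-Z]{2}\)$"
df["clean_color"] = df["favorite_color"].str.replace(suffix, "", regex=True)

# Alternative: only strip codes from a known list (hypothetical subset of the 26 codes).
codes = ["VS", "AG"]
alternation = "|".join(re.escape(code) for code in codes)
known_suffix = rf" \((?:{alternation})\)$"
df["clean_color_known"] = df["favorite_color"].str.replace(known_suffix, "", regex=True)

print(df)
```

Because the space is part of the pattern, no str.strip() call is needed afterwards, and values such as "red" that have no suffix are left unchanged.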

