
Python: remove part of a string from a column in a DataFrame

Hi, I am working in Python. I created a DataFrame from a CSV file. One column, "name", is a text column that contains, in various places, the pattern ' (<number>%)', for example:

"145 wefwignweon (100%) , 1rberbebe (50%) , vwrbvwrbe (100%) , 140 ewggrrwrg"

I need to delete occurrences such as ' (100%)', '(100%)' and '(50%)' from this column. Other rows contain different percentage values.

import pandas as pd

path_to_dir="/Users/user/Documents/file/"
name='owner.csv'
df_owner = pd.read_csv(path_to_dir+name, encoding='windows-1252') 
# goal: df_owner["name"] = df_owner["name"] with every ' (<number>%)' substring removed

How can I write a regular expression that removes values like these? I want to delete every ' (<number>%)' from the name column of the df_owner DataFrame.

Regards

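You can use the regular expression \(\d+%\). Note that the line below filters rows: it keeps only the rows whose name does not contain the pattern, rather than removing the pattern from the text.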

df = df[~df['name'].str.contains(r' \(\d+%\)', regex=True)]
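If the goal is to keep every row and only strip the percentage markers from the text (as the question title suggests), the same pattern can be passed to `Series.str.replace` instead. A minimal sketch, assuming a hypothetical sample DataFrame built from the question's example string; the `\s*` prefix is an added assumption that also removes the whitespace before each marker:

```python
import pandas as pd

# Hypothetical sample data modelled on the question's example string
df_owner = pd.DataFrame({
    "name": [
        "145 wefwignweon (100%) , 1rberbebe (50%) , vwrbvwrbe (100%) , 140 ewggrrwrg",
        "no percentages here",
    ]
})

# Remove every "(<digits>%)" substring plus any whitespace just before it.
# regex=True makes pandas treat the pattern as a regular expression.
df_owner["name"] = df_owner["name"].str.replace(r"\s*\(\d+%\)", "", regex=True)

print(df_owner["name"].tolist())
```

Unlike the `str.contains` filter, both rows survive; only the matched substrings disappear.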

To match numbers of up to three digits, use r'\d{1,3}'.
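A quick way to check what the `{1,3}` quantifier accepts is to test a few sample strings with `re.fullmatch`, which requires the whole string to match. The sample values here are purely illustrative:

```python
import re

# \d{1,3} matches between one and three consecutive digits
pattern = re.compile(r"\d{1,3}")

samples = ["5", "50", "100", "1000"]
# fullmatch succeeds only if the entire string matches the pattern
results = {text: bool(pattern.fullmatch(text)) for text in samples}

print(results)
```

Percentages such as 5, 50 and 100 match, while a four-digit number does not.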

But you also seem to want the parentheses matched, and they have to be escaped (the percent sign does not need escaping, although \% is harmless), so the pattern becomes r'\(\d{1,3}%\)', with the % inside the parentheses. You can then replace every occurrence of that regex with the empty string, for example with df['name'].apply(lambda x: re.sub(r'\(\d{1,3}%\)', '', x)). You might also want to add the leading space to the regex.
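The `apply` plus `re.sub` approach can be sketched as follows, assuming a hypothetical one-row DataFrame built from the question's example. The optional leading space (` ?`) follows the suggestion above, and the `isinstance` guard is an added assumption so that missing values (NaN) pass through unchanged instead of raising a `TypeError`:

```python
import re

import pandas as pd

# " ?" optionally consumes one leading space; \(...\) matches literal parentheses
PCT_PATTERN = re.compile(r" ?\(\d{1,3}%\)")

# Hypothetical sample data modelled on the question's example string
df_owner = pd.DataFrame({
    "name": ["145 wefwignweon (100%) , 1rberbebe (50%) , vwrbvwrbe (100%) , 140 ewggrrwrg"]
})

# Apply re.sub cell by cell; non-string cells (e.g. NaN) are returned untouched
df_owner["name"] = df_owner["name"].apply(
    lambda x: PCT_PATTERN.sub("", x) if isinstance(x, str) else x
)

print(df_owner.loc[0, "name"])
```

For large frames, the vectorised `df_owner["name"].str.replace(..., regex=True)` gives the same result without a Python-level lambda.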

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM