Within my column I have several country names that contain numbers and/or parentheses, which I need to remove.
For example:
The column in question is also set as my index, if that affects things.
Try this:
In [121]: df
Out[121]:
                                     expected
Bolivia (Plurinational State of)      Bolivia
Switzerland17                     Switzerland
In [122]: df.set_index(df.index.str.replace(r'\s*\(.*?\)\s*', '', regex=True).str.replace(r'\d+', '', regex=True), inplace=True)
In [123]: df
Out[123]:
                expected
Bolivia          Bolivia
Switzerland  Switzerland
In [124]: df.index == df.expected
Out[124]: array([ True, True], dtype=bool)
In [125]: (df.index == df.expected).all()
Out[125]: True
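The two chained replacements above can also be folded into one pattern. The following is a minimal, self-contained sketch, assuming a small sample frame that mirrors the one shown in the session (the data is illustrative only). It passes `regex=True` explicitly, since newer pandas versions treat the pattern as a literal string unless told otherwise.

```python
import pandas as pd

# Hypothetical sample data mirroring the frame shown in the session above.
df = pd.DataFrame(
    {"expected": ["Bolivia", "Switzerland"]},
    index=["Bolivia (Plurinational State of)", "Switzerland17"],
)

# Match either a parenthesised group (with surrounding whitespace)
# or any run of digits, and remove it.
pattern = r"\s*\(.*?\)\s*|\d+"
df.index = df.index.str.replace(pattern, "", regex=True)

print(df)
```

Assigning to `df.index` directly avoids `set_index(..., inplace=True)`; either form should work here.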
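def remove(data):
    # Cut the name at the first digit or opening parenthesis.
    for i in range(len(data)):
        if data[i].isdigit() or data[i] == '(':
            # rstrip() drops the space before '(' without cutting off a letter
            return data[:i].rstrip()
    return data

df['Country'] = df['Country'].apply(remove)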
def remove_digit(data):
    # Drop every digit, then cut the name at the first '(' if there is one.
    newData = ''.join([i for i in data if not i.isdigit()])
    i = newData.find('(')
    if i > -1:
        newData = newData[:i]
    return newData.strip()

energy['Country'] = energy['Country'].apply(remove_digit)
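One way to accomplish it without going through the index, assuming the names are in a regular column (called 'Country' here):
import re
df['Country'] = df['Country'].apply(lambda x: re.sub(r'\s*\(.*?\)\s*|\d+', '', x))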
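To see what that combined pattern does on its own, here is a small sketch using only the standard library. The helper name `clean_country` and the sample strings are made up for illustration.

```python
import re

# Either a parenthesised group with surrounding whitespace, or a run of digits.
PATTERN = re.compile(r"\s*\(.*?\)\s*|\d+")

def clean_country(name):
    """Remove parenthesised text and digits from a single name (illustrative helper)."""
    return PATTERN.sub("", name).strip()

samples = ["Bolivia (Plurinational State of)", "Switzerland17", "France"]
cleaned = [clean_country(s) for s in samples]
print(cleaned)
```

Note that the pattern removes whitespace on both sides of a parenthesised group, so a group in the middle of a name would join the neighbouring words together.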