I am using a pandas dataframe and I would like to remove everything after the first space in each value. My dataframe is similar to this one:
import pandas as pd
d = {'Australia' : pd.Series([0,'1980 (F)\n\n1957 (T)\n\n',1991], index=['Australia', 'Belgium', 'France']),
'Belgium' : pd.Series([1980,0,1992], index=['Australia','Belgium', 'France']),
'France' : pd.Series([1991,1992,0], index=['Australia','Belgium', 'France'])}
df = pd.DataFrame(d, dtype='str')
df
I am able to strip the values in one specific column, but this split() approach does not apply to the whole dataframe.
f = lambda x: x["Australia"].split(" ")[0]
df = df.apply(f, axis=1)
Does anyone have an idea how I could remove everything after the first space for each value in the dataframe?
Let's try using assign, since the column names in this dataframe are "well tamed", meaning they contain no spaces or special characters:
df.assign(Australia=df.Australia.str.split().str[0])
Output:
          Australia Belgium France
Australia         0    1980   1991
Belgium        1980       0   1992
France         1991    1992      0
Or you can use apply and a lambda function if all your columns hold strings:
df.apply(lambda x: x.str.split().str[0])
Or, if you have a mixture of numeric and string dtypes, you can use select_dtypes with assign like this:
import numpy as np
df.assign(**df.select_dtypes(exclude=np.number).apply(lambda x: x.str.split().str[0]))
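A minimal sketch of the mixed-dtype case, since the question's frame is all strings and does not exercise it. The frame `mixed` and its column names `year` and `note` are made up for illustration; `exclude="number"` is used here as an equivalent of `np.number` that avoids the numpy import.

```python
import pandas as pd

# Hypothetical frame: one numeric column, one string column.
mixed = pd.DataFrame(
    {
        "year": [1980, 1991, 1992],
        "note": ["1980 (F)\n\n1957 (T)\n\n", "1991 (T)", "0"],
    }
)

# Pick only the non-numeric columns and keep the first
# whitespace-separated token of every value.
cleaned_text = mixed.select_dtypes(exclude="number").apply(
    lambda col: col.str.split().str[0]
)

# assign() overwrites the matching columns; numeric columns are untouched.
result = mixed.assign(**cleaned_text)
print(result)
```

The numeric column keeps its integer dtype, which is the main reason to prefer this over converting the whole frame to strings.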
I think you need to convert all columns to strings and then apply the split function:
df = df.astype(str).apply(lambda x: x.str.split().str[0])
Another solution, using applymap (deprecated since pandas 2.1 in favor of DataFrame.map):
df = df.astype(str).applymap(lambda x: x.split()[0])
print(df)
          Australia Belgium France
Australia         0    1980   1991
Belgium        1980       0   1992
France         1991    1992      0
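One difference between the two variants above may matter on messier data: a plain `x.split()[0]` raises an IndexError for an empty or whitespace-only string, whereas the `.str` accessor yields NaN. A small sketch, where the Series `s` and the helper `first_token` are hypothetical names used only for illustration:

```python
import pandas as pd

# Hypothetical values, including one that is only whitespace.
s = pd.Series(["1980 (F)", "   ", "1992"])

# Vectorized accessor: an empty split result becomes NaN instead of raising.
first_tokens = s.str.split().str[0]

# Element-wise version with an explicit guard for the empty case.
def first_token(value):
    parts = str(value).split()
    return parts[0] if parts else ""

safe = s.map(first_token)
print(first_tokens)
print(safe)
```

Whether an empty string or NaN is the better placeholder depends on what you do with the frame afterwards.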
You could loop over all columns and apply the split to each one:
for column in df:
    df[column] = df[column].str.split().str[0]
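To check the loop end to end, here is a sketch that rebuilds the question's frame first. The `astype(str)` step is an assumption added so that every column supports the `.str` accessor; `labels` is just a helper name.

```python
import pandas as pd

labels = ["Australia", "Belgium", "France"]
d = {
    "Australia": pd.Series([0, "1980 (F)\n\n1957 (T)\n\n", 1991], index=labels),
    "Belgium": pd.Series([1980, 0, 1992], index=labels),
    "France": pd.Series([1991, 1992, 0], index=labels),
}

# Convert every cell to a string so .str works on each column.
df = pd.DataFrame(d).astype(str)

# Keep only the first whitespace-separated token in every column.
for column in df:
    df[column] = df[column].str.split().str[0]

print(df)
```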