Going through string columns and sorting cell values in Pandas
Suppose we have the following dataframe:
import pandas as pd

d = {'col1': ['cat; banana', 'kiwi; orange; apple', 'melon'],
     'col2': ['a; d; c', 'p; u; c', 'm; a'],
     'col3': [4, 1, 4]}
df = pd.DataFrame(d)
For all the string columns, I want to sort the values within each cell alphabetically. I know how to do this column by column, namely:
df['col1'] = df['col1'].map(lambda x: '; '.join(sorted(x.split('; '))))
and similarly for col2. I wonder how one can do this for the whole dataframe? I tried to select the string columns and apply the map method, but it didn't work. Namely:
df.select_dtypes(include='object').map(lambda x: '; '.join(sorted(x.split('; '))))
Update: So an inefficient way of doing this would be:
v = df.select_dtypes(include='object').applymap(lambda x: '; '.join(sorted(x.split('; '))))
w = df.select_dtypes(exclude='object')
pd.concat([v, w], axis=1)
But I am sure there are better ways.
I would do this in an inefficient for loop, with a test to make sure that you are not applying it to the integer columns:
for col in df.columns:
    # Skip non-string columns such as the integer col3
    if pd.api.types.is_string_dtype(df[col]):
        df[col] = df[col].map(lambda x: '; '.join(sorted(x.split('; '))))
There may be a better vectorized way.
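As a rough sketch of what a more column-oriented version might look like, the loop can lean on the `.str` accessor to split and re-join each column. The sorting itself still runs cell by cell in Python, so this is not truly vectorized. The names `sep` and `string_cols` are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['cat; banana', 'kiwi; orange; apple', 'melon'],
    'col2': ['a; d; c', 'p; u; c', 'm; a'],
    'col3': [4, 1, 4],
})

sep = '; '  # assumed separator, taken from the example data

# Collect the columns that hold strings, leaving the integer column alone
string_cols = [c for c in df.columns if pd.api.types.is_string_dtype(df[c])]

for col in string_cols:
    # split each cell into a list, sort each list, join it back into one string
    df[col] = df[col].str.split(sep).map(sorted).str.join(sep)

print(df)
```

This assumes there are no missing values in the string columns; a NaN cell would make `sorted` fail and would need to be handled separately.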
You can use this trick (unpacking a dataframe and using pd.DataFrame.assign):
df.assign(**df.select_dtypes(include='object').applymap(lambda x: '; '.join(sorted(x.split('; ')))))
Output:
                  col1     col2  col3
0          banana; cat  a; c; d     4
1  apple; kiwi; orange  c; p; u     1
2                melon     a; m     4
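A variant of the same idea, sketched here under the assumption that you would rather avoid the element-wise DataFrame method (recent pandas releases deprecate `DataFrame.applymap` in favour of `DataFrame.map`), builds the keyword arguments for `assign` from a dict of per-column `Series.map` calls. The helper `sort_cell` and the name `string_cols` are hypothetical and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['cat; banana', 'kiwi; orange; apple', 'melon'],
    'col2': ['a; d; c', 'p; u; c', 'm; a'],
    'col3': [4, 1, 4],
})

def sort_cell(cell, sep='; '):
    """Sort the sep-separated items of one cell alphabetically (hypothetical helper)."""
    return sep.join(sorted(cell.split(sep)))

# Columns holding strings; the integer column is left out
string_cols = [c for c in df.columns if pd.api.types.is_string_dtype(df[c])]

# assign returns a new dataframe and keeps the original column order
result = df.assign(**{c: df[c].map(sort_cell) for c in string_cols})

print(result)
```

Because `assign` returns a copy, `df` itself is left unchanged; assign the result back to `df` if the update should persist. Unpacking with `**` also requires the column names to be strings.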