Going through string columns and sorting cell values in Pandas

Suppose we have the following dataframe:

import pandas as pd

d = {'col1':['cat; banana','kiwi; orange; apple','melon'],
    'col2':['a; d; c','p; u; c','m; a'],
    'col3':[4,1,4]}
df = pd.DataFrame(d)

For all the string columns, I want to sort the values within each cell alphabetically. I know how to do this column by column, namely:

df['col1'] = df['col1'].map(lambda x: '; '.join(sorted(x.split('; '))))

and similarly for col2. How can this be done for the whole dataframe? I tried to select the string columns and call the map method on the result, but it didn't work. Namely:

df.select_dtypes(include='object').map(lambda x: '; '.join(sorted(x.split('; '))))

Update: an inefficient way of doing this would be:

v = df.select_dtypes(include='object').applymap(lambda x: '; '.join(sorted(x.split('; '))))
w = df.select_dtypes(exclude='object')
pd.concat([v, w], axis=1)

But I am sure there are better ways.

I would do this in a plain (if inefficient) for loop, with a test to make sure that it is not applied to the integer columns:

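for col in df.columns:
    # `dtypes is 'str'` is never true for a pandas dtype; test for string columns explicitly
    if pd.api.types.is_string_dtype(df[col]):
        df[col] = df[col].map(lambda x: '; '.join(sorted(x.split('; '))))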

There may be a better vectorized way.
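
One possible shape for such a version is sketched below. It replaces the lambda with the pandas `.str` accessor (`str.split`, then `sorted` on each list, then `str.join`). This is only a sketch: the `SEP` constant and the `string_cols` name are made up for illustration, and the work is still done item by item in Python, so it is tidier rather than truly vectorized. It also assumes the string columns contain no missing values.

```python
import pandas as pd
from pandas.api.types import is_string_dtype

df = pd.DataFrame({
    "col1": ["cat; banana", "kiwi; orange; apple", "melon"],
    "col2": ["a; d; c", "p; u; c", "m; a"],
    "col3": [4, 1, 4],
})

SEP = "; "  # separator used inside each cell (taken from the question)

# Names of the columns that hold strings; col3 (integers) is skipped.
string_cols = [c for c in df.columns if is_string_dtype(df[c])]

for col in string_cols:
    df[col] = (
        df[col]
        .str.split(SEP)   # each cell becomes a list of items
        .map(sorted)      # sort each list alphabetically
        .str.join(SEP)    # glue the items back into one string
    )

print(df)
```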

You can use this trick (unpacking a dataframe and using pd.DataFrame.assign):

df.assign(**df.select_dtypes(include='object').applymap(lambda x: '; '.join(sorted(x.split('; ')))))

Output:

                  col1     col2  col3
0          banana; cat  a; c; d     4
1  apple; kiwi; orange  c; p; u     1
2                melon     a; m     4
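
For reference, here is a self-contained sketch of the same assign trick. Two things in it are assumptions rather than part of the answer: the helper name `sort_cell` is hypothetical, and the string columns are picked with `pandas.api.types.is_string_dtype` instead of `select_dtypes(include='object')`. Since `DataFrame.applymap` is deprecated as of pandas 2.1 in favour of `DataFrame.map`, the sketch uses whichever of the two is available.

```python
import pandas as pd
from pandas.api.types import is_string_dtype


def sort_cell(value, sep="; "):
    """Sort the sep-delimited items of a single cell alphabetically."""
    return sep.join(sorted(value.split(sep)))


df = pd.DataFrame({
    "col1": ["cat; banana", "kiwi; orange; apple", "melon"],
    "col2": ["a; d; c", "p; u; c", "m; a"],
    "col3": [4, 1, 4],
})

# Sub-frame containing only the string columns.
strings = df[[c for c in df.columns if is_string_dtype(df[c])]]

# DataFrame.map exists from pandas 2.1 onward; older versions only have applymap.
elementwise = strings.map if hasattr(strings, "map") else strings.applymap

# ** unpacks the sorted sub-frame into column=Series keyword arguments,
# so assign overwrites col1 and col2 and leaves col3 untouched.
result = df.assign(**elementwise(sort_cell))

print(result)
```

Note that `assign` returns a new dataframe and leaves `df` unchanged, and that the `**` unpacking only works when the column names are strings.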

