
Efficient way to manipulate lists in a pandas dataframe

I have a DataFrame that starts like the following:

   Column1      Column2
0  Berlin       ['Hamburg', 'Munich', 'Berlin', 'Stuttgart']
1  Tokyo        ['Tokyo', 'Osaka', 'Kyoto', 'Sapporo']
2  Los Angeles  ['New York', 'Chicago', 'Boston', 'Los Angeles']
3  London       ['Birmingham', 'London', 'Glasgow', 'Liverpool']

I would like to remove each row's Column1 city from that row's Column2 list, so that the DataFrame becomes:

   Column1      Column2
0  Berlin       ['Hamburg', 'Munich', 'Stuttgart']
1  Tokyo        ['Osaka', 'Kyoto', 'Sapporo']
2  Los Angeles  ['New York', 'Chicago', 'Boston']
3  London       ['Birmingham', 'Glasgow', 'Liverpool']
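To try the answers below, the sample frame from the question can be rebuilt as follows. This is a minimal sketch; the names `df` and `expected` are only illustrative and are not from the original post.

```python
import pandas as pd

# Sample input from the question: one city per row plus a list of cities
df = pd.DataFrame({
    'Column1': ['Berlin', 'Tokyo', 'Los Angeles', 'London'],
    'Column2': [
        ['Hamburg', 'Munich', 'Berlin', 'Stuttgart'],
        ['Tokyo', 'Osaka', 'Kyoto', 'Sapporo'],
        ['New York', 'Chicago', 'Boston', 'Los Angeles'],
        ['Birmingham', 'London', 'Glasgow', 'Liverpool'],
    ],
})

# Desired Column2 once each row's own Column1 city has been removed
expected = [
    ['Hamburg', 'Munich', 'Stuttgart'],
    ['Osaka', 'Kyoto', 'Sapporo'],
    ['New York', 'Chicago', 'Boston'],
    ['Birmingham', 'Glasgow', 'Liverpool'],
]
```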

Since looping over rows goes against the idea of vectorized DataFrame operations, what is the best way to approach this problem?

Try explode, query and groupby:

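# explode: one row per list element (index labels are repeated)
# query: keep only elements that differ from the row's Column1
# groupby(level=0): collapse back to one row per original index label
df = (df.explode('Column2')
        .query('Column1 != Column2')
        .groupby(level=0)
        .agg({'Column1': 'first',
              'Column2': list})
     )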
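A self-contained run of the explode/query/groupby chain is sketched below. One caveat worth checking: if a row's list contains only its own Column1 city, `query` filters out every exploded entry for that row, so `groupby` never sees it and the row disappears from the result. The `reindex` step is one possible way to restore such rows with empty lists; the 'Paris' row is hypothetical test data, not part of the question.

```python
import pandas as pd

df = pd.DataFrame({
    'Column1': ['Berlin', 'Tokyo', 'Paris'],
    'Column2': [
        ['Hamburg', 'Munich', 'Berlin', 'Stuttgart'],
        ['Tokyo', 'Osaka', 'Kyoto', 'Sapporo'],
        ['Paris'],  # hypothetical row: the list holds only its own city
    ],
})

out = (df.explode('Column2')             # one row per list element, index repeated
         .query('Column1 != Column2')    # drop the element equal to Column1
         .groupby(level=0)               # regroup by the original row label
         .agg({'Column1': 'first', 'Column2': list}))

# Row 2 vanished because all of its exploded rows were filtered out.
# Restore it: reindex to the original labels, then fill the gaps.
out = out.reindex(df.index)
out['Column1'] = df['Column1']
out['Column2'] = out['Column2'].apply(lambda v: v if isinstance(v, list) else [])
print(out)
```

Without the last three assignments the result would simply have two rows, which may or may not be what you want.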

Or just use apply directly:

# keep every element of the row's list that differs from that row's Column1
df['Column2'] = df.apply(lambda row: [x for x in row['Column2'] if x != row['Column1']],
                         axis=1)
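For comparison, a plain list comprehension over `zip` of the two columns is a common alternative (not part of the original answer). It avoids `apply(axis=1)` building a Series object for every row, which is often the slow part on large frames, though it is worth timing on your own data. A minimal sketch on the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Column1': ['Berlin', 'Tokyo', 'Los Angeles', 'London'],
    'Column2': [
        ['Hamburg', 'Munich', 'Berlin', 'Stuttgart'],
        ['Tokyo', 'Osaka', 'Kyoto', 'Sapporo'],
        ['New York', 'Chicago', 'Boston', 'Los Angeles'],
        ['Birmingham', 'London', 'Glasgow', 'Liverpool'],
    ],
})

# Pair each row's city with its list and keep every element that differs from it
df['Column2'] = [[c for c in cities if c != city]
                 for city, cities in zip(df['Column1'], df['Column2'])]
print(df)
```

Unlike the explode-based chain, this keeps a row even when its filtered list ends up empty.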
