Efficient way to manipulate lists in a pandas dataframe
I have a DataFrame that starts like the following:
| | Column1 | Column2 |
|---|---|---|
| 0 | Berlin | ['Hamburg', 'Munich', 'Berlin', 'Stuttgart'] |
| 1 | Tokyo | ['Tokyo', 'Osaka', 'Kyoto', 'Sapporo'] |
| 2 | Los Angeles | ['New York', 'Chicago', 'Boston', 'Los Angeles'] |
| 3 | London | ['Birmingham', 'London', 'Glasgow', 'Liverpool'] |
I would like to remove the city in Column1 from the list in Column2 of the same row, so that the DataFrame becomes:
| | Column1 | Column2 |
|---|---|---|
| 0 | Berlin | ['Hamburg', 'Munich', 'Stuttgart'] |
| 1 | Tokyo | ['Osaka', 'Kyoto', 'Sapporo'] |
| 2 | Los Angeles | ['New York', 'Chicago', 'Boston'] |
| 3 | London | ['Birmingham', 'Glasgow', 'Liverpool'] |
Since looping row by row goes against the way DataFrames are meant to be used, what is the best way to approach this problem?
Try `explode`, `query` and `groupby`:
(df.explode('Column2')
.query('Column1 != Column2')
.groupby(level=0)
.agg({'Column1': 'first',
'Column2': list
})
)
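To make the chain above easy to try, here is a minimal, self-contained sketch. It assumes the sample data from the question and stores the output in a variable named `result`, which is only an illustrative name.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Column1": ["Berlin", "Tokyo", "Los Angeles", "London"],
    "Column2": [
        ["Hamburg", "Munich", "Berlin", "Stuttgart"],
        ["Tokyo", "Osaka", "Kyoto", "Sapporo"],
        ["New York", "Chicago", "Boston", "Los Angeles"],
        ["Birmingham", "London", "Glasgow", "Liverpool"],
    ],
})

result = (
    df.explode("Column2")              # one row per (Column1, city) pair, index repeated
      .query("Column1 != Column2")     # drop the pairs where the city equals Column1
      .groupby(level=0)                # regroup by the original row index
      .agg({"Column1": "first",        # Column1 is constant within a group
            "Column2": list})          # collect the remaining cities back into a list
)
print(result)
```

One caveat worth checking against your own data: a row whose list contains only the Column1 city loses all of its exploded rows in the `query` step, so it will be missing from the grouped result rather than appearing with an empty list.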
Or just use `apply` directly:
# Keep every city in the row's list except the one named in Column1
df['Column2'] = df.apply(lambda row: [x for x in row['Column2'] if x != row['Column1']],
                         axis=1)
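As a runnable sketch of the row-wise approach, the block below applies the filter to the sample data from the question, comparing each list element against `row['Column1']`. It also shows a plain list comprehension over `zip`, which is not part of the original answer but is a common alternative that avoids building a Series for every row. The names `via_apply` and `via_zip` are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Column1": ["Berlin", "Tokyo", "Los Angeles", "London"],
    "Column2": [
        ["Hamburg", "Munich", "Berlin", "Stuttgart"],
        ["Tokyo", "Osaka", "Kyoto", "Sapporo"],
        ["New York", "Chicago", "Boston", "Los Angeles"],
        ["Birmingham", "London", "Glasgow", "Liverpool"],
    ],
})

# Row-wise apply: filter each row's list against that row's Column1 value.
via_apply = df.apply(
    lambda row: [x for x in row["Column2"] if x != row["Column1"]],
    axis=1,
)

# Alternative (an assumption, not from the answer): iterate both columns in step.
via_zip = [
    [city for city in cities if city != own_city]
    for own_city, cities in zip(df["Column1"], df["Column2"])
]

df["Column2"] = via_apply
print(df)
```

Unlike the `explode` route, both variants keep a row even when its filtered list ends up empty.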