How to group by columns (ignoring order) in a pandas DataFrame?
I have a pandas DataFrame (4 of its 8 columns shown):
import pandas as pd

df = pd.DataFrame({
    "departure_country": ["Mexico","Mexico","United States","United States","United States","United States","Japan","United States","United States","United States"],
    "departure_city": ["Guadalajara","Guadalajara","New York","Chicago","Los Angeles","Michigan","Tokyo","New York","New York","Chicago"],
    "destination_country": ["United States","United States","United States","United States","Mexico","United States","United States","Mexico","United States","Japan"],
    "destination_city": ["Los Angeles","Los Angeles","Chicago","New York","Guadalajara","New York","Chicago","Guadalajara","Michigan","Tokyo"],
})
df
departure_country departure_city destination_country destination_city
0 Mexico Guadalajara United States Los Angeles
1 Mexico Guadalajara United States Los Angeles
2 United States New York United States Chicago
3 United States Chicago United States New York
4 United States Los Angeles Mexico Guadalajara
5 United States Michigan United States New York
6 Japan Tokyo United States Chicago
7 United States New York Mexico Guadalajara
8 United States New York United States Michigan
9 United States Chicago Japan Tokyo
I want to analyze the data in each group, so I would first like to group the rows that share "the same pair" of departure and destination, regardless of direction, something like:
departure_country departure_city destination_country destination_city
0 Mexico Guadalajara United States Los Angeles
1 Mexico Guadalajara United States Los Angeles
2 United States Los Angeles Mexico Guadalajara
3 United States New York United States Chicago
4 United States Chicago United States New York
5 United States Michigan United States New York
6 United States New York United States Michigan
7 Japan Tokyo United States Chicago
8 United States Chicago Japan Tokyo
9 United States New York Mexico Guadalajara
Is it possible to do this in a DataFrame? I have tried groupby and a key-value approach, but failed. I would really appreciate your help with this, thanks!
I'm sure someone could think of a better-optimized solution, but one way is to create sorted tuples of your country pairs and city pairs and sort by them:
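# Build order-independent keys from each row's country pair and city pair,
# sort by them, then keep only the original columns (their names contain "_").
print(df.assign(country=[tuple(sorted(i)) for i in df.filter(like="country").to_numpy()],
                city=[tuple(sorted(i)) for i in df.filter(like="city").to_numpy()])
        .sort_values(["country", "city"], ascending=False)
        .filter(like="_"))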
departure_country departure_city destination_country destination_city
5 United States Michigan United States New York
8 United States New York United States Michigan
2 United States New York United States Chicago
3 United States Chicago United States New York
7 United States New York Mexico Guadalajara
0 Mexico Guadalajara United States Los Angeles
1 Mexico Guadalajara United States Los Angeles
4 United States Los Angeles Mexico Guadalajara
6 Japan Tokyo United States Chicago
9 United States Chicago Japan Tokyo
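Since the question asks about groupby specifically, here is a minimal sketch of a variant that groups directly instead of only sorting. It assumes the same four columns as above; the `route` column name is a hypothetical helper, not part of the original data. Each endpoint is kept as a (country, city) tuple and the two endpoints are sorted, so A -> B and B -> A get the same key. Sorting the country pair and the city pair separately, as above, can in principle give two different routes the same key; keeping each city attached to its country avoids that.

```python
import pandas as pd

df = pd.DataFrame({
    "departure_country": ["Mexico", "Mexico", "United States", "United States", "United States",
                          "United States", "Japan", "United States", "United States", "United States"],
    "departure_city": ["Guadalajara", "Guadalajara", "New York", "Chicago", "Los Angeles",
                       "Michigan", "Tokyo", "New York", "New York", "Chicago"],
    "destination_country": ["United States", "United States", "United States", "United States", "Mexico",
                            "United States", "United States", "Mexico", "United States", "Japan"],
    "destination_city": ["Los Angeles", "Los Angeles", "Chicago", "New York", "Guadalajara",
                         "New York", "Chicago", "Guadalajara", "Michigan", "Tokyo"],
})

# One (country, city) tuple per endpoint, so a city never loses its country.
dep = zip(df["departure_country"], df["departure_city"])
dst = zip(df["destination_country"], df["destination_city"])

# Sorting the two endpoints makes A -> B and B -> A produce the same key.
# "route" is a hypothetical helper column name.
df["route"] = [tuple(sorted(ends)) for ends in zip(dep, dst)]

# Group on the order-independent key; sort=False keeps first-appearance order.
groups = {route: list(g.index) for route, g in df.groupby("route", sort=False)}

for route, rows in groups.items():
    print(route, rows)
```

Inside the loop (or with `df.groupby("route").agg(...)`) each group holds both directions of one route, which is where per-group analysis would go. `df.sort_values("route")` should also place the rows of each route next to each other, closer to the layout shown in the question.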