How to group by columns (ignoring order) in a Pandas DataFrame?
I have a pandas DataFrame (4 of its 8 columns are shown here):
import pandas as pd

df = pd.DataFrame({
    "departure_country": ["Mexico", "Mexico", "United States", "United States", "United States", "United States", "Japan", "United States", "United States", "United States"],
    "departure_city": ["Guadalajara", "Guadalajara", "New York", "Chicago", "Los Angeles", "Michigan", "Tokyo", "New York", "New York", "Chicago"],
    "destination_country": ["United States", "United States", "United States", "United States", "Mexico", "United States", "United States", "Mexico", "United States", "Japan"],
    "destination_city": ["Los Angeles", "Los Angeles", "Chicago", "New York", "Guadalajara", "New York", "Chicago", "Guadalajara", "Michigan", "Tokyo"],
})
df
   departure_country  departure_city  destination_country  destination_city
0             Mexico     Guadalajara        United States       Los Angeles
1             Mexico     Guadalajara        United States       Los Angeles
2      United States        New York        United States           Chicago
3      United States         Chicago        United States          New York
4      United States     Los Angeles               Mexico       Guadalajara
5      United States        Michigan        United States          New York
6              Japan           Tokyo        United States           Chicago
7      United States        New York               Mexico       Guadalajara
8      United States        New York        United States          Michigan
9      United States         Chicago                Japan             Tokyo
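I want to analyze the data within each group, so I first want to group the rows that share the "same pair" of departure and destination, regardless of direction. For example: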
   departure_country  departure_city  destination_country  destination_city
0             Mexico     Guadalajara        United States       Los Angeles
1             Mexico     Guadalajara        United States       Los Angeles
2      United States     Los Angeles               Mexico       Guadalajara
3      United States        New York        United States           Chicago
4      United States         Chicago        United States          New York
5      United States        Michigan        United States          New York
6      United States        New York        United States          Michigan
7              Japan           Tokyo        United States           Chicago
8      United States         Chicago                Japan             Tokyo
9      United States        New York               Mexico       Guadalajara
Is it possible to do this in a DataFrame? I tried groupby and a key-value approach, but I failed. Any help with this would be much appreciated, thanks!
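I'm sure someone will come up with a better-optimized solution, but one way is to create sorted tuples of your country and city pairs and sort by them: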
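# Sort each row's two countries and two cities into tuples so that direction is ignored,
# then order the rows by those helper keys.
print(df.assign(country=[tuple(sorted(i)) for i in df.filter(like="country").to_numpy()],
                city=[tuple(sorted(i)) for i in df.filter(like="city").to_numpy()])
        .sort_values(["country", "city"], ascending=False)
        .filter(like="_"))  # keep only the original columns (their names contain "_")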
   departure_country  departure_city  destination_country  destination_city
5      United States        Michigan        United States          New York
8      United States        New York        United States          Michigan
2      United States        New York        United States           Chicago
3      United States         Chicago        United States          New York
7      United States        New York               Mexico       Guadalajara
0             Mexico     Guadalajara        United States       Los Angeles
1             Mexico     Guadalajara        United States       Los Angeles
4      United States     Los Angeles               Mexico       Guadalajara
6              Japan           Tokyo        United States           Chicago
9      United States         Chicago                Japan             Tokyo
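Since the question asks for groups rather than only a sort order, the same idea can be carried over to `groupby`. The following is a minimal sketch, not part of the original answer: the helper `route_key` and the names `route`, `group_id`, and `sizes` are hypothetical, chosen for illustration. It keeps each city together with its country, sorts the two endpoints of a row, and uses the result as a direction-independent key.

```python
import pandas as pd

df = pd.DataFrame({
    "departure_country": ["Mexico", "Mexico", "United States", "United States", "United States",
                          "United States", "Japan", "United States", "United States", "United States"],
    "departure_city": ["Guadalajara", "Guadalajara", "New York", "Chicago", "Los Angeles",
                       "Michigan", "Tokyo", "New York", "New York", "Chicago"],
    "destination_country": ["United States", "United States", "United States", "United States", "Mexico",
                            "United States", "United States", "Mexico", "United States", "Japan"],
    "destination_city": ["Los Angeles", "Los Angeles", "Chicago", "New York", "Guadalajara",
                         "New York", "Chicago", "Guadalajara", "Michigan", "Tokyo"],
})


def route_key(row):
    """Hypothetical helper: build a string key that is identical for A -> B and B -> A."""
    a = (row.departure_country, row.departure_city)
    b = (row.destination_country, row.destination_city)
    first, second = sorted([a, b])  # order the two endpoints so direction no longer matters
    return f"{first[1]}, {first[0]} <-> {second[1]}, {second[0]}"


# One key per row, aligned with the DataFrame's index.
route = pd.Series([route_key(r) for r in df.itertuples(index=False)],
                  index=df.index, name="route")

# Label every row with the number of its group, and count the rows per group.
group_id = df.groupby(route, sort=False).ngroup()
sizes = df.groupby(route, sort=False).size()

# Show the rows of each group next to each other.
print(df.assign(group_id=group_id).sort_values("group_id", kind="stable"))
print(sizes)
```

Keeping each endpoint as one (country, city) unit is a deliberate choice here: sorting the countries and the cities independently could, in principle, merge two different routes that happen to involve the same set of countries and cities. From this point, `df.groupby(route)` can be used for any per-group analysis.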