How to join two dataframes by more than one key?
I need to join the "rating" column of the dataframe df_original onto the dataframe df_workspace, using the keys "userId" and "movieId".
> Dataframe df_workspace
userId movieId cluster
0 1 2 2
1 1 29 2
2 1 260 2
3 1 589 2
4 1 653 2
5 1 919 2
6 1 1009 2
7 1 1196 2
8 1 1198 2
9 1 1200 2
10 1 1201 2
11 1 1291 2
12 1 1304 2
13 1 1374 2
14 1 1525 2
15 1 1750 2
16 1 1920 2
17 1 1967 2
18 1 2021 2
19 1 2138 2
20 1 2140 2
21 1 2143 2
22 1 2173 2
23 1 2193 2
24 1 2628 2
25 1 2761 2
26 1 2872 2
27 1 3000 2
28 1 3030 2
29 1 3037 2
> Dataframe df_original
userId movieId title \
0 1 2 Jumanji (1995)
1 1 29 City of Lost Children, The (Cité des enfants ...
2 1 32 Twelve Monkeys (a.k.a. 12 Monkeys) (1995)
3 1 47 Seven (a.k.a. Se7en) (1995)
4 1 50 Usual Suspects, The (1995)
5 1 112 Rumble in the Bronx (Hont faan kui) (1995)
6 1 151 Rob Roy (1995)
7 1 223 Clerks (1994)
8 1 253 Interview with the Vampire: The Vampire Chroni...
9 1 260 Star Wars: Episode IV - A New Hope (1977)
genres rating timestamp
0 Adventure|Children|Fantasy 3.5 2005-04-02 23:53:47.000
1 Adventure|Drama|Fantasy|Mystery|Sci-Fi 3.5 2005-04-02 23:31:16.000
2 Mystery|Sci-Fi|Thriller 3.5 2005-04-02 23:33:39.000
3 Mystery|Thriller 3.5 2005-04-02 23:32:07.000
4 Crime|Mystery|Thriller 3.5 2005-04-02 23:29:40.000
5 Action|Adventure|Comedy|Crime 3.5 2004-09-10 03:09:00.000
6 Action|Drama|Romance|War 4.0 2004-09-10 03:08:54.000
7 Comedy 4.0 2005-04-02 23:46:13.000
8 Drama|Horror 4.0 2005-04-02 23:35:40.000
9 Action|Adventure|Sci-Fi 4.0 2005-04-02 23:33:46.000
> Example output
userId movieId cluster rating
0 1 2 2 3.5
1 1 29 2 4.0
2 1 260 2 3.5
3 1 589 2 2.0
4 1 653 2 5.0
5 1 919 2 4.5
I tried using join, but I don't understand how to use more than one key.
Try this:
df_output = df_original.merge(df_workspace, how='inner', on=['userId', 'movieId'])
There is also a join method, but I prefer merge.
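For comparison, here is a rough sketch of what the join route could look like. It uses a small subset of the rows shown in the question, and the names ratings and df_joined are just illustrative. Since join matches the caller's columns against the other object's index, the two keys are moved into a MultiIndex first.

```python
import pandas as pd

# Small subset of the frames from the question (values copied from there).
df_workspace = pd.DataFrame({
    "userId": [1, 1, 1, 1],
    "movieId": [2, 29, 260, 589],
    "cluster": [2, 2, 2, 2],
})
df_original = pd.DataFrame({
    "userId": [1, 1, 1, 1],
    "movieId": [2, 29, 32, 260],
    "rating": [3.5, 3.5, 3.5, 4.0],
})

# join() looks up the columns listed in `on` in the index of the other
# object, so both keys go into a MultiIndex and only "rating" is kept.
ratings = df_original.set_index(["userId", "movieId"])["rating"]

# join() is a left join by default: every row of df_workspace is kept,
# and rows without a matching (userId, movieId) pair get NaN as rating.
df_joined = df_workspace.join(ratings, on=["userId", "movieId"])
print(df_joined)
```

Note that this keeps unmatched rows (movieId 589 here), whereas the inner merge above drops them, so which one fits depends on what should happen to pairs that have no rating.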
Try:
df_workspace.merge(df_original[['userId','movieId','rating']])
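By default, merge joins on all columns that have the same name in both dataframes. And by filtering the columns of df_original down to the keys plus "rating", you get only the output columns you need.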
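To make the default behaviour concrete, here is a minimal sketch on a few rows taken from the question. The variable names (ratings, implicit, explicit, left) are only for illustration. It checks that leaving out on= gives the same result as naming both keys, and shows a left merge as an option if unmatched rows of df_workspace should be kept.

```python
import pandas as pd

# Small subset of the frames from the question (values copied from there).
df_workspace = pd.DataFrame({
    "userId": [1, 1, 1, 1],
    "movieId": [2, 29, 260, 589],
    "cluster": [2, 2, 2, 2],
})
df_original = pd.DataFrame({
    "userId": [1, 1, 1, 1],
    "movieId": [2, 29, 32, 260],
    "title": ["Jumanji (1995)", "City of Lost Children, The",
              "Twelve Monkeys (1995)", "Star Wars: Episode IV (1977)"],
    "rating": [3.5, 3.5, 3.5, 4.0],
})

# Keep only the two keys and the column we want to bring over.
ratings = df_original[["userId", "movieId", "rating"]]

# Without `on`, merge uses every column name the two frames share,
# which here is exactly userId and movieId (inner join by default).
implicit = df_workspace.merge(ratings)

# Spelling the keys out gives the same rows and columns.
explicit = df_workspace.merge(ratings, on=["userId", "movieId"], how="inner")
print(implicit.equals(explicit))

# A left merge keeps workspace rows that have no rating (NaN instead);
# validate= raises if a key pair occurs more than once in `ratings`.
left = df_workspace.merge(ratings, on=["userId", "movieId"],
                          how="left", validate="many_to_one")
print(left)
```

Naming the keys explicitly is a bit more typing, but it protects against an accidental extra shared column silently becoming part of the join key.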