How to join two dataframes by more than one key?

I need to add the column 'rating' from dataframe df_original to dataframe df_workspace, joining on the two keys 'userId' and 'movieId'.

> Dataframe df_workspace

    userId  movieId  cluster
0         1        2        2
1         1       29        2
2         1      260        2
3         1      589        2
4         1      653        2
5         1      919        2
6         1     1009        2
7         1     1196        2
8         1     1198        2
9         1     1200        2
10        1     1201        2
11        1     1291        2
12        1     1304        2
13        1     1374        2
14        1     1525        2
15        1     1750        2
16        1     1920        2
17        1     1967        2
18        1     2021        2
19        1     2138        2
20        1     2140        2
21        1     2143        2
22        1     2173        2
23        1     2193        2
24        1     2628        2
25        1     2761        2
26        1     2872        2
27        1     3000        2
28        1     3030        2
29        1     3037        2

> Dataframe df_original

   userId  movieId                                              title  \
0       1        2                                     Jumanji (1995)   
1       1       29  City of Lost Children, The (Cité des enfants ...   
2       1       32          Twelve Monkeys (a.k.a. 12 Monkeys) (1995)   
3       1       47                        Seven (a.k.a. Se7en) (1995)   
4       1       50                         Usual Suspects, The (1995)   
5       1      112         Rumble in the Bronx (Hont faan kui) (1995)   
6       1      151                                     Rob Roy (1995)   
7       1      223                                      Clerks (1994)   
8       1      253  Interview with the Vampire: The Vampire Chroni...   
9       1      260          Star Wars: Episode IV - A New Hope (1977)   

                                   genres  rating                timestamp  
0              Adventure|Children|Fantasy     3.5  2005-04-02 23:53:47.000  
1  Adventure|Drama|Fantasy|Mystery|Sci-Fi     3.5  2005-04-02 23:31:16.000  
2                 Mystery|Sci-Fi|Thriller     3.5  2005-04-02 23:33:39.000  
3                        Mystery|Thriller     3.5  2005-04-02 23:32:07.000  
4                  Crime|Mystery|Thriller     3.5  2005-04-02 23:29:40.000  
5           Action|Adventure|Comedy|Crime     3.5  2004-09-10 03:09:00.000  
6                Action|Drama|Romance|War     4.0  2004-09-10 03:08:54.000  
7                                  Comedy     4.0  2005-04-02 23:46:13.000  
8                            Drama|Horror     4.0  2005-04-02 23:35:40.000  
9                 Action|Adventure|Sci-Fi     4.0  2005-04-02 23:33:46.000 

> OUTPUT EXAMPLE

    userId  movieId  cluster   rating
0         1        2        2   3.5
1         1       29        2   4.0
2         1      260        2   3.5
3         1      589        2   2.0
4         1      653        2   5.0
5         1      919        2   4.5

I tried to use join, but I don't understand how to use more than one key.

Try this:

df_output = df_original.merge(df_workspace, how='inner', on=['userId', 'movieId'])

There is also a join method, but I prefer merge.
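For comparison, here is a minimal sketch of both approaches side by side. The two small dataframes are made-up stand-ins for df_workspace and df_original (the values are invented for illustration, not taken from the real dataset), and the names keys, ratings, merged and joined are just helpers for this example.

```python
import pandas as pd

# Tiny made-up stand-ins for the question's dataframes (illustrative values only).
df_workspace = pd.DataFrame({
    "userId": [1, 1, 1],
    "movieId": [2, 29, 589],
    "cluster": [2, 2, 2],
})
df_original = pd.DataFrame({
    "userId": [1, 1, 1],
    "movieId": [2, 29, 32],
    "title": ["Jumanji (1995)", "City of Lost Children, The", "Twelve Monkeys (1995)"],
    "rating": [3.5, 4.0, 3.5],
})

keys = ["userId", "movieId"]

# merge: pass the list of key columns via `on`.
merged = df_workspace.merge(df_original[keys + ["rating"]], how="inner", on=keys)

# join: matches the caller's `on` columns against the other frame's index,
# so the right-hand frame is indexed by the same two keys first.
ratings = df_original.set_index(keys)[["rating"]]
joined = df_workspace.join(ratings, on=keys, how="inner")

print(merged)
print(joined)
```

Both calls should keep only the (userId, movieId) pairs present in both frames. With join, the extra set_index step is the main difference, which may be one reason to prefer merge here.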

Try:

df_workspace.merge(df_original[['userId','movieId','rating']])

By default, merge joins on all columns that have the same label in both dataframes. And because df_original is first filtered down to the key columns plus 'rating', you only get the output columns you want.
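A small sketch of that default behaviour, again on made-up sample data (the values and the helper names subset, shared, default_merge, explicit_merge and left_merge are assumptions for illustration only). It also shows how='left', which is not part of the answer above but may be useful if rows of df_workspace without a rating should be kept.

```python
import pandas as pd

# Made-up sample data shaped like the question's dataframes.
df_workspace = pd.DataFrame({
    "userId": [1, 1, 1],
    "movieId": [2, 29, 589],
    "cluster": [2, 2, 2],
})
df_original = pd.DataFrame({
    "userId": [1, 1, 1],
    "movieId": [2, 29, 32],
    "genres": ["Adventure|Children|Fantasy", "Adventure|Drama", "Mystery|Sci-Fi"],
    "rating": [3.5, 4.0, 3.5],
})

# Keep only the key columns and the column we want to bring over.
subset = df_original[["userId", "movieId", "rating"]]

# Column labels present in both frames: these are what merge uses when `on` is omitted.
shared = df_workspace.columns.intersection(subset.columns).tolist()
print(shared)

# Omitting `on` is equivalent to naming the shared columns explicitly.
default_merge = df_workspace.merge(subset)
explicit_merge = df_workspace.merge(subset, on=["userId", "movieId"])

# how="left" keeps every df_workspace row; unmatched rows get NaN in 'rating'.
left_merge = df_workspace.merge(subset, how="left")
print(left_merge)
```

Relying on the default is convenient, but naming the keys explicitly guards against an unrelated column that happens to share a label in both frames.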
