Inner join not working in pandas dataframes
I have the following 2 pandas dataframes:
city Population
0 New York City 20153634
1 Los Angeles 13310447
2 San Francisco Bay Area 6657982
3 Chicago 9512999
4 Dallas–Fort Worth 7233323
5 Washington, D.C. 6131977
6 Philadelphia 6070500
7 Boston 4794447
8 Minneapolis–Saint Paul 3551036
9 Denver 2853077
10 Miami–Fort Lauderdale 6066387
11 Phoenix 4661537
12 Detroit 4297617
13 Toronto 5928040
14 Houston 6772470
15 Atlanta 5789700
16 Tampa Bay Area 3032171
17 Pittsburgh 2342299
18 Cleveland 2055612
19 Seattle 3798902
20 Cincinnati 2165139
21 Kansas City 2104509
22 St. Louis 2807002
23 Baltimore 2798886
24 Charlotte 2474314
25 Indianapolis 2004230
26 Nashville 1865298
27 Milwaukee 1572482
28 New Orleans 1268883
29 Buffalo 1132804
30 Montreal 4098927
31 Vancouver 2463431
32 Orlando 2441257
33 Portland 2424955
34 Columbus 2041520
35 Calgary 1392609
36 Ottawa 1323783
37 Edmonton 1321426
38 Salt Lake City 1186187
39 Winnipeg 778489
40 San Diego 3317749
41 San Antonio 2429609
42 Sacramento 2296418
43 Las Vegas 2155664
44 Jacksonville 1478212
45 Oklahoma City 1373211
46 Memphis 1342842
47 Raleigh 1302946
48 Green Bay 318236
49 Hamilton 747545
50 Regina 236481
city W/L Ratio
0 Boston 2.500000
1 Buffalo 0.555556
2 Calgary 1.057143
3 Chicago 0.846154
4 Columbus 1.500000
5 Dallas–Fort Worth 1.312500
6 Denver 1.433333
7 Detroit 0.769231
8 Edmonton 0.900000
9 Las Vegas 2.125000
10 Los Angeles 1.655862
11 Miami–Fort Lauderdale 1.466667
12 Minneapolis-Saint Paul 1.730769
13 Montreal 0.725000
14 Nashville 2.944444
15 New York 1.517241
16 New York City 0.908870
17 Ottawa 0.651163
18 Philadelphia 1.615385
19 Phoenix 0.707317
20 Pittsburgh 1.620690
21 Raleigh 1.028571
22 San Francisco Bay Area 1.666667
23 St. Louis 1.375000
24 Tampa Bay 2.347826
25 Toronto 1.884615
26 Vancouver 0.775000
27 Washington, D.C. 1.884615
28 Winnipeg 2.600000
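I did the join like this:
result = pd.merge(df, nhl_df, on="city")
The result should have 28 rows, but instead I have 24 rows.
One of the missing rows is, for example, Miami–Fort Lauderdale.
I have double-checked both dataframes and there are no typographical errors. So why is it not in the result dataframe?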
city Population W/L Ratio
0 New York City 20153634 0.908870
1 Los Angeles 13310447 1.655862
2 San Francisco Bay Area 6657982 1.666667
3 Chicago 9512999 0.846154
4 Dallas–Fort Worth 7233323 1.312500
5 Washington, D.C. 6131977 1.884615
6 Philadelphia 6070500 1.615385
7 Boston 4794447 2.500000
8 Denver 2853077 1.433333
9 Phoenix 4661537 0.707317
10 Detroit 4297617 0.769231
11 Toronto 5928040 1.884615
12 Pittsburgh 2342299 1.620690
13 St. Louis 2807002 1.375000
14 Nashville 1865298 2.944444
15 Buffalo 1132804 0.555556
16 Montreal 4098927 0.725000
17 Vancouver 2463431 0.775000
18 Columbus 2041520 1.500000
19 Calgary 1392609 1.057143
20 Ottawa 1323783 0.651163
21 Edmonton 1321426 0.900000
22 Winnipeg 778489 2.600000
23 Las Vegas 2155664 2.125000
24 Raleigh 1302946 1.028571
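I think you can check whether the characters are really the same by looking at the integer code of each character with the ord function. Here the dashes differ: the one in the first dataframe has code 150 and the one in the second has code 8211, which is why the values do not match: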
a = df1.loc[10, 'city']
print (a)
Miami–Fort Lauderdale
print ([ord(x) for x in a])
[77, 105, 97, 109, 105, 150, 70, 111, 114, 116, 32, 76, 97, 117, 100, 101, 114, 100, 97, 108, 101]
b = df2.loc[11, 'city']
print (b)
Miami–Fort Lauderdale
print ([ord(x) for x in b])
[77, 105, 97, 109, 105, 8211, 70, 111, 114, 116, 32, 76, 97, 117, 100, 101, 114, 100, 97, 108, 101]
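To find every key that fails to match, rather than inspecting one row at a time, an outer merge with `indicator=True` may help. The following is a minimal sketch on two tiny made-up frames; `left`, `right`, and the helper `non_ascii_codes` are hypothetical names, and the values are just samples copied from the tables above.

```python
import pandas as pd

# Hypothetical miniature frames: the first uses the character with code 150
# (U+0096), the second uses the en dash with code 8211 (U+2013).
left = pd.DataFrame({
    "city": ["Boston", "Miami\x96Fort Lauderdale"],
    "Population": [4794447, 6066387],
})
right = pd.DataFrame({
    "city": ["Boston", "Miami\u2013Fort Lauderdale"],
    "W/L Ratio": [2.5, 1.466667],
})

# An outer merge keeps unmatched rows; the _merge column records
# whether a key was found in the left frame, the right frame, or both.
check = pd.merge(left, right, on="city", how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]

def non_ascii_codes(text):
    """Return the code points of all non-ASCII characters in text."""
    return [ord(ch) for ch in text if ord(ch) > 127]

# Map each unmatched key to its suspicious code points.
report = {city: non_ascii_codes(city) for city in unmatched["city"]}
print(report)
```

Keys that look identical on screen but appear here as `left_only` and `right_only` are good candidates for a hidden character difference.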
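You can try copying the two dash characters and replacing the one used in df2 with the one used in df1, so that the keys match: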
# '\u2013' (code 8211) is the dash found in b; '\x96' (code 150) is the one found in a
df2['city'] = df2['city'].replace('\u2013', '\x96', regex=True)
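An alternative that may be more robust is to normalize the dash characters in both frames before merging, instead of copying one odd character into the other frame. This is only a sketch under assumptions: the `DASHES` table and the `normalize_city` helper are hypothetical, the frames are tiny stand-ins for `df1` and `df2`, and the set of characters to unify would need adjusting to the real data.

```python
import pandas as pd

# Assumption: these are the dash-like characters worth unifying.
DASHES = {
    0x96: "-",    # code 150, often a mis-decoded Windows-1252 en dash
    0x2013: "-",  # code 8211, en dash
    0x2014: "-",  # code 8212, em dash
}

def normalize_city(series):
    """Replace dash variants with a plain hyphen and trim whitespace."""
    return series.str.translate(DASHES).str.strip()

# Tiny stand-ins for the two frames in the question.
df1 = pd.DataFrame({
    "city": ["Miami\x96Fort Lauderdale", "Boston"],
    "Population": [6066387, 4794447],
})
df2 = pd.DataFrame({
    "city": ["Boston", "Miami\u2013Fort Lauderdale"],
    "W/L Ratio": [2.5, 1.466667],
})

df1["city"] = normalize_city(df1["city"])
df2["city"] = normalize_city(df2["city"])

# With the keys normalized, the default inner merge finds both cities.
result = pd.merge(df1, df2, on="city")
print(result)
```

Mapping everything to a plain hyphen changes the displayed city names, but it would also cover rows such as Minneapolis-Saint Paul, which appears to be written with a plain hyphen in the second table and with a dash in the first.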