
inner join not working in pandas dataframes

I have the following 2 pandas dataframes:

                      city Population
0            New York City   20153634
1              Los Angeles   13310447
2   San Francisco Bay Area    6657982
3                  Chicago    9512999
4        Dallas–Fort Worth    7233323
5         Washington, D.C.    6131977
6             Philadelphia    6070500
7                   Boston    4794447
8   Minneapolis–Saint Paul    3551036
9                   Denver    2853077
10   Miami–Fort Lauderdale    6066387
11                 Phoenix    4661537
12                 Detroit    4297617
13                 Toronto    5928040
14                 Houston    6772470
15                 Atlanta    5789700
16          Tampa Bay Area    3032171
17              Pittsburgh    2342299
18               Cleveland    2055612
19                 Seattle    3798902
20              Cincinnati    2165139
21             Kansas City    2104509
22               St. Louis    2807002
23               Baltimore    2798886
24               Charlotte    2474314
25            Indianapolis    2004230
26               Nashville    1865298
27               Milwaukee    1572482
28             New Orleans    1268883
29                 Buffalo    1132804
30                Montreal    4098927
31               Vancouver    2463431
32                 Orlando    2441257
33                Portland    2424955
34                Columbus    2041520
35                 Calgary    1392609
36                  Ottawa    1323783
37                Edmonton    1321426
38          Salt Lake City    1186187
39                Winnipeg     778489
40               San Diego    3317749
41             San Antonio    2429609
42              Sacramento    2296418
43               Las Vegas    2155664
44            Jacksonville    1478212
45           Oklahoma City    1373211
46                 Memphis    1342842
47                 Raleigh    1302946
48               Green Bay     318236
49                Hamilton     747545
50                  Regina     236481


            

                      city  W/L Ratio
0                   Boston   2.500000
1                  Buffalo   0.555556
2                  Calgary   1.057143
3                  Chicago   0.846154
4                 Columbus   1.500000
5        Dallas–Fort Worth   1.312500
6                   Denver   1.433333
7                  Detroit   0.769231
8                 Edmonton   0.900000
9                Las Vegas   2.125000
10             Los Angeles   1.655862
11   Miami–Fort Lauderdale   1.466667
12  Minneapolis-Saint Paul   1.730769
13                Montreal   0.725000
14               Nashville   2.944444
15                New York   1.517241
16           New York City   0.908870
17                  Ottawa   0.651163
18            Philadelphia   1.615385
19                 Phoenix   0.707317
20              Pittsburgh   1.620690
21                 Raleigh   1.028571
22  San Francisco Bay Area   1.666667
23               St. Louis   1.375000
24               Tampa Bay   2.347826
25                 Toronto   1.884615
26               Vancouver   0.775000
27        Washington, D.C.   1.884615
28                Winnipeg   2.600000

I did the join like this:

result = pd.merge(df, nhl_df , on="city")

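The result should have 28 rows, but instead I get 24.

One of the missing rows is, for example, Miami–Fort Lauderdale.

I have double-checked both dataframes and there are no typos. So why is that row missing from the resulting dataframe?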

                      city Population  W/L Ratio
0            New York City   20153634   0.908870
1              Los Angeles   13310447   1.655862
2   San Francisco Bay Area    6657982   1.666667
3                  Chicago    9512999   0.846154
4        Dallas–Fort Worth    7233323   1.312500
5         Washington, D.C.    6131977   1.884615
6             Philadelphia    6070500   1.615385
7                   Boston    4794447   2.500000
8                   Denver    2853077   1.433333
9                  Phoenix    4661537   0.707317
10                 Detroit    4297617   0.769231
11                 Toronto    5928040   1.884615
12              Pittsburgh    2342299   1.620690
13               St. Louis    2807002   1.375000
14               Nashville    1865298   2.944444
15                 Buffalo    1132804   0.555556
16                Montreal    4098927   0.725000
17               Vancouver    2463431   0.775000
18                Columbus    2041520   1.500000
19                 Calgary    1392609   1.057143
20                  Ottawa    1323783   0.651163
21                Edmonton    1321426   0.900000
22                Winnipeg     778489   2.600000
23               Las Vegas    2155664   2.125000
24                 Raleigh    1302946   1.028571

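I think you can check whether the characters are really the same by looking at the integer code of each character with the ord function. Here the codes differ, 150 versus 8211, and that is why the values do not match: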

a = df1.loc[10, 'city']
print (a)
Miami–Fort Lauderdale

print ([ord(x) for x in a])
[77, 105, 97, 109, 105, 150, 70, 111, 114, 116, 32, 76, 97, 117, 100, 101, 114, 100, 97, 108, 101]


b = df2.loc[11, 'city']
print (b)
Miami–Fort Lauderdale

print ([ord(x) for x in b])
[77, 105, 97, 109, 105, 8211, 70, 111, 114, 116, 32, 76, 97, 117, 100, 101, 114, 100, 97, 108, 101]
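
Comparing two values by hand works once you know which row to look at. To scan a whole column instead, a minimal sketch along the following lines could list every non-ASCII character it contains. The helper name `non_ascii_chars` and the two tiny frames are assumptions made up for illustration; they are not part of the original data.

```python
import unicodedata

import pandas as pd

# Hypothetical stand-ins for the two 'city' columns:
# '\x96' has code 150 and '\u2013' has code 8211, as in the ord output above.
df1 = pd.DataFrame({"city": ["Boston", "Miami\x96Fort Lauderdale"]})
df2 = pd.DataFrame({"city": ["Boston", "Miami\u2013Fort Lauderdale"]})


def non_ascii_chars(series):
    """Map each non-ASCII character in a string column to (code, Unicode name)."""
    found = {}
    for value in series.dropna():
        for ch in str(value):
            if ord(ch) > 127:
                # Control characters such as code 150 have no Unicode name.
                found[ch] = (ord(ch), unicodedata.name(ch, "<unnamed>"))
    return found


report1 = non_ascii_chars(df1["city"])
report2 = non_ascii_chars(df2["city"])
print(report1)
print(report2)
```

If the two reports list different codes for what looks like the same dash, the keys will not match in a merge.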

You can try replacing one dash character with the other, so that both columns use the same character:

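# first argument: the en dash as it appears in b (code 8211); second: the character from a (code 150)
# escapes are used because the two characters look identical when pasted
df2['city'] = df2['city'].replace('\u2013', '\x96', regex=True)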
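
As an alternative to copying one odd character into the other frame, both key columns could be normalized to a plain ASCII hyphen before merging, and the merge checked with `indicator=True`. The following is only a sketch under assumptions: the miniature frames, the `DASHES` table, and the `clean_city` helper are hypothetical, and the list of dash variants may need extending for real data.

```python
import pandas as pd

# Hypothetical miniature versions of the two frames from the question.
df = pd.DataFrame({
    "city": ["Boston", "Miami\x96Fort Lauderdale", "Seattle"],
    "Population": [4794447, 6066387, 3798902],
})
nhl_df = pd.DataFrame({
    "city": ["Boston", "Miami\u2013Fort Lauderdale"],
    "W/L Ratio": [2.5, 1.466667],
})

# Translation table mapping dash look-alikes to an ASCII hyphen (assumed list).
DASHES = str.maketrans({"\x96": "-", "\u2013": "-", "\u2014": "-"})


def clean_city(series):
    """Return the column with dash variants unified and outer whitespace stripped."""
    return series.str.translate(DASHES).str.strip()


df["city"] = clean_city(df["city"])
nhl_df["city"] = clean_city(nhl_df["city"])

# indicator=True adds a '_merge' column telling which side each key came from.
check = pd.merge(df, nhl_df, on="city", how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]

result = pd.merge(df, nhl_df, on="city")
print(unmatched)
print(result)
```

Rows left in `unmatched` are keys that still differ after normalization. In the data shown in the question, names such as "Tampa Bay" versus "Tampa Bay Area" would presumably remain there, since they differ by more than a dash.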
