Modin gives different results than default pandas

Question

I get different results when I use pandas through Modin than when I use default pandas.

print(selection_weights.head())
  country                      league   Win   DNB  O 1.5  U 4.5
0  Africa       Africa Cup of Nations  3.68  1.86    5.2   1.45
1  Africa   Africa Cup of Nations U17  2.07  1.50    3.3   1.45
2  Africa   Africa Cup of Nations U20  2.07  1.50    3.3   1.45
3  Africa   Africa Cup of Nations U23  2.07  1.50    3.3   1.45
4  Africa  African Championship Women  2.07  1.50    3.3   1.45

print(historical_games.head())
   Unnamed: 0  home_odds  draw_odds  away_odds country            league             datetime        home_team   away_team  home_score  away_score
0           0       1.36       4.31       7.66  Brazil  Copa do Nordeste  2020-02-07 00:00:00     Sport Recife  Imperatriz           2           2
1           1       2.62       3.30       2.48  Brazil  Copa do Nordeste  2020-02-02 22:00:00              ABC  America RN           2           1
2           2       5.19       3.58       1.62  Brazil  Copa do Nordeste  2020-02-02 00:00:00  Frei Paulistano     Nautico           0           2
3           3       2.06       3.16       3.50  Brazil  Copa do Nordeste  2020-02-02 22:00:00      Botafogo PB   Confianca           1           1
4           4       2.19       2.98       3.38  Brazil  Copa do Nordeste  2020-02-02 22:00:00        Fortaleza       Ceara           1           1

When I run the following code with default pandas, the output is what I want:

import pandas as pd

selection_db = historical_games.loc[:, historical_games.columns.intersection(['country', 'league'])]
selection_db = selection_db.drop_duplicates()
selection_db = selection_db.sort_values(['country', 'league'], ascending=[True, True])
selection_db.loc[:, 'Win'] = 1.1
selection_db.loc[:, 'DNB'] = 0.7
selection_db.loc[:, 'O 1.5'] = 3.2
selection_db.loc[:, 'U 4.5'] = 2.2
ids = ['country', 'league']
selection_db = selection_db.set_index(ids)
selection_db.update(selection_weights.drop_duplicates(ids).set_index(ids))
selection_db = selection_db.reset_index()
selection_weights = selection_db
print(selection_weights.head())

  country                      league   Win   DNB  O 1.5  U 4.5
0  Africa       Africa Cup of Nations  3.68  1.86    5.2   1.45
1  Africa   Africa Cup of Nations U17  2.07  1.50    3.3   1.45
2  Africa   Africa Cup of Nations U20  2.07  1.50    3.3   1.45
3  Africa   Africa Cup of Nations U23  2.07  1.50    3.3   1.45
4  Africa  African Championship Women  2.07  1.50    3.3   1.45

But when I run it with Modin, I get a different and incorrect output:

import os
import ray
ray.init()
os.environ["MODIN_ENGINE"] = "ray"
import modin.pandas as pd

selection_db = historical_games.loc[:, historical_games.columns.intersection(['country', 'league'])]
selection_db = selection_db.drop_duplicates()
selection_db = selection_db.sort_values(['country', 'league'], ascending=[True, True])
selection_db.loc[:, 'Win'] = 1.1
selection_db.loc[:, 'DNB'] = 0.7
selection_db.loc[:, 'O 1.5'] = 3.2
selection_db.loc[:, 'U 4.5'] = 2.2
ids = ['country', 'league']
selection_db = selection_db.set_index(ids)
selection_db.update(selection_weights.drop_duplicates(ids).set_index(ids))
selection_db = selection_db.reset_index()
selection_weights = selection_db
print(selection_weights.head())

  country  league
0  Africa     2.2
1  Africa     2.2
2  Africa     2.2
3  Africa     2.2
4  Africa     2.2

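The problem is that I have to run this function as part of a larger workflow. When I import Modin at the start, everything executes as expected until this part of the code.

However, I cannot revert to default pandas partway through the code, or at least I do not know how to switch libraries partway through.

How can I resolve this?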

Answer 1

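@Harshad, this comment on the Modin GitHub describes how to convert a Modin dataframe to pandas: use df._to_pandas(). Once you have a pandas dataframe, you can call any pandas method on it. Other comments on the same issue describe how to convert a pandas dataframe back to a Modin dataframe: call modin.pandas.DataFrame(pandas_dataframe).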
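
As a rough sketch of that round trip, the helper below runs one step on a plain pandas copy and converts back afterwards. The names to_plain_pandas, run_with_pandas and add_default_weights are hypothetical, made up for illustration. The only Modin calls assumed are the two mentioned above (df._to_pandas() and modin.pandas.DataFrame). The demo uses a plain pandas frame so that it runs even where Modin is not installed.

```python
import pandas


def to_plain_pandas(df):
    """Return a plain pandas DataFrame, converting from Modin if needed."""
    # Modin frames expose _to_pandas() (per the GitHub comment cited above);
    # plain pandas frames do not, so they pass through unchanged.
    if hasattr(df, "_to_pandas"):
        return df._to_pandas()
    return df


def run_with_pandas(df, func):
    """Apply func to a pandas copy of df, then restore the original frame type."""
    was_modin = hasattr(df, "_to_pandas")
    result = func(to_plain_pandas(df))
    if was_modin:
        # Imported lazily so the helper also works where Modin is not installed.
        import modin.pandas as mpd
        return mpd.DataFrame(result)
    return result


def add_default_weights(frame):
    # Hypothetical stand-in for the step that misbehaves under Modin.
    frame = frame.copy()
    frame.loc[:, "Win"] = 1.1
    return frame


demo = pandas.DataFrame({"country": ["Africa"], "league": ["Africa Cup of Nations"]})
out = run_with_pandas(demo, add_default_weights)
print(out)
```

Wrapping only the problematic step this way keeps the rest of the workflow on Modin, at the cost of materializing that one frame in memory as a pandas object.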

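As for the Modin error you are seeing, my guess is that lines such as selection_db.loc[:, 'Win'] = 1.1, where you add a column, raise a KeyError and do not change the Modin dataframe at all. This is a known Modin bug: https://github.com/modin-project/modin/issues/4354. For example, this works in pandas: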

import pandas
df = pandas.DataFrame([[1]])
df.loc[:, 'a'] = 3

But if I try the same script with import modin.pandas as pandas on the latest version of Modin (commit c1d5dbd71efb8fb5806fad41959794182780fc25), I get KeyError: array(['a'], dtype='<U1'). Is it possible that you are getting a KeyError and ignoring it?
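
If the .loc[:, new_column] assignment is indeed the culprit, one possible way to sidestep it is to add the default columns with DataFrame.assign instead. This is only a sketch: it is written against plain pandas with small made-up sample data, and it is an assumption (not verified against the affected Modin commit) that swapping the import for modin.pandas gives the same result.

```python
import pandas as pd  # assumption: replace with `import modin.pandas as pd` under Modin

# Small made-up stand-ins for the frames in the question.
historical_games = pd.DataFrame({
    "country": ["Brazil", "Brazil", "Africa"],
    "league": ["Copa do Nordeste", "Copa do Nordeste", "Africa Cup of Nations"],
    "home_odds": [1.36, 2.62, 2.10],
})
selection_weights = pd.DataFrame({
    "country": ["Africa"],
    "league": ["Africa Cup of Nations"],
    "Win": [3.68], "DNB": [1.86], "O 1.5": [5.2], "U 4.5": [1.45],
})

ids = ["country", "league"]
defaults = {"Win": 1.1, "DNB": 0.7, "O 1.5": 3.2, "U 4.5": 2.2}

selection_db = (
    historical_games[ids]
    .drop_duplicates()
    .sort_values(ids)
    .assign(**defaults)  # adds the new columns without .loc[:, col] = value
    .set_index(ids)
)
# Overwrite the defaults wherever selection_weights already has a row.
selection_db.update(selection_weights.drop_duplicates(ids).set_index(ids))
selection_db = selection_db.reset_index()
print(selection_db)
```

Leagues already present in selection_weights keep their existing weights, and new leagues get the defaults, which is the behaviour the original pandas code produces.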

Solution 1
0 2022-06-06 14:38:11
