簡體   English   中英

在 python 中使用 df.loc 時出現 AssertionError

[英]AssertionError when use df.loc in python

我創建了一個腳本來加載數據、檢查 NA 值並填充所有 NA 值。 這是我的代碼:

import pandas as pd

def filter_df(merged_df, var_list):
    ind = merged_df.Name.isin(var_list)
    return merged_df[ind]

def pivot_df(df):
    return df.pivot(index='Date', columns='Name', values=['Open', 'High', 'Low', 'Close'])

def validation_df(input, summary = False):
    df = input.copy()
    # na check
    missing = df.isna().sum().sort_values(ascending=False)
    percent_missing = ((missing / df.isnull().count()) * 100).sort_values(ascending=False)
    missing_df = pd.concat([missing, percent_missing], axis=1, keys=['Total', 'Percent'], sort=False)

    # fill na
    columns = list(missing_df[missing_df['Total'] >= 1].reset_index()['index'])

    for col in columns:
        null_index = df.index[df[col].isnull() == True].tolist()
        null_index.sort()
        for ind in null_index:
            if ind > 0:
                print(df.loc[ind, col])
                print(df.loc[ind - 1, col])
                df.loc[ind, col] = df.loc[ind - 1, col]
            if ind == 0:
                df.loc[ind, col] = 0

    # outliers check
    count = []
    for col in df.columns:
        count.append(sum(df[col] > df[col].mean() + 2 * df[col].std()) + sum(df[col] < df[col].mean() - 2 * df[col].std()))
    outliers_df = pd.DataFrame({'Columns': df.columns, 'Count': count}).sort_values(by = 'Count')

    if summary == True:
        print('missing value check:/n')
        print(missing_df)
        print('/n outliers check:/n')
        print(outliers_df)

    return df

def join_df(price_df, transaction_df, var_list):

    price_df = filter_df(price_df, var_list)
    price_df = pivot_df(price_df)

    joined_df = transaction_df.merge(price_df, how = 'left', on = 'Date')
    #joined_df = validation_df(joined_df)

    return joined_df

token_path = 'https://raw.githubusercontent.com/Carloszone/Cryptocurrency_Research_project/main/datasets/1_token_df.csv'
transaction_path = 'https://raw.githubusercontent.com/Carloszone/Cryptocurrency_Research_project/main/datasets/transaction_df.csv'

var_list = ['Bitcoin', 'Ethereum', 'Golem', 'Solana']

token_df = pd.read_csv(token_path)
transaction_df = pd.read_csv(transaction_path)

df = join_df(token_df, transaction_df, var_list)
df = validation_df(df)

但它沒有用。 我檢查了我的代碼,發現這個問題來自 loc()。 例如:

df = join_df(token_df, transaction_df, var_list)

print(df[df.columns[15]])
print(df.loc[1,df.columns[15]])

我得到的是:

0      NaN
1      NaN
2      NaN
3      NaN
4      NaN
        ..
2250   NaN
2251   NaN
2252   NaN
2253   NaN
2254   NaN
Name: (High, Solana), Length: 2255, dtype: float64

AssertionError                            Traceback (most recent call last)
<ipython-input-19-75f01cc22c9c> in <module>()
      2 
      3 print(df[df.columns[15]])
----> 4 print(df.loc[1,df.columns[15]])

2 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in __getitem__(self, key)
    923                 with suppress(KeyError, IndexError):
    924                     return self.obj._get_value(*key, takeable=self._takeable)
--> 925             return self._getitem_tuple(key)
    926         else:
    927             # we by definition only have the 0th axis

/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
   1107             return self._multi_take(tup)
   1108 
-> 1109         return self._getitem_tuple_same_dim(tup)
   1110 
   1111     def _get_label(self, label, axis: int):

/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in _getitem_tuple_same_dim(self, tup)
    807             # We should never have retval.ndim < self.ndim, as that should
    808             #  be handled by the _getitem_lowerdim call above.
--> 809             assert retval.ndim == self.ndim
    810 
    811         return retval

AssertionError: 

我不知道為什么 df[column_name] 可用,但 df.loc[index,columns_name] 是錯誤的。

你可以在 Colab 上查看我的代碼: https://colab.research.google.com/drive/1Yg280JRwFayW1tdp4OJqTO5-X3dGsItB?usp=sharing

問題是您在一個不共享的列上合並兩個 DataFrame(因為您旋轉price_df , Date 列成為索引)。 此外,日期列沒有統一的格式,因此您必須使它們相同。 將您的join_df function 替換為下面的那個,它將按預期工作。

我在必須添加的行上添加了注釋。

def join_df(price_df, transaction_df, var_list):

    price_df = filter_df(price_df, var_list)
    price_df = pivot_df(price_df)

    # After pivot the Date column is the index, and price_df has MultiIndex columns
    # since we want to merge it with transaction_df, we need to first flatten the columns
    price_df.columns = price_df.columns.map('.'.join)

    # and reset_index so that we have the index as the Date column
    price_df = price_df.reset_index()

    # the Dates are formatted differently across the two DataFrames; 
    # one has the following format: '2016-01-01' and the other '2016/1/1'
    # to have a uniform format, we convert the both Date columns to datetime objects
    price_df['Date'] = pd.to_datetime(price_df['Date'])
    transaction_df['Date'] = pd.to_datetime(transaction_df['Date'])

    joined_df = transaction_df.merge(price_df, how = 'left', on = 'Date')
    #joined_df = validation_df(joined_df)

    return joined_df

Output:

           Date  total_transaction_count        Volume  gas_consumption  \
0    2016-01-01                     2665           NaN              NaN   
1    2016-01-02                     4217           NaN              NaN   
2    2016-01-03                     4396           NaN              NaN   
3    2016-01-04                     4776           NaN              NaN   
4    2016-01-05                    26649           NaN              NaN   
...         ...                      ...           ...              ...   
2250 2022-02-28                  1980533  1.968686e+06     8.626201e+11   
2251 2022-03-01                  2013145  2.194055e+06     1.112079e+12   
2252 2022-03-02                  1987934  2.473327e+06     1.167615e+12   
2253 2022-03-03                  1973190  3.093248e+06     1.260826e+12   
2254 2022-03-04                  1861286  4.446204e+06     1.045814e+12   

      old_ave_gas_fee  new_avg_gas_fee  new_avg_base_fee  \
0        0.000000e+00     0.000000e+00      0.000000e+00   
1        0.000000e+00     0.000000e+00      0.000000e+00   
2        0.000000e+00     0.000000e+00      0.000000e+00   
3        0.000000e+00     0.000000e+00      0.000000e+00   
4        0.000000e+00     0.000000e+00      0.000000e+00   
...               ...              ...               ...   
2250     6.356288e-08     6.356288e-08      5.941877e-08   
2251     5.368574e-08     5.368574e-08      4.982823e-08   
2252     5.567472e-08     5.567472e-08      4.782055e-08   
2253     4.763823e-08     4.763823e-08      4.140883e-08   
2254     4.566440e-08     4.566440e-08      3.547666e-08   

      new_avg_priority_fee  Open.Bitcoin  Open.Ethereum  ...  High.Golem  \
0             0.000000e+00         430.0            NaN  ...         NaN   
1             0.000000e+00         434.0            NaN  ...         NaN   
2             0.000000e+00         433.7            NaN  ...         NaN   
3             0.000000e+00         430.7            NaN  ...         NaN   
4             0.000000e+00         433.3            NaN  ...         NaN   
...                    ...           ...            ...  ...         ...   
2250          4.144109e-09       37707.2        2616.34  ...     0.48904   
2251          3.857517e-09       43187.2        2922.44  ...     0.48222   
2252          7.854179e-09       44420.3        2975.80  ...     0.47550   
2253          6.229401e-09           NaN            NaN  ...         NaN   
2254          1.018774e-08           NaN            NaN  ...         NaN   

      High.Solana  Low.Bitcoin  Low.Ethereum  Low.Golem  Low.Solana  \
0             NaN        425.9           NaN        NaN         NaN   
1             NaN        430.7           NaN        NaN         NaN   
2             NaN        423.1           NaN        NaN         NaN   
3             NaN        428.6           NaN        NaN         NaN   
4             NaN        428.9           NaN        NaN         NaN   
...           ...          ...           ...        ...         ...   
2250          NaN      37458.9       2574.12    0.41179         NaN   
2251          NaN      42876.6       2858.54    0.45093         NaN   
2252          NaN      43361.3       2914.70    0.43135         NaN   
2253          NaN          NaN           NaN        NaN         NaN   
2254          NaN          NaN           NaN        NaN         NaN   

      Close.Bitcoin  Close.Ethereum  Close.Golem  Close.Solana  
0             434.0             NaN          NaN           NaN  
1             433.7             NaN          NaN           NaN  
2             430.7             NaN          NaN           NaN  
3             433.3             NaN          NaN           NaN  
4             431.2             NaN          NaN           NaN  
...             ...             ...          ...           ...  
2250        43188.2         2922.50      0.47748           NaN  
2251        44420.3         2975.81      0.47447           NaN  
2252        43853.2         2952.47      0.43964           NaN  
2253            NaN             NaN          NaN           NaN  
2254            NaN             NaN          NaN           NaN  

[2255 rows x 24 columns]

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM