[英]AssertionError when use df.loc in python
我創建了一個腳本來加載數據、檢查 NA 值並填充所有 NA 值。 這是我的代碼:
import pandas as pd
def filter_df(merged_df, var_list):
ind = merged_df.Name.isin(var_list)
return merged_df[ind]
def pivot_df(df):
return df.pivot(index='Date', columns='Name', values=['Open', 'High', 'Low', 'Close'])
def validation_df(input, summary = False):
df = input.copy()
# na check
missing = df.isna().sum().sort_values(ascending=False)
percent_missing = ((missing / df.isnull().count()) * 100).sort_values(ascending=False)
missing_df = pd.concat([missing, percent_missing], axis=1, keys=['Total', 'Percent'], sort=False)
# fill na
columns = list(missing_df[missing_df['Total'] >= 1].reset_index()['index'])
for col in columns:
null_index = df.index[df[col].isnull() == True].tolist()
null_index.sort()
for ind in null_index:
if ind > 0:
print(df.loc[ind, col])
print(df.loc[ind - 1, col])
df.loc[ind, col] = df.loc[ind - 1, col]
if ind == 0:
df.loc[ind, col] = 0
# outliers check
count = []
for col in df.columns:
count.append(sum(df[col] > df[col].mean() + 2 * df[col].std()) + sum(df[col] < df[col].mean() - 2 * df[col].std()))
outliers_df = pd.DataFrame({'Columns': df.columns, 'Count': count}).sort_values(by = 'Count')
if summary == True:
print('missing value check:/n')
print(missing_df)
print('/n outliers check:/n')
print(outliers_df)
return df
def join_df(price_df, transaction_df, var_list):
price_df = filter_df(price_df, var_list)
price_df = pivot_df(price_df)
joined_df = transaction_df.merge(price_df, how = 'left', on = 'Date')
#joined_df = validation_df(joined_df)
return joined_df
token_path = 'https://raw.githubusercontent.com/Carloszone/Cryptocurrency_Research_project/main/datasets/1_token_df.csv'
transaction_path = 'https://raw.githubusercontent.com/Carloszone/Cryptocurrency_Research_project/main/datasets/transaction_df.csv'
var_list = ['Bitcoin', 'Ethereum', 'Golem', 'Solana']
token_df = pd.read_csv(token_path)
transaction_df = pd.read_csv(transaction_path)
df = join_df(token_df, transaction_df, var_list)
df = validation_df(df)
但它沒有用。 我檢查了我的代碼,發現這個問題來自 loc()。 例如:
df = join_df(token_df, transaction_df, var_list)
print(df[df.columns[15]])
print(df.loc[1,df.columns[15]])
我得到的是:
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
..
2250 NaN
2251 NaN
2252 NaN
2253 NaN
2254 NaN
Name: (High, Solana), Length: 2255, dtype: float64
AssertionError Traceback (most recent call last)
<ipython-input-19-75f01cc22c9c> in <module>()
2
3 print(df[df.columns[15]])
----> 4 print(df.loc[1,df.columns[15]])
2 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in __getitem__(self, key)
923 with suppress(KeyError, IndexError):
924 return self.obj._get_value(*key, takeable=self._takeable)
--> 925 return self._getitem_tuple(key)
926 else:
927 # we by definition only have the 0th axis
/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
1107 return self._multi_take(tup)
1108
-> 1109 return self._getitem_tuple_same_dim(tup)
1110
1111 def _get_label(self, label, axis: int):
/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in _getitem_tuple_same_dim(self, tup)
807 # We should never have retval.ndim < self.ndim, as that should
808 # be handled by the _getitem_lowerdim call above.
--> 809 assert retval.ndim == self.ndim
810
811 return retval
AssertionError:
我不知道為什么 df[column_name] 可用,但 df.loc[index,columns_name] 是錯誤的。
你可以在 Colab 上查看我的代碼: https://colab.research.google.com/drive/1Yg280JRwFayW1tdp4OJqTO5-X3dGsItB?usp=sharing
問題是您在一個不共享的列上合並兩個 DataFrame(因為您旋轉price_df
, Date 列成為索引)。 此外,日期列沒有統一的格式,因此您必須使它們相同。 將您的join_df
function 替換為下面的那個,它將按預期工作。
我在必須添加的行上添加了注釋。
def join_df(price_df, transaction_df, var_list):
price_df = filter_df(price_df, var_list)
price_df = pivot_df(price_df)
# After pivot the Date column is the index, and price_df has MultiIndex columns
# since we want to merge it with transaction_df, we need to first flatten the columns
price_df.columns = price_df.columns.map('.'.join)
# and reset_index so that we have the index as the Date column
price_df = price_df.reset_index()
# the Dates are formatted differently across the two DataFrames;
# one has the following format: '2016-01-01' and the other '2016/1/1'
# to have a uniform format, we convert the both Date columns to datetime objects
price_df['Date'] = pd.to_datetime(price_df['Date'])
transaction_df['Date'] = pd.to_datetime(transaction_df['Date'])
joined_df = transaction_df.merge(price_df, how = 'left', on = 'Date')
#joined_df = validation_df(joined_df)
return joined_df
Output:
Date total_transaction_count Volume gas_consumption \
0 2016-01-01 2665 NaN NaN
1 2016-01-02 4217 NaN NaN
2 2016-01-03 4396 NaN NaN
3 2016-01-04 4776 NaN NaN
4 2016-01-05 26649 NaN NaN
... ... ... ... ...
2250 2022-02-28 1980533 1.968686e+06 8.626201e+11
2251 2022-03-01 2013145 2.194055e+06 1.112079e+12
2252 2022-03-02 1987934 2.473327e+06 1.167615e+12
2253 2022-03-03 1973190 3.093248e+06 1.260826e+12
2254 2022-03-04 1861286 4.446204e+06 1.045814e+12
old_ave_gas_fee new_avg_gas_fee new_avg_base_fee \
0 0.000000e+00 0.000000e+00 0.000000e+00
1 0.000000e+00 0.000000e+00 0.000000e+00
2 0.000000e+00 0.000000e+00 0.000000e+00
3 0.000000e+00 0.000000e+00 0.000000e+00
4 0.000000e+00 0.000000e+00 0.000000e+00
... ... ... ...
2250 6.356288e-08 6.356288e-08 5.941877e-08
2251 5.368574e-08 5.368574e-08 4.982823e-08
2252 5.567472e-08 5.567472e-08 4.782055e-08
2253 4.763823e-08 4.763823e-08 4.140883e-08
2254 4.566440e-08 4.566440e-08 3.547666e-08
new_avg_priority_fee Open.Bitcoin Open.Ethereum ... High.Golem \
0 0.000000e+00 430.0 NaN ... NaN
1 0.000000e+00 434.0 NaN ... NaN
2 0.000000e+00 433.7 NaN ... NaN
3 0.000000e+00 430.7 NaN ... NaN
4 0.000000e+00 433.3 NaN ... NaN
... ... ... ... ... ...
2250 4.144109e-09 37707.2 2616.34 ... 0.48904
2251 3.857517e-09 43187.2 2922.44 ... 0.48222
2252 7.854179e-09 44420.3 2975.80 ... 0.47550
2253 6.229401e-09 NaN NaN ... NaN
2254 1.018774e-08 NaN NaN ... NaN
High.Solana Low.Bitcoin Low.Ethereum Low.Golem Low.Solana \
0 NaN 425.9 NaN NaN NaN
1 NaN 430.7 NaN NaN NaN
2 NaN 423.1 NaN NaN NaN
3 NaN 428.6 NaN NaN NaN
4 NaN 428.9 NaN NaN NaN
... ... ... ... ... ...
2250 NaN 37458.9 2574.12 0.41179 NaN
2251 NaN 42876.6 2858.54 0.45093 NaN
2252 NaN 43361.3 2914.70 0.43135 NaN
2253 NaN NaN NaN NaN NaN
2254 NaN NaN NaN NaN NaN
Close.Bitcoin Close.Ethereum Close.Golem Close.Solana
0 434.0 NaN NaN NaN
1 433.7 NaN NaN NaN
2 430.7 NaN NaN NaN
3 433.3 NaN NaN NaN
4 431.2 NaN NaN NaN
... ... ... ... ...
2250 43188.2 2922.50 0.47748 NaN
2251 44420.3 2975.81 0.47447 NaN
2252 43853.2 2952.47 0.43964 NaN
2253 NaN NaN NaN NaN
2254 NaN NaN NaN NaN
[2255 rows x 24 columns]
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.