KeyError: 0 when trying to access a row in a pandas DataFrame
I get the error pandas._libs.index.Int64Engine._check_type KeyError: 'class' when I try to skip a row while loading a tabular CSV data file
import pandas as pd #pandas working with tabular data as dataframes
from sklearn.model_selection import train_test_split #scikit-learn, building custom ML models
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
#df = pd.read_csv('coords.csv')
#df = pd.read_csv('coords.csv', header=None)
#df = pd.read_csv('coords.csv', skiprows=[0])
df = pd.read_csv('coords.csv', skiprows=[0], header=None)
#df[df['class']=='Happy']
X = df.drop('class', axis=1) # features
y = df['class'] # target value
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1234)
pipelines = {
    'lr': make_pipeline(StandardScaler(), LogisticRegression()),
    'rc': make_pipeline(StandardScaler(), RidgeClassifier()),
    'rf': make_pipeline(StandardScaler(), RandomForestClassifier()),
    'gb': make_pipeline(StandardScaler(), GradientBoostingClassifier()),
}
fit_models = {}
for algo, pipeline in pipelines.items():
    model = pipeline.fit(X_train, y_train)
    fit_models[algo] = model
fit_models['rc'].predict(X_test)
df = pd.read_csv('coords.csv')
If I read the whole data array from the csv, the first row is read as well, and the fit fails when it tries to convert a str to a float:
Traceback (most recent call last):
File "3_Train_Custom_Model_Using_Scikit_Learn.py", line 71, in <module>
model = pipeline.fit(X_train, y_train)
ValueError: could not convert string to float: 'x1'
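I then tried several ways of dropping the row that contains the column names, each of which produced an error. Since indexing starts at 0, I did the following:
Using df = pd.read_csv('coords.csv', skiprows=0) gives me ValueError: could not convert string to float: 'x1'
and either of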
#df = pd.read_csv('coords.csv', header=None) #Option 1
#df = pd.read_csv('coords.csv', skiprows=[0], header=None) #Option 2
gives me this exception from pandas:
Traceback (most recent call last):
File "C:\Users\MyPC0\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 3080, in get_loc
return self._engine.get_loc(casted_key)
File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 98, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index_class_helper.pxi", line 89, in pandas._libs.index.Int64Engine._check_type
KeyError: 'class'
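I think this pandas error occurs because, when the column-name row at index 0 is skipped, pandas for some reason I don't understand still tries to look up a column from that skipped row, cannot find it, and raises this error, which shows up in the console as an exception from inside pandas.
The pandas error does not even point to a line in my code, so I don't know what is causing it. How can I fix it so that I can remove (or really just skip) the row with the column names and train the model with .fit()?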
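For reference, the KeyError can be reproduced on a tiny stand-in file. The sketch below assumes a hypothetical three-column CSV (the real coords.csv has many more columns) and only shows how the column labels change with each read_csv variant from the question: with header=None pandas assigns integer labels, so there is no column called 'class' to look up.

```python
import io
import pandas as pd

# Hypothetical three-column stand-in for coords.csv (an assumption, not the real file).
CSV_TEXT = "class,x1,y1\nHappy,0.1,0.2\nSad,0.3,0.4\n"

# Default: the first line is used as the header, so 'class' is a column label.
df_default = pd.read_csv(io.StringIO(CSV_TEXT))

# skiprows=0 as an integer means "skip zero lines", so nothing changes.
df_skip_int = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=0)

# header=None: pandas assigns integer labels and the header line becomes data row 0.
df_no_header = pd.read_csv(io.StringIO(CSV_TEXT), header=None)

# skiprows=[0] with header=None: the header line is dropped, labels stay integers.
df_skipped = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=[0], header=None)

print(list(df_default.columns))
print(list(df_no_header.columns))

# Looking up 'class' among integer labels fails with KeyError: 'class'.
try:
    df_skipped["class"]
    lookup_failed = False
except KeyError:
    lookup_failed = True
print("lookup failed:", lookup_failed)
```

Under these assumptions, only the reads that keep the header line as the header can be indexed with df['class'].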
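The csv file opened in Excel
The csv file opened in a text editor
I am not sure whether the problem lies in the csv itself, although I doubt it. In any case, here is the code I use to write the data to the csv, with a comma as the delimiter: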
import csv
import numpy as np

# results and class_name come from the surrounding capture code (not shown)
pose = results.pose_landmarks.landmark
pose_row = list(np.array([[landmark.x, landmark.y, landmark.z, landmark.visibility] for landmark in pose]).flatten())
face = results.face_landmarks.landmark
face_row = list(np.array([[landmark.x, landmark.y, landmark.z, landmark.visibility] for landmark in face]).flatten())
row = pose_row+face_row
row.insert(0, class_name)
# Append one row per captured frame
with open('coords.csv', mode='a', newline='') as f:
    csv_writer = csv.writer(f, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    csv_writer.writerow(row)
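One thing worth checking with append-mode writing is whether the header line ends up in the file exactly once. The following is a minimal sketch, not the code from the question: append_row is a hypothetical helper, and the column names and values are made up for illustration. It writes the header only when the file is new or empty.

```python
import csv
import os
import tempfile


def append_row(path, header, row):
    """Append one row, writing the header only if the file is new or empty."""
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, mode="a", newline="") as f:
        writer = csv.writer(f, delimiter=",", quotechar='"', quoting=csv.QUOTE_MINIMAL)
        if needs_header:
            writer.writerow(header)
        writer.writerow(row)


# Hypothetical single-landmark rows written to a temporary file.
demo_path = os.path.join(tempfile.mkdtemp(), "coords_demo.csv")
header = ["class", "x1", "y1", "z1", "v1"]
append_row(demo_path, header, ["Happy", 0.1, 0.2, 0.3, 0.9])
append_row(demo_path, header, ["Sad", 0.4, 0.5, 0.6, 0.8])

with open(demo_path, newline="") as f:
    lines = f.read().splitlines()
print(lines)
```

With the header present once, a plain pd.read_csv(path) picks it up as the column labels without skiprows or header=None.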
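Judging by your Notepad view, you cannot simply read the file with pandas, because it is not clear what the header of the data is. I assume the data ends up as a single row, since I cannot see any line breaks in it (the np array inside the list may be causing that). In a correct csv, by contrast, you can clearly see from the first column that the records are separated properly, and you can assume the data has more than one row, which is not the case for your csv. So try saving your data with pandas instead. For example, your pose and face data contain n landmarks each; a simple way to collect them all is to add them to a dict in a loop: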
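import pandas as pd

# Pose landmarks go into columns x0, y0, z0, v0, x1, ...
data_pose = {}
for ind, landmark in enumerate(pose):
    data_pose['x'+str(ind)] = landmark.x
    data_pose['y'+str(ind)] = landmark.y
    data_pose['z'+str(ind)] = landmark.z
    data_pose['v'+str(ind)] = landmark.visibility
# Face landmarks use different prefixes (xx, yy, ...) so the names do not clash
data_face = {}
for ind, landmark in enumerate(face):
    data_face['xx'+str(ind)] = landmark.x
    data_face['yy'+str(ind)] = landmark.y
    data_face['zz'+str(ind)] = landmark.z
    data_face['vv'+str(ind)] = landmark.visibility
data = {**data_pose, **data_face}
# Wrap the dict in a list: a dict of scalars alone needs an explicit index
df = pd.DataFrame([data])
# index=False keeps the row index out of the file
df.to_csv('try.csv', sep=';', index=False)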
To read the csv file back, just do:
df = pd.read_csv('try.csv',sep=';')
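The header of df is left at its default, which in this case takes the first row of the csv file. That fixes your ValueError: could not convert string to float: 'x1' error, because the header is then kept separate from your data. Remember to use different variable names for pose and face, such as x and xx. Personally, though, I would prefer a MultiIndex in this case.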
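As a rough illustration of the MultiIndex idea, the sketch below labels each column with a (part, landmark, coord) tuple instead of prefixed names such as x0 and xx0. The Landmark namedtuple, the flatten helper, the level names, and the sample values are all assumptions made for this example; they stand in for the landmark objects in the question.

```python
from collections import namedtuple
import pandas as pd

# Hypothetical stand-in for the landmark objects used in the question.
Landmark = namedtuple("Landmark", ["x", "y", "z", "visibility"])
pose = [Landmark(0.1, 0.2, 0.3, 0.9), Landmark(0.4, 0.5, 0.6, 0.8)]
face = [Landmark(0.7, 0.8, 0.9, 1.0)]


def flatten(part, landmarks):
    """Map (part, landmark index, coordinate) -> value for one frame."""
    return {
        (part, i, coord): getattr(lm, attr)
        for i, lm in enumerate(landmarks)
        for coord, attr in (("x", "x"), ("y", "y"), ("z", "z"), ("v", "visibility"))
    }


row = {**flatten("pose", pose), **flatten("face", face)}

# Build the three-level column index explicitly from the tuple keys.
columns = pd.MultiIndex.from_tuples(list(row.keys()), names=["part", "landmark", "coord"])
df_multi = pd.DataFrame([list(row.values())], columns=columns)

# Selecting by the top level returns all columns of that part.
print(df_multi["pose"].shape)
print(df_multi["face"].shape)
```

With this layout the pose and face columns can share the same coordinate names, since the top level already tells them apart.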
Using the answer from "Python Pandas Replace Header with Top Row":
import pandas as pd #pandas working with tabular data as dataframes
from sklearn.model_selection import train_test_split #scikit-learn, building custom ML models
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
df = pd.read_csv('coo.csv')
df.rename(columns=df.iloc[0, :], inplace=True)
df.drop(df.index[0], inplace=True)
#df[df['class']=='Happy']
X = df.drop('class', axis=1) # features
y = df['class'] # target value
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1234)
pipelines = {
    'lr': make_pipeline(StandardScaler(), LogisticRegression()),
    'rc': make_pipeline(StandardScaler(), RidgeClassifier()),
    'rf': make_pipeline(StandardScaler(), RandomForestClassifier()),
    'gb': make_pipeline(StandardScaler(), GradientBoostingClassifier()),
}
fit_models = {}
for algo, pipeline in pipelines.items():
    model = pipeline.fit(X_train, y_train)
    fit_models[algo] = model
fit_models['rc'].predict(X_test)
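If the ValueError about 'x1' still appears, one possible cause (an assumption, since the file itself is not shown) is that the header text occurs again further down the file, for example because it was appended more than once. A minimal sketch of one way to handle that, using a hypothetical three-column file: drop rows that repeat the column names, then coerce the feature columns to numbers before splitting into X and y.

```python
import io
import pandas as pd

# Hypothetical file contents in which the header line was written twice.
CSV_TEXT = "class,x1,y1\nHappy,0.1,0.2\nclass,x1,y1\nSad,0.3,0.4\n"

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Drop any data row that merely repeats the column names.
df = df[df["class"] != "class"].reset_index(drop=True)

# Convert the feature columns to numbers; unparseable values become NaN.
feature_cols = df.columns.drop("class")
df[feature_cols] = df[feature_cols].apply(pd.to_numeric, errors="coerce")

X = df.drop("class", axis=1)  # features
y = df["class"]               # target value
print(X.dtypes)
```

Any NaN values produced by the coercion would still need to be dropped or imputed before calling .fit().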