
I get the error pandas._libs.index.Int64Engine._check_type KeyError: 'class' when I try to skip a row while loading a CSV data file

import pandas as pd #pandas working with tabular data as dataframes
from sklearn.model_selection import train_test_split #scikit-learn, building custom ML models

from sklearn.pipeline import make_pipeline 
from sklearn.preprocessing import StandardScaler 

from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

#df = pd.read_csv('coords.csv')
#df = pd.read_csv('coords.csv', header=None)
#df = pd.read_csv('coords.csv', skiprows=[0])
df = pd.read_csv('coords.csv', skiprows=[0], header=None)

#df[df['class']=='Happy']

X = df.drop('class', axis=1) # features
y = df['class'] # target value

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1234)

pipelines = {
    'lr':make_pipeline(StandardScaler(), LogisticRegression()),
    'rc':make_pipeline(StandardScaler(), RidgeClassifier()),
    'rf':make_pipeline(StandardScaler(), RandomForestClassifier()),
    'gb':make_pipeline(StandardScaler(), GradientBoostingClassifier()),
}

fit_models = {}

for algo, pipeline in pipelines.items():
    model = pipeline.fit(X_train, y_train)
    fit_models[algo] = model
fit_models['rc'].predict(X_test)
df = pd.read_csv('coords.csv')

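If I read the whole data array from the CSV, the first row (the column names) is read as well, and training fails when it tries to convert a string to a number: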

Traceback (most recent call last):
  File "3_Train_Custom_Model_Using_Scikit_Learn.py", line 71, in <module>
    model = pipeline.fit(X_train, y_train)
ValueError: could not convert string to float: 'x1'

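I then tried various ways of removing the row that contains the column names, since that row might be what produces the error. Considering that indexing starts at 0, I did the following: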

Using df = pd.read_csv('coords.csv', skiprows=0) gives me ValueError: could not convert string to float: 'x1'
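For reference, skiprows=0 means "skip zero lines", so that call behaves exactly like a plain read_csv. skiprows=[0] (or skiprows=1) drops the first physical line, and the line after it becomes the header unless header=None is also passed, in which case the columns get integer labels. A minimal sketch on a small, made-up CSV held in memory (the data is hypothetical, not the asker's file):

```python
import io

import pandas as pd

# Hypothetical miniature version of coords.csv: a header line, then two samples
TEXT = "class,x1,y1\nHappy,0.1,0.2\nSad,0.3,0.4\n"


def read(**kwargs):
    """Parse TEXT with the given read_csv keyword arguments."""
    return pd.read_csv(io.StringIO(TEXT), **kwargs)


plain = read()                               # header taken from the first line
skip_none = read(skiprows=0)                 # skips nothing, same result as plain
skip_first = read(skiprows=[0])              # first line dropped; "Happy,0.1,0.2" becomes the header
no_header = read(skiprows=[0], header=None)  # first line dropped; columns labelled 0, 1, 2

for name, frame in [("plain", plain), ("skiprows=0", skip_none),
                    ("skiprows=[0]", skip_first), ("skiprows=[0], header=None", no_header)]:
    print(f"{name}: columns={list(frame.columns)}, rows={len(frame)}")
```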

#df = pd.read_csv('coords.csv', header=None) #Option 1
#df = pd.read_csv('coords.csv', skiprows=[0], header=None) #Option 2

Either of these gives me this exception from pandas:

Traceback (most recent call last):
  File "C:\Users\MyPC0\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 3080, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 98, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index_class_helper.pxi", line 89, in pandas._libs.index.Int64Engine._check_type
KeyError: 'class'

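I think this pandas error happens because, when the column-name row (the one at index 0) is omitted, pandas for some reason I don't understand tries to "look up" the columns of that omitted row, fails, and raises this error, which in the console looks like an exception coming from inside pandas.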

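The "pandas error" doesn't even point to a line in my code, and I don't know what it could be. How can I fix it so that I can remove (really, just skip) the row with the column names and still train the model with .fit()?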
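The Int64Engine in the traceback is a hint about the cause: with header=None, pandas labels the columns 0, 1, 2, ..., so no column is called 'class' any more and df['class'] raises KeyError: 'class'. The skipped line itself is not being looked up. A minimal sketch on a hypothetical in-memory CSV, showing the integer labels and two possible ways around them (let read_csv use the header line, or refer to the label column by its integer label):

```python
import io

import pandas as pd

# Hypothetical stand-in for coords.csv: label column first, then the coordinates
TEXT = "class,x1,y1\nHappy,0.1,0.2\nSad,0.3,0.4\n"

# What the question does: drop the header line and read without a header
df = pd.read_csv(io.StringIO(TEXT), skiprows=[0], header=None)
print(list(df.columns))  # integer labels only, so df["class"] would raise KeyError: 'class'

# Option 1: let read_csv use the first line as the header
df_named = pd.read_csv(io.StringIO(TEXT))
X_named = df_named.drop("class", axis=1)  # features
y_named = df_named["class"]               # target

# Option 2: keep header=None and refer to the label column by its integer label 0
X_pos = df.drop(0, axis=1)  # features
y_pos = df[0]               # target
```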

The CSV file opened in Excel: (screenshot not available)

The CSV file opened in a text editor: (screenshot not available)

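I'm not sure whether the problem is in the CSV itself, although I doubt it. In any case, here is the code I use to save the data into the CSV, with a comma as the delimiter: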

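import csv
import numpy as np

# results (the landmark detection output) and class_name (the label for this sample)
# are defined earlier in the script
pose = results.pose_landmarks.landmark
pose_row = list(np.array([[landmark.x, landmark.y, landmark.z, landmark.visibility] for landmark in pose]).flatten())

face = results.face_landmarks.landmark
face_row = list(np.array([[landmark.x, landmark.y, landmark.z, landmark.visibility] for landmark in face]).flatten())

# One CSV record: the class label followed by all pose and face values
row = pose_row + face_row
row.insert(0, class_name)

# Append the record to coords.csv (this writes data only, never a header)
with open('coords.csv', mode='a', newline='') as f:
    csv_writer = csv.writer(f, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    csv_writer.writerow(row)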
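To check whether the problem lies in the CSV itself, one option is to count how many fields each record has and find the records that look like a header. A sketch using only the standard csv module; the function names and the rule "a header record starts with class" are assumptions for illustration:

```python
import csv
from collections import Counter


def inspect_rows(lines, label="class"):
    """Summarise CSV records: how many fields each has, and which ones look like a header.

    `lines` is any iterable of text lines (an open file works). A record is treated
    as a header if its first field equals `label` (an assumption based on the
    question's column names).
    """
    field_counts = Counter()
    header_rows = []
    for row_no, fields in enumerate(csv.reader(lines)):
        field_counts[len(fields)] += 1
        if fields and fields[0] == label:
            header_rows.append(row_no)
    return dict(field_counts), header_rows


def inspect_csv(path, label="class"):
    """Run inspect_rows on a file on disk, e.g. inspect_csv('coords.csv')."""
    with open(path, newline="") as f:
        return inspect_rows(f, label)
```

With the writer above, every data record should have 1 + 4 * (number of pose landmarks + number of face landmarks) fields. A single field count and at most one header record, at row 0, suggests a well-formed file; a second header record, for example, would explain why pandas meets the string 'x1' among the data.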

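Judging by your Notepad screenshot, you can't simply read this file with pandas, because it is not clear what the header of the data is. I assume the data has only one row (I can't see separate rows; the NumPy arrays inside your row list may be causing this). An example of a correct CSV (screenshot not available): there you can clearly see from the first column that the data is properly separated, and you can tell that the data has more than one row. That is different from your CSV.

So you should try using pandas to save your data. For example, your pose and face data each contain n groups of values; a simple way to add all of them is to fill a dict in a loop: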

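import pandas as pd

# pose and face are the landmark lists from the question's code
data_pose = {}
for ind, landmark in enumerate(pose):
    data_pose['x'+str(ind)] = landmark.x
    data_pose['y'+str(ind)] = landmark.y
    data_pose['z'+str(ind)] = landmark.z
    data_pose['v'+str(ind)] = landmark.visibility

# Face columns get doubled prefixes so they don't clash with the pose columns
data_face = {}
for ind, landmark in enumerate(face):
    data_face['xx'+str(ind)] = landmark.x
    data_face['yy'+str(ind)] = landmark.y
    data_face['zz'+str(ind)] = landmark.z
    data_face['vv'+str(ind)] = landmark.visibility

data = {**data_pose, **data_face}
# data holds scalar values, so wrap it in a list to get a one-row DataFrame
df = pd.DataFrame([data])
# index=False keeps the row index out of the file
df.to_csv('try.csv', sep=';', index=False)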
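The dict built above holds a single sample and has no class column, while the training code expects a 'class' column and many rows. A sketch of one way to extend the same idea: build one dict per sample with the label first, collect the dicts in a list, and write the file once. The helper names, the SimpleNamespace stand-ins for landmark objects, and the file name samples.csv are hypothetical, for illustration only:

```python
from types import SimpleNamespace

import pandas as pd

POSE_NAMES = ("x", "y", "z", "v")      # pose columns: x0, y0, z0, v0, x1, ...
FACE_NAMES = ("xx", "yy", "zz", "vv")  # face columns: xx0, yy0, zz0, vv0, xx1, ...


def landmarks_to_dict(class_name, pose, face):
    """Flatten one labelled sample into a dict: 'class' first, then pose and face values."""
    row = {"class": class_name}
    for names, landmarks in ((POSE_NAMES, pose), (FACE_NAMES, face)):
        for i, lm in enumerate(landmarks):
            values = (lm.x, lm.y, lm.z, lm.visibility)
            for name, value in zip(names, values):
                row[name + str(i)] = value
    return row


def fake_landmark(x, y, z, visibility):
    """Hypothetical stand-in for a real landmark object (only the four attributes used above)."""
    return SimpleNamespace(x=x, y=y, z=z, visibility=visibility)


# Two hypothetical samples, each with one pose landmark and one face landmark
samples = [
    ("Happy", [fake_landmark(0.1, 0.2, 0.3, 0.9)], [fake_landmark(0.4, 0.5, 0.6, 1.0)]),
    ("Sad", [fake_landmark(0.2, 0.3, 0.4, 0.8)], [fake_landmark(0.5, 0.6, 0.7, 1.0)]),
]

rows = [landmarks_to_dict(label, pose, face) for label, pose, face in samples]
df = pd.DataFrame(rows)  # one row per sample; column names come from the dict keys
df.to_csv("samples.csv", sep=";", index=False)  # hypothetical output file
```

In a real capture loop, the rows list would be filled one sample at a time and written at the end; alternatively, DataFrame.to_csv accepts mode='a' together with header=False to append later samples without repeating the header.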

If you want to read the CSV file back, just do:

df = pd.read_csv('try.csv',sep=';')

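The header of df will be set the default way, which in this case means the first row of the CSV file. That fixes your ValueError: could not convert string to float: 'x1' error, because the header is kept separate from your data. Remember to use different variable names for pose and face, e.g. x and xx. In this case, though, I would prefer to use a MultiIndex.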
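The answer mentions preferring a MultiIndex. A sketch of what that could look like, assuming each coordinate column is identified by (part, landmark index, coordinate); the level names and the to_multiindex helper are hypothetical. It converts the flat x0/xx0 style names into a three-level column index, so that, for example, all face columns can be selected at once (the class column is assumed to be kept separately):

```python
import re

import pandas as pd


def to_multiindex(df):
    """Turn flat names like 'x0' (pose) and 'xx0' (face) into (part, landmark, coord) columns.

    Assumes every column follows the answer's naming: one letter for pose,
    the same letter doubled for face, then the landmark number.
    """
    tuples = []
    for name in df.columns:
        match = re.fullmatch(r"([a-z]+)(\d+)", name)
        letters, number = match.group(1), int(match.group(2))
        part = "pose" if len(letters) == 1 else "face"
        tuples.append((part, number, letters[0]))
    out = df.copy()
    out.columns = pd.MultiIndex.from_tuples(tuples, names=["part", "landmark", "coord"])
    return out.sort_index(axis=1)  # sorted columns make label-based selection efficient


flat = pd.DataFrame([{"x0": 0.1, "y0": 0.2, "xx0": 0.3, "yy0": 0.4}])
multi = to_multiindex(flat)
face_only = multi["face"]                     # remaining levels: (landmark, coord)
all_x = multi.xs("x", axis=1, level="coord")  # every x value, pose and face
```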

Using the answer to "Python Pandas Replace Header with Top Row":

import pandas as pd #pandas working with tabular data as dataframes
from sklearn.model_selection import train_test_split #scikit-learn, building custom ML models

from sklearn.pipeline import make_pipeline 
from sklearn.preprocessing import StandardScaler 

from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

df = pd.read_csv('coo.csv')

df.rename(columns=df.iloc[0, :], inplace=True)
df.drop(df.index[0], inplace=True)


#df[df['class']=='Happy']

X = df.drop('class', axis=1) # features
y = df['class'] # target value

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1234)

pipelines = {
    'lr':make_pipeline(StandardScaler(), LogisticRegression()),
    'rc':make_pipeline(StandardScaler(), RidgeClassifier()),
    'rf':make_pipeline(StandardScaler(), RandomForestClassifier()),
    'gb':make_pipeline(StandardScaler(), GradientBoostingClassifier()),
}

fit_models = {}

for algo, pipeline in pipelines.items():
    model = pipeline.fit(X_train, y_train)
    fit_models[algo] = model
fit_models['rc'].predict(X_test)
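The rename/drop lines turn the first row that pandas reads into the column names, which suggests the real header sits on the second line of the file. Under that assumption, read_csv(..., header=1) should produce the same columns in one step. One difference: after rename/drop the coordinate columns still hold strings, while header=1 parses them as numbers. A sketch on a hypothetical in-memory file whose first line is a placeholder (the placeholder content and the helper names are made up):

```python
import io

import pandas as pd

# Hypothetical file: a placeholder first line, then the real header, then two samples
TEXT = "a,b,c\nclass,x1,y1\nHappy,0.1,0.2\nSad,0.3,0.4\n"


def load_with_rename(text):
    """The answer's approach: promote the first data row to the header, then drop it."""
    df = pd.read_csv(io.StringIO(text))
    df.rename(columns=df.iloc[0, :], inplace=True)
    df.drop(df.index[0], inplace=True)
    df = df.reset_index(drop=True)
    features = df.columns.drop("class")
    df[features] = df[features].astype(float)  # these values were read as strings
    return df


def load_with_header1(text):
    """Alternative: tell read_csv that the header is on line 1 (0-based)."""
    return pd.read_csv(io.StringIO(text), header=1)


renamed = load_with_rename(TEXT)
direct = load_with_header1(TEXT)
```

If the real file is laid out differently (for example, with the header repeated further down), the two approaches will not agree, so it is worth looking at the raw file first.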


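Disclaimer: The technical posts on this site follow the CC BY-SA 4.0 license. If you need to reprint, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.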

 
© 2020-2024 STACKOOM.COM