I get the error pandas._libs.index.Int64Engine._check_type KeyError: 'class' when trying to skip a row while loading a csv file of tabular data

Question

import pandas as pd #pandas working with tabular data as dataframes
from sklearn.model_selection import train_test_split #scikit-learn, building custom ML models

from sklearn.pipeline import make_pipeline 
from sklearn.preprocessing import StandardScaler 

from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

#df = pd.read_csv('coords.csv')
#df = pd.read_csv('coords.csv', header=None)
#df = pd.read_csv('coords.csv', skiprows=[0])
df = pd.read_csv('coords.csv', skiprows=[0], header=None)

#df[df['class']=='Happy']

X = df.drop('class', axis=1) # features
y = df['class'] # target value

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1234)

pipelines = {
    'lr':make_pipeline(StandardScaler(), LogisticRegression()),
    'rc':make_pipeline(StandardScaler(), RidgeClassifier()),
    'rf':make_pipeline(StandardScaler(), RandomForestClassifier()),
    'gb':make_pipeline(StandardScaler(), GradientBoostingClassifier()),
}

fit_models = {}

for algo, pipeline in pipelines.items():
    model = pipeline.fit(X_train, y_train)
    fit_models[algo] = model
fit_models['rc'].predict(X_test)

df = pd.read_csv('coords.csv')

If I read the entire data array from the csv, the first row is read as well, and an error occurs when trying to convert a str to a float:

Traceback (most recent call last):
  File "3_Train_Custom_Model_Using_Scikit_Learn.py", line 71, in <module>
    model = pipeline.fit(X_train, y_train)
ValueError: could not convert string to float: 'x1'

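I then tried various ways to remove the row containing the column names, since that row is presumably what causes the error. Given that indexing starts at 0, I did the following: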

Using df = pd.read_csv('coords.csv', skiprows=0) gives me ValueError: could not convert string to float: 'x1'

while either of

#df = pd.read_csv('coords.csv', header=None) #Option 1
#df = pd.read_csv('coords.csv', skiprows=[0], header=None) #Option 2

gives me this exception from pandas:

Traceback (most recent call last):
  File "C:\Users\MyPC0\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 3080, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 98, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index_class_helper.pxi", line 89, in pandas._libs.index.Int64Engine._check_type
KeyError: 'class'

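I think this pandas error arises because, when the column-name row at index 0 is skipped, pandas (for some reason I don't understand) still tries to "look up" a column from that skipped row, cannot find it, and raises this error, which in the console looks like an exception coming from inside pandas.

The "pandas error" does not even point to a line in my code, so I don't know what it could be. How can I fix it so that I can remove (or really just skip) the row with the column names and train the model with .fit()?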

[Image: the csv file opened in Excel]

[Image: the csv file opened in a text editor]

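I'm not sure whether the problem lies in the csv itself, though I doubt it. In any case, here is the code I use to write the data to the csv, with a comma as the delimiter: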

pose = results.pose_landmarks.landmark

pose_row = list(np.array([[landmark.x, landmark.y, landmark.z, landmark.visibility] for landmark in pose]).flatten())

face = results.face_landmarks.landmark

face_row = list(np.array([[landmark.x, landmark.y, landmark.z, landmark.visibility] for landmark in face]).flatten())

row = pose_row+face_row
row.insert(0, class_name)

with open('coords.csv', mode='a', newline='') as f:
    csv_writer = csv.writer(f, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    csv_writer.writerow(row)

Answer 1

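Judging by your Notepad view of the file, you can't simply read it with pandas, because it is unclear what the header of the data is. I assume the data has only one row (I can't see separate rows in it; the np array inside the list may be the cause).

In a correct csv example, you can clearly see from the first column that the data is separated properly, and you can tell that the data has more than one row. Your csv is not like that.

So you should try using pandas to save your data. For example, your pose and face data each contain n groups of values; a simple way to add them all is to loop over them and add them to a dict: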

import pandas as pd
data_pose = {}
ind = 0
for landmark in pose:
    data_pose['x'+str(ind)] = landmark.x
    data_pose['y'+str(ind)] = landmark.y
    data_pose['z'+str(ind)] = landmark.z
    data_pose['v'+str(ind)] = landmark.visibility
    ind = ind+1
data_face = {}
ind = 0
for landmark in face:
    data_face['xx'+str(ind)] = landmark.x
    data_face['yy'+str(ind)] = landmark.y
    data_face['zz'+str(ind)] = landmark.z
    data_face['vv'+str(ind)] = landmark.visibility
    ind = ind+1
data = {**data_pose,**data_face}
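df = pd.DataFrame([data])  # a dict of scalars must be wrapped in a list to form one row
df.to_csv('try.csv', sep=';', index=False)  # index=False avoids writing an extra unnamed index column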

If you want to read the csv file back, just do this:

df = pd.read_csv('try.csv',sep=';')

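The header of df is left at its default, which here takes the first line of the csv file. This fixes your ValueError: could not convert string to float: 'x1' error, because the header is then kept separate from your data. Remember to use distinct column names for pose and face, such as x and xx. That said, I would prefer a MultiIndex in this case.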
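As a minimal sketch of why the header setting matters for the KeyError: 'class' in the question: with header=None, pandas labels the columns with integers, so looking up 'class' fails, whereas the default header takes the names from the first row. The inline csv_text below is a hypothetical three-column stand-in for coords.csv, not the real file.

```python
import io
import pandas as pd

# Hypothetical stand-in for coords.csv: a header row followed by two data rows.
csv_text = "class,x1,y1\nHappy,0.1,0.2\nSad,0.3,0.4\n"

# Default header=0: the first row becomes the column labels.
df_named = pd.read_csv(io.StringIO(csv_text))

# skiprows=[0] with header=None: the header row is discarded and the
# columns are labelled with integers, so 'class' is no longer a valid key.
df_int = pd.read_csv(io.StringIO(csv_text), skiprows=[0], header=None)

try:
    df_int["class"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

X = df_named.drop("class", axis=1)  # numeric feature columns only
y = df_named["class"]               # string labels, kept out of X
```

With the named columns, the string 'x1' is a label rather than a data value, so it should never reach the scaler inside .fit().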

Answer 2

Using the answer from "Python Pandas Replace Header with Top Row":

import pandas as pd #pandas working with tabular data as dataframes
from sklearn.model_selection import train_test_split #scikit-learn, building custom ML models

from sklearn.pipeline import make_pipeline 
from sklearn.preprocessing import StandardScaler 

from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

df = pd.read_csv('coo.csv')

df.rename(columns=df.iloc[0, :], inplace=True)
df.drop(df.index[0], inplace=True)


#df[df['class']=='Happy']

X = df.drop('class', axis=1) # features
y = df['class'] # target value

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1234)

pipelines = {
    'lr':make_pipeline(StandardScaler(), LogisticRegression()),
    'rc':make_pipeline(StandardScaler(), RidgeClassifier()),
    'rf':make_pipeline(StandardScaler(), RandomForestClassifier()),
    'gb':make_pipeline(StandardScaler(), GradientBoostingClassifier()),
}

fit_models = {}

for algo, pipeline in pipelines.items():
    model = pipeline.fit(X_train, y_train)
    fit_models[algo] = model
fit_models['rc'].predict(X_test)
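One thing to watch with this approach, shown here as a sketch rather than part of the original answer: when the column names arrive as a data row, the remaining values are typically read as strings, so it may be worth converting the feature columns explicitly after promoting the header. The small raw frame below is hypothetical and only imitates the shape of the real data.

```python
import pandas as pd

# Hypothetical frame whose first row holds the real column names,
# as happens when a file is read with header=None.
raw = pd.DataFrame([["class", "x1", "y1"],
                    ["Happy", "0.1", "0.2"],
                    ["Sad", "0.3", "0.4"]])

df = raw.copy()
df.columns = df.iloc[0]                           # promote the first row to header
df = df.drop(df.index[0]).reset_index(drop=True)  # remove that row from the data

X = df.drop("class", axis=1).apply(pd.to_numeric)  # numeric strings -> floats
y = df["class"]                                    # string labels
```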
