Reading CSV & Columns - KeyError: "None of [Int64Index([0, 1, 2, 3], dtype='int64')] are in the [columns]"
I am having issues generating a collinearity analysis on a simple DF (see below). Every time I try to run the function, I get the following error message:
KeyError: "None of [Int64Index([0, 1, 2, 3], dtype='int64')] are in the [columns]"
Below is the code I am using:
    import numpy as np
    import pandas as pd
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    read_training_set = pd.read_csv('C:\\Users\\rapha\\Desktop\\New test\\Classeur1.csv', sep=";")
    training_set = pd.DataFrame(read_training_set)
    print(training_set)

    def calculate_vif_(X):
        thresh = 5.0
        variables = range(X.shape[1])
        for i in np.arange(0, len(variables)):
            vif = [variance_inflation_factor(X[variables].values, ix) for ix in range(X[variables].shape[1])]
            print(vif)
            maxloc = vif.index(max(vif))
            if max(vif) > thresh:
                print('dropping \'' + X[variables].columns[maxloc] + '\' at index: ' + str(maxloc))
                del variables[maxloc]
        print('Remaining variables:')
        print(X.columns[variables])
        return X

    X = training_set
    X2 = calculate_vif_(X)
The DF on which I am trying to run my function looks like this:
       Year  Age  Weight  Size
    0  2020   10     100   170
    1  2021   11     101   171
    2  2022   12     102   172
    3  2023   13     103   173
    4  2024   14     104   174
    5  2025   15     105   175
    6  2026   16     106   176
    7  2027   17     107   177
    8  2028   18     108   178
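For context on what `variance_inflation_factor` measures: for column i, VIF_i = 1 / (1 - R_i^2), where R_i^2 comes from regressing column i on the other columns. Every column in the sample DataFrame is an exact linear function of `Year` (Age = Year - 2010, Weight = Year - 1920, Size = Year - 1850), so every R_i^2 is 1 and every VIF is infinite, or merely huge after floating-point rounding. The sketch below is a minimal numpy version that adds an intercept; the helper name `vif_with_intercept` is hypothetical, not a library function. As far as I know, statsmodels' `variance_inflation_factor` does not add an intercept itself and regresses on the other columns exactly as passed, so its values should match this sketch only when the matrix already includes a constant column (for example from `statsmodels.api.add_constant`).

```python
import numpy as np


def vif_with_intercept(X):
    """Hypothetical helper: VIF_i = 1 / (1 - R_i^2) for every column of X.

    R_i^2 comes from an ordinary least-squares fit of column i on the
    remaining columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for i in range(k):
        y = X[:, i]
        others = np.delete(X, i, axis=1)
        design = np.column_stack([np.ones(n), others])  # add the intercept
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        # r2 == 1 means perfect collinearity: report an infinite VIF
        vifs.append(np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return vifs


x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]  # correlation with x1 is exactly 0.8
print(vif_with_intercept(np.column_stack([x1, x2])))  # both 1 / 0.36, about 2.78
```

With only two columns, R_i^2 is the squared correlation (0.64 here), which makes the result easy to check by hand.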
I have two guesses, but I'm not sure how to fix either:
- Guess 1: `np.arange` is causing some sort of conflict with the header and columns, which prevents the rest of the function from iterating through each column.
- Guess 2: the problem comes from blank separators, which prevent the function from moving properly from one column to the next. The problem is that my CSV file already has ";" separators (to be honest, I don't know exactly why, as I created the file manually and saved it as a regular CSV with "," separators).
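Guess 2 can be checked in isolation. A minimal sketch, using a hypothetical inline copy of the first rows instead of the real `Classeur1.csv`: with `sep=";"` pandas finds four columns labelled by the header strings, while a wrong separator collapses everything into a single column. Since the printed DataFrame in the question shows four separate columns, the separator is very likely fine. Note also that the column labels are strings such as `'Year'`, not the integers 0 to 3 that the error message mentions.

```python
import io

import pandas as pd

# Hypothetical stand-in for Classeur1.csv: same header, ";"-separated.
csv_text = """Year;Age;Weight;Size
2020;10;100;170
2021;11;101;171
2022;12;102;172
"""

df = pd.read_csv(io.StringIO(csv_text), sep=";")
print(df.columns.tolist())  # ['Year', 'Age', 'Weight', 'Size']
print(df.shape)             # (3, 4)

# With the wrong separator, each line stays one string in a single column.
wrong = pd.read_csv(io.StringIO(csv_text), sep=",")
print(wrong.shape)          # (3, 1)
```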
I'm not sure how to fix the problem at this point. Does anyone have any insights?
Best
The error is caused by this snippet: `X[variables].values`. Convert `variables`, which is a `range`, to a `list`; a `range` also cannot be modified, so `del variables[maxloc]` would fail as well.
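Both problems can be reproduced on a small copy of the question's data. A minimal sketch, assuming a recent pandas; the exact wording of the KeyError (for example `Int64Index` versus `Index`) varies between pandas versions:

```python
import pandas as pd

# Same column labels as the question's DataFrame (first three rows only).
df = pd.DataFrame({
    "Year": [2020, 2021, 2022],
    "Age": [10, 11, 12],
    "Weight": [100, 101, 102],
    "Size": [170, 171, 172],
})

variables = range(df.shape[1])  # range(0, 4): column positions

# df[...] with a list-like treats 0..3 as column labels, and none exist.
try:
    df[variables]
    raised_key_error = False
except KeyError as exc:
    raised_key_error = True
    print("KeyError:", exc)

# A range is immutable, so the loop's `del variables[maxloc]` fails too.
try:
    del variables[0]
    raised_type_error = False
except TypeError as exc:
    raised_type_error = True
    print("TypeError:", exc)

# A list of positions plus iloc selects columns by position.
positions = list(variables)
del positions[0]
remaining = df.iloc[:, positions].columns.tolist()
print(remaining)  # ['Age', 'Weight', 'Size']
```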
As an aside, the code is very confusing. Why are you calling `np.arange` when `variables` is already a `range`? And why are you passing a range of column positions to `X[...]`, which looks columns up by label?
From the comments above, it looks like you think you are indexing columns by column number, but `X[...]` actually looks up column labels, and the labels here are `Year`, `Age`, `Weight` and `Size`, not 0 to 3. That is exactly what the error message is saying. Some of this confusion would be cleared up if you used `loc` or `iloc` to be explicit about what you are trying to index.
Got it, I revised the whole thing and it seems to be working. See below for how it looks.
Thanks a lot for the help!
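```python
from statsmodels.stats.outliers_influence import variance_inflation_factor


def calculate_vif_(X, thresh=5.0):
    # Column *positions*, so select with iloc rather than X[...]
    variables = list(range(X.shape[1]))
    # Loop a fixed number of times; deleting from `variables` while
    # iterating over it directly would end the loop early.
    for _ in range(len(variables) - 1):
        vif = [variance_inflation_factor(X.iloc[:, variables].values, ix)
               for ix in range(X.iloc[:, variables].shape[1])]
        maxloc = vif.index(max(vif))
        if max(vif) > thresh:
            print('dropping \'' + X.iloc[:, variables].columns[maxloc] +
                  '\' at index: ' + str(maxloc))
            del variables[maxloc]
        else:
            break  # all remaining VIFs are at or below the threshold
    print('Remaining variables:')
    print(X.columns[variables])
    return X.iloc[:, variables]


X = training_set
X2 = calculate_vif_(X)
```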
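When variables are dropped one at a time inside a loop, one Python detail matters: deleting from a list while a `for` loop iterates over that same list makes the loop end early, because the iterator just walks forward by index. A small sketch with plain lists (the items are only the sample column labels):

```python
# Deleting from a list while a for-loop walks over that same list:
items = ["Year", "Age", "Weight", "Size"]
visited = []
for name in items:
    visited.append(name)
    del items[0]          # shrink the list on every pass
print(visited)            # ['Year', 'Weight']: only 2 passes instead of 4

# Looping a fixed number of times instead (range is evaluated once):
items = ["Year", "Age", "Weight", "Size"]
passes = 0
for _ in range(len(items) - 1):
    passes += 1
    del items[0]
print(passes, items)      # 3 ['Size']
```

The `- 1` in the second loop always leaves at least one item in the list.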