Reading CSV & Columns - KeyError: "None of [Int64Index([0, 1, 2, 3], dtype='int64')] are in the [columns]"

Question

I am having trouble running a collinearity analysis on a simple DataFrame (see below). Every time I run the function, I get the following error message:

KeyError: "None of [Int64Index([0, 1, 2, 3], dtype='int64')] are in the [columns]"

Below is the code I am using:

read_training_set = pd.read_csv('C:\\Users\\rapha\\Desktop\\New test\\Classeur1.csv', sep=";")
training_set = pd.DataFrame(read_training_set)

print(training_set)

def calculate_vif_(X):
    thresh = 5.0
    variables = range(X.shape[1])

    for i in np.arange(0, len(variables)):
        vif = [variance_inflation_factor(X[variables].values, ix) for ix in range(X[variables].shape[1])]
        print(vif)

        maxloc = vif.index(max(vif))
        if max(vif) > thresh:
            print('dropping \'' + X[variables].columns[maxloc] + '\' at index: ' + str(maxloc))
            del variables[maxloc]

    print('Remaining variables:')
    print(X.columns[variables])
    return X

X = training_set
X2 = calculate_vif_(X)

The DataFrame I am running the function on looks like this:

   Year  Age  Weight  Size
0  2020   10     100   170
1  2021   11     101   171
2  2022   12     102   172
3  2023   13     103   173
4  2024   14     104   174
5  2025   15     105   175
6  2026   16     106   176
7  2027   17     107   177
8  2028   18     108   178

I have two guesses here, but I am not sure how to fix either:

-Guess 1: np.arange is causing some sort of conflict with the header and columns, which prevents the rest of the function from iterating through each column.

-Guess 2: The problem comes from blank separators, which prevent the function from jumping from one column to another properly. The problem is that my CSV file already has ";" separators (to be honest, I do not know exactly why, as I created the file manually and saved it as a regular CSV with "," separators).
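One way to check Guess 2 is to confirm which separator the file really uses before blaming the parser. The sketch below uses the standard-library csv.Sniffer on a hypothetical in-memory sample that mirrors the first rows of the table above; the sample string is an assumption, not the actual contents of Classeur1.csv.

```python
import csv
import io

# Hypothetical sample mirroring the first rows of the file in the question.
sample = "Year;Age;Weight;Size\n2020;10;100;170\n2021;11;101;171\n"

# Restrict the candidates so the sniffer only chooses between ';' and ','.
dialect = csv.Sniffer().sniff(sample, delimiters=";,")
print(repr(dialect.delimiter))

# Parse the sample with the detected dialect and split header from data.
rows = list(csv.reader(io.StringIO(sample), dialect))
header, data = rows[0], rows[1:]
print(header)
```

If the detected delimiter is ";", then passing sep=";" to pd.read_csv, as the code above already does, is the matching setting, and the separator is unlikely to be the cause of the KeyError.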

I am not sure how to fix the problem at this point. Does anyone have insights here?

Best

Answer 1

The error is caused by this snippet: X[variables].values. Convert variables, which is a range, to a list.
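A minimal sketch of why the range needs to become a list, independent of pandas: the function in the question later calls del variables[maxloc], and a range does not support item deletion. The values here are illustrative only.

```python
variables = range(4)

# A range is immutable, so deleting an element raises TypeError.
try:
    del variables[1]
    range_error = None
except TypeError as exc:
    range_error = exc

# A list holds the same positions and can shrink as columns are dropped.
variables = list(variables)
del variables[1]
print(variables)
```

The conversion only makes the bookkeeping mutable; how those integers are then used to select columns is a separate matter.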

As an aside, the code is very confusing. Why are you calling np.arange when variables is already a range? Why are you passing a range over the number of columns straight into X[...]?

From the comments above, it looks like you think you are indexing columns by column number, but X[variables] actually looks those integers up as column labels, and no column is named 0, 1, 2 or 3, hence the KeyError. Some of this confusion would be cleared up if you used loc or iloc to be explicit about what you are trying to index.
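The difference between the indexers can be sketched on a small frame. The column names are taken from the question; the values and the variable names (positions, by_position, and so on) are assumptions made for illustration.

```python
import pandas as pd

# Small frame with the same column names as the question (values are illustrative).
X = pd.DataFrame({
    "Year": [2020, 2021, 2022],
    "Age": [10, 11, 12],
    "Weight": [100, 101, 102],
    "Size": [170, 171, 172],
})
positions = [0, 1]

# Plain [] with a list treats the integers as column labels; none exist here.
try:
    X[positions]
    raised_key_error = False
except KeyError:
    raised_key_error = True

by_position = X.iloc[:, positions]     # columns at positions 0 and 1
by_label = X.loc[:, ["Year", "Age"]]   # the same columns, selected by name
first_rows = X.iloc[positions]         # rows at positions 0 and 1, all columns

print(raised_key_error)
print(list(by_position.columns))
```

Writing the row and column selectors out with iloc or loc makes it visible at a glance which axis each list of integers applies to.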

Answer 2

Got it. I revised the whole thing and it seems to be working. See below how it looks.

Thanks a lot for the help.

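from statsmodels.stats.outliers_influence import variance_inflation_factor


def calculate_vif_(X):
    thresh = 5.0
    # Track columns by integer position so iloc can select them.
    variables = list(range(X.shape[1]))

    # Drop the worst column one at a time; stop when a single column is left
    # or no remaining VIF exceeds the threshold.
    while len(variables) > 1:
        values = X.iloc[:, variables].values
        vif = [variance_inflation_factor(values, ix)
               for ix in range(values.shape[1])]

        maxloc = vif.index(max(vif))
        if max(vif) <= thresh:
            break
        print('dropping \'' + X.iloc[:, variables].columns[maxloc] +
              '\' at index: ' + str(maxloc))
        del variables[maxloc]

    print('Remaining variables:')
    print(X.columns[variables])
    return X.iloc[:, variables]


X = training_set
X2 = calculate_vif_(X)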


Answer 1
1 2020-04-19 13:52:38

Answer 2
0 Accepted 2020-04-19 14:34:06
