Python IndexError - axis 0 out of bounds

Question

I have a dataset like the following in a txt file. (The first column is userid, the second column is locationid.) My real dataset is large, but I created a dummy dataset to better explain my problem.

I'm trying to create a matrix as in the code below. Rows will be user ids and columns will be location ids. Since this dataset shows the location ids visited by the users, the code assigns the value 1 to the matrix entries for the locations each user visited.

I am getting an IndexError. IndexError: index 801 is out of bounds for axis 0 with size 50

I tried different values for user_num and poi_num, but it still doesn't work.

datausers.txt

Code

import numpy as np
from collections import defaultdict
from itertools import islice
import pandas as pd 

train_file = "datausers.txt"
user_num = 20
poi_num = 20

training_matrix = np.zeros((user_num, poi_num))
train_data = list(islice(open(train_file, 'r'), 10))

for eachline in train_data:
    uid, lid= eachline.strip().split()
    uid, lid = int(uid), int(lid)
    training_matrix[uid, lid] = 1.0

Error

Expected Output

A 4x12 matrix, because we have 4 unique users and 12 unique locations.

[1 0 1 0 1 0 0 0 0 0 0 0
 0 1 0 1 0 1 0 0 0 0 0 0
...
]

For example, for the first row 1 0 1 0 1 0 0 0 0 0 0 0:

User 801 visited 3 locations, and those entries are 1. (The positions of the 1's can vary; I gave this only as an example.)

Answer 1

As you have tagged the question with pandas, here is one way of approaching the problem with the str.get_dummies method of a pandas Series:

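df = pd.read_csv('datausers.txt', sep=r'\s+', names=['userid', 'locationid'], index_col=0)
# Series.sum(level=0) was removed in pandas 2.0; group on the index level instead (sort=False keeps first-seen order)
out = df['locationid'].astype(str).str.get_dummies().groupby(level=0, sort=False).sum()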

Result

For the sample data

>>> out
        10201  10259  14470  15810  19264  32332  33847  34041  34827  34834  35407  36115
userid                                                                                    
801         0      0      1      0      0      1      1      0      0      0      0      0
501         1      1      0      0      0      0      0      1      0      0      0      0
301         0      0      0      1      1      0      0      0      1      0      0      0
401         0      0      0      0      0      0      0      0      0      1      1      1

If you need a NumPy array instead:

>>> out.to_numpy()

array([[0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0],
       [1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]])
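
For reference, the IndexError in the question arises because raw ids such as 801 are used directly as row indices into a matrix that only has user_num rows. If you would rather stay in NumPy, one possible approach is to map each id to a 0-based position first. The following is only a minimal sketch: the pairs array is a hypothetical stand-in for the contents of datausers.txt, and the names pairs, users, locations, row_idx and col_idx are illustrative, not from the question.

```python
import numpy as np

# Hypothetical (userid, locationid) pairs standing in for the lines of datausers.txt.
pairs = np.array([
    [801, 14470],
    [801, 32332],
    [501, 10201],
    [301, 15810],
])

# return_inverse=True gives, for every raw id, its 0-based position
# in the sorted array of unique ids.
users, row_idx = np.unique(pairs[:, 0], return_inverse=True)
locations, col_idx = np.unique(pairs[:, 1], return_inverse=True)

# Size the matrix by the number of unique ids, not by the largest id value.
training_matrix = np.zeros((users.size, locations.size))
training_matrix[row_idx, col_idx] = 1.0
```

Note that rows and columns here follow the sorted order of users and locations, so row i corresponds to users[i], whereas the pandas result above keeps the users in order of first appearance.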

Answer 1 (3, accepted, 2021-03-14 17:07:53)