Python Index Error - Out of Bounds for axis 0
I have a dataset like the following in a txt file. (The first column is the userid, the second column is the locationid.) My real dataset is large, but I created a dummy dataset to better explain my problem.
I'm trying to create a matrix as in the code below. Rows are user ids and columns are location ids. Since this dataset shows the location ids visited by the users, I assign the value 1 in the matrix to the locations each user visited.
I am getting an IndexError:
IndexError: index 801 is out of bounds for axis 0 with size 50
I tried different values for user_num and poi_num, but it still doesn't work.
datausers.txt
801 32332
801 14470
801 33847
501 10259
501 34041
501 10201
301 15810
301 34827
301 19264
401 34834
401 35407
401 36115
Code
import numpy as np
from collections import defaultdict
from itertools import islice
import pandas as pd
train_file = "datausers.txt"
user_num = 20
poi_num = 20
training_matrix = np.zeros((user_num, poi_num))
train_data = list(islice(open(train_file, 'r'), 10))
for eachline in train_data:
    uid, lid = eachline.strip().split()
    uid, lid = int(uid), int(lid)
    training_matrix[uid, lid] = 1.0
Error (the IndexError quoted above)
Expected Output
A 4x12 matrix, because we have 4 unique users and 12 unique locations.
[1 0 1 0 1 0 0 0 0 0 0 0
0 1 0 1 0 1 0 0 0 0 0 0
...
]
For example, the first row is 1 0 1 0 1 0 0 0 0 0 0 0:
User 801 visited 3 locations, and those are marked with 1. (The positions of the 1s may vary; this is only an example.)
As you have tagged the question with pandas, here is one way of approaching the problem with the str.get_dummies method of the pandas Series:
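df = pd.read_csv('datausers.txt', sep=r'\s+', names=['userid', 'locationid'], index_col=0)
# sum(level=0) was removed in pandas 2.0; groupby on the index level is the equivalent (sort=False keeps first-seen order)
out = df['locationid'].astype(str).str.get_dummies().groupby(level=0, sort=False).sum()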
Result
For the sample data:
>>> out
        10201  10259  14470  15810  19264  32332  33847  34041  34827  34834  35407  36115
userid
801         0      0      1      0      0      1      1      0      0      0      0      0
501         1      1      0      0      0      0      0      1      0      0      0      0
301         0      0      0      1      1      0      0      0      1      0      0      0
401         0      0      0      0      0      0      0      0      0      1      1      1
If you need a numpy array instead:
>>> out.to_numpy()
array([[0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0],
       [1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]])
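The IndexError in the question comes from using raw IDs such as 801 directly as row indices into a matrix that has only user_num rows. As a pandas-free sketch (not part of the answer above), the IDs can first be mapped to contiguous 0-based positions with np.unique(..., return_inverse=True). The sample pairs are inlined here so the snippet is self-contained, and the names pairs, user_idx and loc_idx are illustrative assumptions:

```python
import numpy as np

# Sample (userid, locationid) pairs copied from datausers.txt in the question.
pairs = np.array([
    [801, 32332], [801, 14470], [801, 33847],
    [501, 10259], [501, 34041], [501, 10201],
    [301, 15810], [301, 34827], [301, 19264],
    [401, 34834], [401, 35407], [401, 36115],
])

# np.unique returns the sorted unique IDs and, for every input value,
# its position in that sorted array, i.e. a contiguous 0-based index.
user_ids, user_idx = np.unique(pairs[:, 0], return_inverse=True)
loc_ids, loc_idx = np.unique(pairs[:, 1], return_inverse=True)

# Size the matrix by the number of distinct IDs, not by the largest ID.
training_matrix = np.zeros((len(user_ids), len(loc_ids)))
training_matrix[user_idx, loc_idx] = 1.0

print(training_matrix.shape)
```

Rows and columns here follow sorted ID order (user_ids and loc_ids hold the labels), so the row order differs from the first-seen order in the pandas output above.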