Error in Python: index 0 is out of bounds for axis 0 with size 0

Today I wanted to learn how to code content-based filtering in Python, so I searched for some code and applied it. I have a simple hotel dataset containing the name, address, and description of each hotel. When I ran the code, it failed at the end with "index 0 is out of bounds for axis 0 with size 0". Here's the code:

import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer
from nltk.corpus import stopwords
import re
import random

data = pd.read_csv('hotel.csv')
data.head()

The output:

   nama                               alamat                                             deskripsi
0  Capital O 253 Topas Galeria Hotel  Jl. Dr. Djundjunan No. 153, 40173 Bandung, Ind...  Berjarak 10 menit berkendara dari Bandara Inte...
1  Sheraton Bandung Hotel & Towers    Jl. Ir H Juanda 390, 40135 Bandung, Indonesia      Sheraton Hotel & Towers menawarkan akomodasi b...
2  OYO 794 Ln 9 Bandung Residence     Jalan Lemahnendeut No 9, Sukajadi, 40164 Bandu...  Berlokasi nyaman di Sukajadi, Bandung, OYO 794...
3  OYO 226 LJ hotel                   Jl. Malabar No.2, Malabar, Lengkong, Dago, Asi...  OYO 226 LJ hotel di Bandung, Jawa Barat, tepat...
4  OYO 230 Maleo Residence            JI. Dangeur Indah II No. 15, Sukagalih, Sukaja...  OYO 230 Maleo Residence menawarkan akomodasi b...

data.describe()
data.info()

The output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 90 entries, 0 to 89
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   nama       90 non-null     object
 1   alamat     90 non-null     object
 2   deskripsi  90 non-null     object
dtypes: object(3)
memory usage: 2.2+ KB

clean_spcl = re.compile(r'[/(){}\[\]\|@,;]')
clean_symbol = re.compile('[^0-9a-z #+_]')
stopworda = set(stopwords.words('english'))

def clean_text(text):
    text = text.lower() 
    text = clean_spcl.sub(' ', text)
    text = clean_symbol.sub('', text)
    text = ' '.join(word for word in text.split() if word not in stopworda)  # remove stopwords from the description column
    return text
  
data['deskripsi_new'] = data['deskripsi'].apply(clean_text)

def pt_desc(index):
    example = data[data.index == index][['deskripsi_new', 'nama', 'alamat']].values[0]
    if len(example) > 0:
        print(example[0])
        print('Nama:', example[1])
        print('Alamat:', example[2])   

data.set_index('nama', inplace=True)
tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 3), min_df=0, stop_words='english')
tfidf_matrix = tf.fit_transform(data['deskripsi_new'])
cos_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)
cos_sim

The output:

array([[1.        , 0.07106689, 0.03075961, ..., 0.07474134, 0.0732575 ,
        0.01680878],
       [0.07106689, 1.        , 0.03508807, ..., 0.05947269, 0.08705608,
        0.01986701],
       [0.03075961, 0.03508807, 1.        , ..., 0.09113962, 0.05879732,
        0.06808138],
       ...,
       [0.07474134, 0.05947269, 0.09113962, ..., 1.        , 0.06321301,
        0.02205802],
       [0.0732575 , 0.08705608, 0.05879732, ..., 0.06321301, 1.        ,
        0.02245328],
       [0.01680878, 0.01986701, 0.06808138, ..., 0.02205802, 0.02245328,
        1.        ]])

indices = pd.Series(data.index)
indices[:50]

def rekomendasi(nama, cos_sim = cos_sim):
    
    rec = []
    
    idx = indices[indices == nama].index[0]

    score_series = pd.Series(cos_sim[idx]).sort_values(ascending = False)

    top_10_indexes = list(score_series.iloc[1:11].index)
    
    for i in top_10_indexes:
        rec.append(list(data.index)[i])
        
    return rec

rekomendasi('Hotel')  # this call raises: index 0 is out of bounds for axis 0 with size 0

What went wrong here?

From what I understand, you are trying to build a kind of search engine that, given a search query, returns the 10 best-matching results.

If this is the case, you'll need to modify your rekomendasi function so that it will:

  • process the input query and turn it into a vector
  • compute the similarity scores against the corpus (the corpus here means the list of your hotel descriptions)
  • return the 10 items with the highest similarity scores
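
The three steps above can be sketched on a toy corpus that needs neither hotel.csv nor NLTK. This is only an illustration: the names, descriptions, and the search helper are hypothetical, and the vectorizer uses default settings rather than the ones from the question.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy corpus standing in for the hotel names and descriptions.
names = ["Airport Inn", "Rooftop Suites", "Station Hostel"]
descriptions = [
    "quiet hotel near the airport with free parking",
    "city centre apartment with a rooftop pool",
    "budget hostel close to the train station",
]

# Fit the vectorizer once on the corpus.
vectorizer = TfidfVectorizer()
corpus_matrix = vectorizer.fit_transform(descriptions)

def search(query, top_k=2):
    # Step 1: turn the query text into a TF-IDF vector (same vocabulary as the corpus).
    query_vector = vectorizer.transform([query])
    # Step 2: similarity between the query and every description.
    scores = cosine_similarity(query_vector, corpus_matrix).ravel()
    # Step 3: positions of the top_k highest scores, best first.
    best = np.argsort(scores)[::-1][:top_k]
    return [names[i] for i in best]

result = search("rooftop pool")
print(result[0])  # Rooftop Suites
```

Only the second description contains the query words, so it ranks first; the remaining positions are filled by items with a score of zero, in no meaningful order.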

I've modified your code to do that:

import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer
from nltk.corpus import stopwords
import re
import random

data = pd.read_csv('../../../../Downloads/hotel.csv')

clean_spcl = re.compile(r'[/(){}\[\]\|@,;]')
clean_symbol = re.compile('[^0-9a-z #+_]')
stopworda = set(stopwords.words('english'))

def clean_text(text):
    text = text.lower() 
    text = clean_spcl.sub(' ', text)
    text = clean_symbol.sub('', text)
    text = ' '.join(word for word in text.split() if word not in stopworda)  # remove stopwords from the description column
    return text
  
data['deskripsi_new'] = data['deskripsi'].apply(clean_text)

def pt_desc(index):
    example = data[data.index == index][['deskripsi_new', 'nama', 'alamat']].values[0]
    if len(example) > 0:
        print(example[0])
        print('Nama:', example[1])
        print('Alamat:', example[2])   

data.set_index('nama', inplace=True)
tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 3), min_df=0, stop_words='english')
tfidf_matrix = tf.fit_transform(data['deskripsi_new'])
cos_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)
cos_sim

def rekomendasi(nama, cos_sim=cos_sim):
    
    # First preprocess the given query text (i.e. nama) and transform it into a TF-IDF vector
    nama = clean_text(nama)
    nama_vector = tf.transform([nama])
    
    # Next we compute similarity scores between the query text (nama) and the corpus (tfidf_matrix)
    similarity_scores = cosine_similarity(nama_vector, tfidf_matrix).squeeze()
    top_10_indices = similarity_scores.argsort()[-10:][::-1]
    
    rec = data.index[top_10_indices].tolist()
    return rec

Example:

rekomendasi('Hotel')

['The Trans Luxury Hotel Bandung', 'M Premiere Hotel Dago Bandung', 'Mutiara Hotel', 'éL Hotel Royale Bandung', 'The Jayakarta Suites Bandung, Hotel & Spa', 'Hotel Cemerlang', "OYO 167 Dago's Hill Hotel", "OYO 167 Dago's Hill Hotel", 'HARRIS Hotel & Conventions Ciumbuleuit – Bandung', 'Padma Hotel Bandung']
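
As for the original error message: it most likely comes from the exact-match lookup indices[indices == nama].index[0] in the question. If no hotel is named exactly 'Hotel', the boolean filter returns an empty Series, and asking for element 0 of its empty index raises the IndexError. A minimal sketch of that behaviour, with hypothetical hotel names standing in for data.index and a hypothetical find_position helper that guards against the empty case:

```python
import pandas as pd

# Hypothetical hotel names standing in for data.index from the question.
indices = pd.Series(["Mutiara Hotel", "Padma Hotel Bandung", "Hotel Cemerlang"])

# Reproduce the failure: no name equals "Hotel" exactly, so the filter is empty.
raised = False
try:
    indices[indices == "Hotel"].index[0]
except IndexError:
    raised = True

def find_position(nama):
    """Return the row position of an exact name match, or None if there is none."""
    matches = indices[indices == nama].index
    if len(matches) == 0:  # nothing matched: matches[0] would raise IndexError
        return None
    return int(matches[0])

print(raised)                            # True
print(find_position("Hotel"))            # None
print(find_position("Hotel Cemerlang"))  # 2
```

So the question's function works only when it is given the full name of a hotel in the dataset, whereas the version above accepts free-text queries.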

