
KeyError: False in pandas dataframe

import pandas as pd

businesses = pd.read_json(businesses_filepath, lines=True, encoding='utf_8')
restaurantes = businesses['Restaurants' in businesses['categories']]

I would like to remove the rows that do not have 'Restaurants' in the categories column. Each value in this column is a list. However, the code above raises 'KeyError: False', and I would like to understand why and how to solve it.

The expression 'Restaurants' in businesses['categories'] returns the single boolean value False, because the in operator on a Series checks the index labels rather than the values. That False is then passed to the bracket indexing operator of the DataFrame businesses, which does not contain a column called False, and so a KeyError is raised.
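To make the explanation above concrete, here is a minimal sketch. The DataFrame `sample` and its `name` column are made-up stand-ins for the JSON file in the question, not the asker's real data.

```python
import pandas as pd

# Hypothetical sample data standing in for the JSON file in the question.
sample = pd.DataFrame({
    "name": ["A", "B", "C"],
    "categories": [["Restaurants", "Pizza"], ["Shopping"], ["Bars", "Restaurants"]],
})

# `in` on a Series checks the index labels (0, 1, 2 here), not the values.
in_values = "Restaurants" in sample["categories"]
in_index = 0 in sample["categories"]
print(in_values, in_index)

# Indexing the DataFrame with that single boolean is treated as a column
# lookup, and there is no column named False, hence the KeyError.
caught = None
try:
    sample[in_values]
except KeyError as err:
    caught = repr(err)
    print("KeyError raised:", caught)
```

The membership test yields one boolean for the whole Series, so there is nothing row-wise for the DataFrame to filter on.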

What you are looking to do is called boolean indexing, which works like this:

businesses[businesses['categories'] == 'Restaurants']

If you find that your data contains spelling variations or alternative restaurant-related terms, the following may be of benefit. Essentially, you put your restaurant-related terms in restaurant_lst. The lambda function returns True if any of the items in restaurant_lst is contained in a given row of the categories column. The .loc indexer then keeps only the rows for which the lambda function returns True.

restaurant_lst = ['Restaurant', 'restaurantes', 'diner', 'bistro']
restaurant = businesses.loc[businesses['categories'].apply(lambda x: any(restaurant_str in x for restaurant_str in restaurant_lst))]
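Because each value in the categories column is a list, a plain `in` test matches whole list elements only, so 'Restaurant' would not match 'Restaurants'. The sketch below is one possible variation that does a case-insensitive substring match against every category in the list. The sample data and the helper name `has_restaurant_term` are hypothetical, and it assumes every row holds a list of strings (real data may contain missing values that need handling first).

```python
import pandas as pd

# Hypothetical sample data; each row's `categories` is a list of strings.
sample = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "categories": [
        ["Restaurants", "Pizza"],
        ["Shopping"],
        ["French Bistro"],
        ["Diner", "Bars"],
    ],
})

# Terms are lowercase because the comparison lowercases each category.
restaurant_lst = ["restaurant", "restaurantes", "diner", "bistro"]

def has_restaurant_term(categories):
    """Return True if any term is a substring of any category (case-insensitive)."""
    return any(
        term in category.lower()
        for category in categories
        for term in restaurant_lst
    )

# Apply the helper to the column (one call per row), then filter with .loc.
mask = sample["categories"].apply(has_restaurant_term)
restaurants = sample.loc[mask]
print(restaurants["name"].tolist())
```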

The reason for this is that the in operator on a Series returns a single boolean (it checks the index labels) rather than an element-wise result like == does. Here's a workaround:

businesses[['Restaurants' in c for c in list(businesses['categories'])]]

Hopefully this helps someone who is looking for a substring in the column rather than a full match.
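As a rough illustration of this workaround, the sketch below builds one boolean per row and uses it as a mask. The sample data is made up. Note that what `in` means depends on the element type: on a list it tests for an exact element, on a string it tests for a substring. For a column of plain strings, `Series.str.contains` is an alternative.

```python
import pandas as pd

# Hypothetical sample data with list-valued categories.
sample = pd.DataFrame({
    "name": ["A", "B", "C"],
    "categories": [["Restaurants", "Pizza"], ["Shopping"], ["Bars", "Restaurants"]],
})

# One boolean per row: True where the list contains the element "Restaurants".
mask = ["Restaurants" in c for c in sample["categories"]]
restaurantes = sample[mask]
print(restaurantes["name"].tolist())

# If the column held plain strings instead of lists, a substring test
# could be written with the .str accessor (regex=False for a literal match).
text = pd.Series(["Restaurants, Pizza", "Shopping"])
text_mask = text.str.contains("Restaurants", regex=False)
print(text_mask.tolist())
```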

I think what you meant was:

businesses = businesses.loc[businesses['categories'] == 'Restaurants']

That will only keep rows with the category 'Restaurants'.

None of the answers here actually worked for me.

businesses[businesses['categories'] == 'Restaurants']

obviously won't work, since the value in 'categories' is not a string but a list, meaning the comparison will always be False.
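A small check of that claim, using made-up sample data: comparing a column of lists with a string yields False for every row, so the filter returns an empty DataFrame rather than raising an error.

```python
import pandas as pd

# Hypothetical sample data with list-valued categories.
sample = pd.DataFrame({
    "name": ["A", "B", "C"],
    "categories": [["Restaurants"], ["Shopping"], ["Bars", "Restaurants"]],
})

# Each list is compared with the string; a list never equals a string.
mask = sample["categories"] == "Restaurants"
filtered = sample[mask]
print(mask.tolist())
print(len(filtered))
```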

What does work, however, is converting the lists in the column into tuples:

businesses['categories'] = businesses['categories'].apply(tuple)

That allows you to use standard .loc indexing to select the rows whose categories are exactly ('Restaurants',):

businesses.loc[businesses['categories'] == ('Restaurants',)]
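The tuple approach can be sketched as follows, on made-up sample data. It assumes a pandas version that compares an object column against a tuple element by element, as this answer reports. The equality test keeps only rows whose categories are exactly the one-element tuple, so a membership test is shown alongside it for rows that merely contain 'Restaurants'.

```python
import pandas as pd

# Hypothetical sample data with list-valued categories.
sample = pd.DataFrame({
    "name": ["A", "B", "C"],
    "categories": [["Restaurants"], ["Shopping"], ["Restaurants", "Pizza"]],
})

# Convert each list to a tuple.
sample["categories"] = sample["categories"].apply(tuple)

# Exact match: only rows whose categories equal ('Restaurants',).
exact = sample.loc[sample["categories"] == ("Restaurants",)]

# Membership: rows whose tuple contains 'Restaurants' among other entries.
contains = sample.loc[sample["categories"].apply(lambda t: "Restaurants" in t)]

print(exact["name"].tolist())
print(contains["name"].tolist())
```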


 