

How to filter a dataframe based on the values present in the list in the rows of a column in Python?

I have a dataframe which looks like:

   business_id  stars  categories
0  abcd         4.0    ['Nightlife']
1  abcd1        3.5    ['Pizza', 'Restaurants']
2  abcd2        4.5    ['Groceries', 'Food']

I want to filter the dataframe based on the values present in the categories column. My dataframe has approximately 400 000 rows, and I only want the rows whose categories include 'Food' or 'Restaurants'.

I tried a lot of methods, including:

def foodie(x):
    for row in x.itertuples():
        if 'Food' in row[3] or 'Restaurant' in row[3]:
            return x

df = df.apply(foodie, axis=1)

But this is obviously a very bad method, since I am using itertuples on 400 000 rows and my system keeps processing indefinitely.

I also tried using a list comprehension inside df[df['categories']], but couldn't get it to work, since the examples I found all filter on scalar values, like df[df['stars']==4.0]. Even the apply() examples I saw were written for columns holding a single value per row.

So, how can I subset my dataframe with a reasonably fast implementation, selecting only those rows which have 'Food' or 'Restaurants' in their categories?

You can use the apply method on the categories column to check whether each list contains 'Food' or 'Restaurants', and use the resulting boolean Series as a logical index for subsetting:

df.loc[df.categories.apply(lambda cat: 'Food' in cat or 'Restaurants' in cat)]

#     business_id             categories      stars
# 1         abcd1   [Pizza, Restaurants]        3.5
# 2         abcd2      [Groceries, Food]        4.5
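
For reference, here is a self-contained sketch of the same idea, assuming the categories column really holds Python lists as in the question. The names `wanted`, `mask`, and `mask_explode` are illustrative and not from the original post. The set-based check is equivalent to the lambda above; the `explode`-based variant assumes pandas 0.25 or later, and which of the two is faster on 400 000 rows would need to be measured.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "business_id": ["abcd", "abcd1", "abcd2"],
    "stars": [4.0, 3.5, 4.5],
    "categories": [["Nightlife"], ["Pizza", "Restaurants"], ["Groceries", "Food"]],
})

# Hypothetical name for the set of categories to keep
wanted = {"Food", "Restaurants"}

# True for rows whose category list shares at least one element with `wanted`
mask = df["categories"].apply(lambda cats: not wanted.isdisjoint(cats))
filtered = df.loc[mask]

# Alternative: flatten the lists to one category per row, test membership,
# then collapse back to one boolean per original row index
mask_explode = df["categories"].explode().isin(wanted).groupby(level=0).any()
filtered_explode = df.loc[mask_explode]

print(filtered)
```

Using a set makes it easy to extend the list of wanted categories without chaining more `or` conditions.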

Just another idea: keep the categories as strings instead of list objects.

In [2]: import pandas as pd

In [3]: data = {'business_id': ['abcd', 'abcd1', 'abcd2'], 'stars': [4.0, 3.5, 4.5], 'categories': [['Nightlife'], ['Pizza', 'Restaurants'], ['Groceries', 'Food']]}

In [4]: df = pd.DataFrame(data)

# convert each list to a comma-separated string with the join() method
In [15]: df.categories = df.categories.apply(",".join)

In [16]: df 
Out[16]: 
  business_id         categories  stars
0        abcd          Nightlife    4.0
1       abcd1  Pizza,Restaurants    3.5
2       abcd2     Groceries,Food    4.5

In [26]: df.categories.str.contains('Food')
Out[26]: 
0    False
1    False
2     True
Name: categories, dtype: bool
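
The check above only tests for 'Food'. A possible way to cover both categories from the question and actually subset the dataframe is sketched below, assuming the categories have already been joined into comma-separated strings as shown. Note that `str.contains` matches substrings, so a plain 'Food' pattern would also match a longer category name such as a hypothetical 'Food Trucks'; the second pattern is one way to require whole-category matches. The variable names are illustrative.

```python
import pandas as pd

# Categories already joined into comma-separated strings, as above
df = pd.DataFrame({
    "business_id": ["abcd", "abcd1", "abcd2"],
    "stars": [4.0, 3.5, 4.5],
    "categories": ["Nightlife", "Pizza,Restaurants", "Groceries,Food"],
})

# Regex alternation: True where the string contains either substring
mask = df["categories"].str.contains("Food|Restaurants", regex=True)
result = df[mask]

# Stricter variant: the category must span a whole comma-delimited field
exact = df["categories"].str.contains(r"(?:^|,)(?:Food|Restaurants)(?:,|$)", regex=True)

print(result)
```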


