Matching an element from a list to a column that holds lists. If single element found, return entire row
Suppose a column holds lists. If any single element of a row's list matches an element of our list, return the entire row. For example, we have a data frame:
index x
0 [apple, orange, strawberry]
1 [blueberry, pear, watermelon]
2 [apple, banana, strawberry]
3 [apple]
4 [strawberry]
And we have our list,
a = [apple, strawberry]
# I am trying to return index 0,2,3 and 4. But currently I am only able to return index 3 and 4
new_DF = df[df['x'].isin(a)]
# This function is getting the user input for list 'a'.
# This is for reference of what I am actually trying to do.
def filter_Industries():
    num_of_industries = int(input('How many industries would you like to filter by?\n'))
    list_industries = []
    # range() already drives the loop, so no manual counter is needed
    for _ in range(num_of_industries):
        industry = input("Enter the industry:\n")
        list_industries.append(industry)
    return list_industries
a = filter_Industries()
# This is where I am trying to match the elements from the user's list to the data set.
new_DF = df[df['x'].isin(a)]
You can use the apply(function) method (called on a single column, as here, it is Series.apply). In this case we check, for every row, whether its list has an element in common with the list "a". Let's create the function:
a = ["apple", "strawberry"]
a_set = set(a)
def hasCommon(x):
    return len(set(x) & a_set) > 0
So if there is a common element, it returns True. Let's create dummy data:
import pandas as pd
data = {
"calories": [["apple", "orange", "strawberry"], ["blueberry", "pear", "watermelon"], ["strawberry", "pear", "watermelon"]],
"duration": [50, 40,120]
}
#load data into a DataFrame object:
df = pd.DataFrame(data)
print(df)
And we can use it like this:
df[df["calories"].apply(hasCommon)]
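To tie this back to the data in the question, here is a minimal sketch of the same idea applied to the column 'x'. The helper name has_common and the use of set.isdisjoint are my own choices for illustration, not part of the answer above; it assumes every cell in 'x' is a list of strings.

```python
import pandas as pd

# Data frame from the question (assumes each cell of 'x' is a list of strings)
df = pd.DataFrame({
    "x": [
        ["apple", "orange", "strawberry"],
        ["blueberry", "pear", "watermelon"],
        ["apple", "banana", "strawberry"],
        ["apple"],
        ["strawberry"],
    ]
})

a = ["apple", "strawberry"]
a_set = set(a)

def has_common(items):
    # True when the row's list shares at least one element with a_set
    return not a_set.isdisjoint(items)

new_df = df[df["x"].apply(has_common)]
print(new_df.index.tolist())  # [0, 2, 3, 4]
```

Building a_set once outside the function avoids recreating the set for every row.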
When you use isin(a) on the values at index 0, 1 and 2, the function tries to match a whole list (e.g., [apple, orange, strawberry]) against the elements of the list a. It worked for the rows at index 3 and 4 because there it compares a single element with the elements of a.
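The difference can be illustrated in plain Python, without pandas. This is only an analogy for the comparison described above, not a reproduction of how isin is implemented internally; the variable names are made up for the example.

```python
a = ["apple", "strawberry"]

single_value = "apple"
whole_cell = ["apple", "orange", "strawberry"]

# A single string is one of the elements of a
single_match = single_value in a

# A whole list is not itself an element of a, even though it contains "apple"
whole_cell_match = whole_cell in a

# Comparing element by element (set intersection) finds the overlap
overlap = set(whole_cell) & set(a)

print(single_match)      # True
print(whole_cell_match)  # False
print(sorted(overlap))   # ['apple', 'strawberry']
```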
I suggest converting both the list a and each row's list to sets and intersecting them, with this code:
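matching = []  # positions of the rows that share an element with a
for i in range(len(df)):
    # a non-empty intersection is truthy
    if set(a) & set(df['x'].iloc[i]):
        matching.append(i)
new_DF = df.iloc[matching]  # keep the entire matching rows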
new_DF then holds only the rows whose intersection with a is not an empty set.
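If you would rather avoid an explicit Python loop, one alternative (not taken from the answers above) is to flatten the lists with Series.explode and then use isin on the individual elements. This sketch assumes pandas 0.25 or later, where explode is available, and that every cell in 'x' is a list.

```python
import pandas as pd

df = pd.DataFrame({
    "x": [
        ["apple", "orange", "strawberry"],
        ["blueberry", "pear", "watermelon"],
        ["apple", "banana", "strawberry"],
        ["apple"],
        ["strawberry"],
    ]
})
a = ["apple", "strawberry"]

# explode() gives one row per list element and repeats the original index,
# so isin() now compares single strings against a
element_hits = df["x"].explode().isin(a)

# Collapse back to one boolean per original row: True if any element matched
mask = element_hits.groupby(level=0).any()

new_df = df[mask]
print(new_df.index.tolist())  # [0, 2, 3, 4]
```

The result is a boolean mask aligned to the original index, so it can be combined with other conditions using & and |.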