How to check if all the elements in a list are present in a pandas column

I have a dataframe and a list:

import pandas as pd

df = pd.DataFrame({'id':[1,2,3,4,5,6,7,8], 
    'char':[['a','b'],['a','b','c'],['a','c'],['b','c'],[],['c','a','d'],['c','d'],['a']]})

names = ['a','c']

I want to get only the rows where both a and c are present in the char column (order doesn't matter here).

Expected output:

        char  id
1  [a, b, c]   2
2     [a, c]   3
5  [c, a, d]   6

My efforts:

true_indices = []
for idx, row in df.iterrows():
    if all(name in row['char'] for name in names):
        true_indices.append(idx)


ids = df[df.index.isin(true_indices)]

This gives me the correct output, but it is too slow for a large dataset, so I am looking for a more efficient solution.

Use pd.Series.apply on the char column:

df[df['char'].apply(lambda x: set(names).issubset(x))]

Output:

   id       char
1   2  [a, b, c]
2   3     [a, c]
5   6  [c, a, d]

You can build a set from the list of names for a faster lookup, and use set.issubset to check whether all elements of the set are contained in each of the column's lists:

names = set(['a','c'])
df[df['char'].map(names.issubset)]

   id       char
1   2  [a, b, c]
2   3     [a, c]
5   6  [c, a, d]
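
As a minimal, self-contained sketch of why this works: set.issubset accepts any iterable, so each list in the column can be passed to it directly. The variable names required, mask and result are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7, 8],
    'char': [['a', 'b'], ['a', 'b', 'c'], ['a', 'c'], ['b', 'c'],
             [], ['c', 'a', 'd'], ['c', 'd'], ['a']],
})

# The set of names that every matching row must contain.
required = {'a', 'c'}

# set.issubset takes any iterable, so the bound method can be mapped over
# the column without converting each row's list to a set first.
mask = df['char'].map(required.issubset)

# Boolean indexing keeps only the rows whose list contains both names.
result = df[mask]
print(result['id'].tolist())  # [2, 3, 6]
```

Keeping the mask in its own variable leaves df unchanged, which may be convenient if the same dataframe is filtered with several different name lists.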

Use a list comprehension with issubset:

mask = [set(names).issubset(x) for x in df['char']]
df = df[mask]
print (df)
   id       char
1   2  [a, b, c]
2   3     [a, c]
5   6  [c, a, d]

Another solution with Series.map:

df = df[df['char'].map(set(names).issubset)]
print (df)
   id       char
1   2  [a, b, c]
2   3     [a, c]
5   6  [c, a, d]

Performance depends on the number of rows and the number of matched values:

df = pd.concat([df] * 10000, ignore_index=True)

In [270]: %timeit df[df['char'].apply(lambda x: set(names).issubset(x))]
45.9 ms ± 2.26 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [271]: %%timeit
     ...: names = set(['a','c'])
     ...: [names.issubset(set(row)) for _, row in df.char.items()]
     ...: 
46.7 ms ± 5.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [272]: %%timeit
     ...: df[[set(names).issubset(x) for x in df['char']]]
     ...: 
45.6 ms ± 1.26 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [273]: %%timeit
     ...: df[df['char'].map(set(names).issubset)]
     ...: 
18.3 ms ± 2.96 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [274]: %%timeit
     ...: n = set(names)
     ...: df[df['char'].map(n.issubset)]
     ...: 
16.6 ms ± 278 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [279]: %%timeit
     ...: names = set(['a','c'])
     ...: m = [names.issubset(i) for i in df.char.values.tolist()]
     ...: 
19.2 ms ± 317 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
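
The timings above come from an IPython session. For readers without IPython, a rough equivalent can be sketched with the standard timeit module. The helper names with_apply and with_map are hypothetical, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

base = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7, 8],
    'char': [['a', 'b'], ['a', 'b', 'c'], ['a', 'c'], ['b', 'c'],
             [], ['c', 'a', 'd'], ['c', 'd'], ['a']],
})
names = ['a', 'c']

# Same enlargement as in the benchmark above: 8 rows repeated 10000 times.
big = pd.concat([base] * 10000, ignore_index=True)


def with_apply(frame):
    # Rebuilds the set on every row, as in the apply-based answer.
    return frame[frame['char'].apply(lambda x: set(names).issubset(x))]


def with_map(frame):
    # Builds the set once and maps the bound method over the column.
    required = set(names)
    return frame[frame['char'].map(required.issubset)]


for func in (with_apply, with_map):
    # Average over 10 calls to smooth out noise.
    seconds = timeit.timeit(lambda: func(big), number=10) / 10
    print(f"{func.__name__}: {seconds * 1000:.1f} ms per call")
```

Both helpers return the same rows; only the cost of evaluating the mask differs.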

Try this:

import numpy as np

df['char'] = df['char'].apply(lambda x: x if ("a" in x and "c" in x) else np.nan)
print(df.dropna())

Output:

   id       char
1   2  [a, b, c]
2   3     [a, c]
5   6  [c, a, d]
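
Note that the snippet above hard-codes "a" and "c" and overwrites the char column with NaN for non-matching rows. A possible variant, sketched here under the assumption that the original dataframe should stay intact, reads the values from the names list and filters with a boolean mask instead. The names mask and filtered are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7, 8],
    'char': [['a', 'b'], ['a', 'b', 'c'], ['a', 'c'], ['b', 'c'],
             [], ['c', 'a', 'd'], ['c', 'd'], ['a']],
})
names = ['a', 'c']

# Build a boolean mask instead of replacing non-matching rows with NaN,
# so the 'char' column of df is left unchanged.
mask = df['char'].apply(lambda chars: all(name in chars for name in names))

# Keep only the rows where every name was found.
filtered = df[mask]
print(filtered)
```

This performs the same membership test as the question's loop, applied per row through Series.apply, so the set-based approaches above are likely to remain faster on large data.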


