
Python: String matching on a pandas column of lists

What is the best way to do string matching on a column of lists?
E.g. I have a dataset:

import numpy as np
import pandas as pd
list_items = ['apple', 'grapple', 'tackle', 'satchel', 'snapple']
# range replaces the Python 2-only xrange
df = pd.DataFrame({'id': range(3), 'L': [np.random.choice(list_items, 3).tolist() for _ in range(3)]})
df

    L                           id
0   [tackle, apple, grapple]    0
1   [tackle, snapple, satchel]  1
2   [satchel, satchel, tackle]  2

And I want to return the rows where any item in L contains a given substring, e.g. 'grap' should return row 0, and 'sat' should return rows 1 and 2.

Let's use this:

np.random.seed(123)
list_items = ['apple', 'grapple', 'tackle', 'satchel', 'snapple']
df = pd.DataFrame({'id':range(3), 'L':[np.random.choice(list_items, 3).tolist() for _ in range(3)]})
df
                             L  id
0    [tackle, snapple, tackle]   0
1   [grapple, satchel, tackle]   1
2  [satchel, grapple, grapple]   2

Use any and apply:

df[df.L.apply(lambda x: any('grap' in s for s in x))]

Output:

                             L  id
1   [grapple, satchel, tackle]   1
2  [satchel, grapple, grapple]   2
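The filter above can be wrapped in a small helper so the column and the substring become parameters. This is only a sketch: the function name rows_with_substring is made up for illustration, and the data is hard-coded from the seeded example so the result does not depend on the random generator.

```python
import pandas as pd

# Fixed data copied from the seeded example, so the result is reproducible.
df = pd.DataFrame({
    'id': range(3),
    'L': [['tackle', 'snapple', 'tackle'],
          ['grapple', 'satchel', 'tackle'],
          ['satchel', 'grapple', 'grapple']],
})

def rows_with_substring(frame, column, needle):
    """Hypothetical helper: rows whose list in `column` has an item containing `needle`."""
    # Boolean mask: True where at least one list item contains the substring.
    mask = frame[column].apply(lambda items: any(needle in s for s in items))
    return frame[mask]

grap_rows = rows_with_substring(df, 'L', 'grap')   # rows with id 1 and 2
snap_rows = rows_with_substring(df, 'L', 'snap')   # row with id 0
print(grap_rows)
print(snap_rows)
```

Because the test uses Python's in operator, the match is a plain substring check, not a regular expression.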

Timings:

%timeit df.L.apply(lambda x: any('grap' in s for s in x))

10000 loops, best of 3: 194 µs per loop

%timeit df.L.apply(lambda i: ','.join(i)).str.contains('grap')

1000 loops, best of 3: 481 µs per loop

%timeit df.L.str.join(', ').str.contains('grap')

1000 loops, best of 3: 529 µs per loop
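%timeit is an IPython magic, so the comparison does not run as-is in a plain script. A rough equivalent using the standard timeit module might look like the sketch below. The function names with_any and with_join and the loop count are assumptions chosen for illustration, and the numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Same fixed data as the seeded example.
df = pd.DataFrame({
    'id': range(3),
    'L': [['tackle', 'snapple', 'tackle'],
          ['grapple', 'satchel', 'tackle'],
          ['satchel', 'grapple', 'grapple']],
})

def with_any():
    # Per-row Python loop with a short-circuiting any().
    return df.L.apply(lambda x: any('grap' in s for s in x))

def with_join():
    # Join each list into one string, then use the vectorised string method.
    return df.L.str.join(', ').str.contains('grap')

runs = 200  # arbitrary loop count, kept small so the script finishes quickly
for name, func in [('any + apply', with_any), ('str.join + str.contains', with_join)]:
    # Best of 3 repeats, reported per call, similar in spirit to %timeit.
    best = min(timeit.repeat(func, number=runs, repeat=3)) / runs
    print(f'{name}: {best * 1e6:.0f} µs per loop')
```

A three-row frame mostly measures fixed overhead, so it is worth rerunning this on data of realistic size before picking one approach.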

Alternatively, join each list into a single string and use str.contains:

df[df.L.apply(lambda i: ','.join(i)).str.contains('yourstring')]
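One difference between the two approaches is worth checking: str.contains treats its pattern as a regular expression by default, while the in operator used with any is always a literal substring test. The sketch below uses made-up data (the strings 'abc' and 'a.c' are not from the question) to show how passing regex=False makes the join-based version literal as well.

```python
import pandas as pd

# Illustrative data only: the dot in 'a.c' is a regex wildcard.
df = pd.DataFrame({'id': range(2), 'L': [['abc', 'xyz'], ['a.c', 'xyz']]})

joined = df.L.str.join(',')

# Default: the pattern is a regex, so '.' matches any character.
as_regex = joined.str.contains('a.c')

# regex=False: the pattern is matched literally.
as_literal = joined.str.contains('a.c', regex=False)

# The any/in approach is literal by construction.
with_any = df.L.apply(lambda items: any('a.c' in s for s in items))

print(as_regex.tolist())    # [True, True]
print(as_literal.tolist())  # [False, True]
print(with_any.tolist())    # [False, True]
```

If the search string comes from user input and should be taken literally, regex=False (or escaping it with re.escape) keeps the two approaches consistent.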

