Python: String matching on a pandas column of lists

Question

What is the best way to do string matching on a column of lists?
For example, I have a dataset:

import numpy as np
import pandas as pd
list_items = ['apple', 'grapple', 'tackle', 'satchel', 'snapple']
df = pd.DataFrame({'id':range(3), 'L':[np.random.choice(list_items, 3).tolist() for _ in range(3)]})
df

    L                           id
0   [tackle, apple, grapple]    0
1   [tackle, snapple, satchel]  1
2   [satchel, satchel, tackle]  2

I want to return the rows where any item in L contains a given substring; e.g. 'grap' should return row 0, and 'sat' should return rows 1 and 2.

Answer 1

Let's use this:

import numpy as np
import pandas as pd

np.random.seed(123)  # fix the seed so the sampled lists are reproducible
list_items = ['apple', 'grapple', 'tackle', 'satchel', 'snapple']
df = pd.DataFrame({'id':range(3), 'L':[np.random.choice(list_items, 3).tolist() for _ in range(3)]})
df
                             L  id
0    [tackle, snapple, tackle]   0
1   [grapple, satchel, tackle]   1
2  [satchel, grapple, grapple]   2

Use any and apply:

df[df.L.apply(lambda x: any('grap' in s for s in x))]

Output:

                             L  id
1   [grapple, satchel, tackle]   1
2  [satchel, grapple, grapple]   2
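
The same idea can be wrapped in a small helper. This is only a sketch: the function name rows_with_match and its parameters are made up for illustration, and the lists are hard-coded to the values shown above instead of being sampled randomly.

```python
import pandas as pd

# Hard-coded copy of the lists shown above, so the result does not depend on a seed.
df = pd.DataFrame({
    'id': range(3),
    'L': [['tackle', 'snapple', 'tackle'],
          ['grapple', 'satchel', 'tackle'],
          ['satchel', 'grapple', 'grapple']],
})

def rows_with_match(frame, column, needle):
    """Return the rows whose list in `column` has at least one item containing `needle`.

    Hypothetical helper, not part of pandas.
    """
    # any() stops at the first matching item, so long lists are not fully scanned.
    mask = frame[column].apply(lambda items: any(needle in s for s in items))
    return frame[mask]

grap_rows = rows_with_match(df, 'L', 'grap')  # rows with ids 1 and 2
sat_rows = rows_with_match(df, 'L', 'sat')    # rows with ids 1 and 2
print(grap_rows)
print(sat_rows)
```

Because the test is a plain Python `in` check, the needle is always treated as a literal substring, never as a regular expression.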

Timings:

%timeit df.L.apply(lambda x: any('grap' in s for s in x))

10000 loops, best of 3: 194 µs per loop

%timeit df.L.apply(lambda i: ','.join(i)).str.contains('grap')

1000 loops, best of 3: 481 µs per loop

%timeit df.L.str.join(', ').str.contains('grap')

1000 loops, best of 3: 529 µs per loop
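
The timings above were taken on a three-row frame, where fixed overhead dominates. To compare the three variants outside IPython and on a larger frame, something like the sketch below could be used. The frame size, the deterministic way the lists are built, and the function names are all assumptions for illustration; absolute numbers will differ by machine and pandas version.

```python
import timeit
import pandas as pd

list_items = ['apple', 'grapple', 'tackle', 'satchel', 'snapple']

# Deterministic frame: row i holds three consecutive items, wrapping around the list.
# 10,000 rows is an arbitrary size chosen for this sketch.
n_rows = 10_000
df_big = pd.DataFrame({
    'L': [[list_items[(i + k) % 5] for k in range(3)] for i in range(n_rows)]
})

def with_any(frame):
    return frame.L.apply(lambda x: any('grap' in s for s in x))

def with_apply_join(frame):
    return frame.L.apply(lambda i: ','.join(i)).str.contains('grap')

def with_str_join(frame):
    return frame.L.str.join(', ').str.contains('grap')

# Check that all three variants select the same rows before timing them.
reference = with_any(df_big).tolist()
masks_agree = (with_apply_join(df_big).tolist() == reference
               and with_str_join(df_big).tolist() == reference)
print('masks agree:', masks_agree)

# Time each variant; number=3 keeps the run short.
for fn in (with_any, with_apply_join, with_str_join):
    seconds = timeit.timeit(lambda: fn(df_big), number=3)
    print(f'{fn.__name__}: {seconds / 3 * 1e3:.2f} ms per call')
```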

Answer 2

df[df.L.apply(lambda i: ','.join(i)).str.contains('yourstring')]
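
Two caveats with the join approach are worth checking on your own data. str.contains treats its pattern as a regular expression by default, and a pattern containing the separator can match across two neighbouring items. The sketch below illustrates both on a made-up two-row frame; the patterns 'a.p' and 'e,t' are chosen only to trigger the effect.

```python
import re
import pandas as pd

df = pd.DataFrame({'L': [['apple', 'tackle'], ['grapple', 'satchel']]})
joined = df.L.apply(lambda i: ','.join(i))  # 'apple,tackle' and 'grapple,satchel'

# Default behaviour: the pattern is a regex, so '.' matches any character ('app').
regex_mask = joined.str.contains('a.p')

# Literal matching: either pass regex=False or escape the pattern first.
literal_mask = joined.str.contains('a.p', regex=False)
escaped_mask = joined.str.contains(re.escape('a.p'))

# A pattern containing the separator matches across the item boundary 'apple,tackle',
# even though no single item contains 'e,t'.
boundary_mask = joined.str.contains('e,t', regex=False)
per_item_mask = df.L.apply(lambda x: any('e,t' in s for s in x))

print(regex_mask.tolist(), literal_mask.tolist(), escaped_mask.tolist())
print(boundary_mask.tolist(), per_item_mask.tolist())
```

If the search strings never contain the separator or regex metacharacters, both approaches select the same rows.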

Answer 1: score 3, accepted, 2017-11-22 19:03:25
Answer 2: score 2, 2017-11-22 19:02:00
