How to filter a Pandas DataFrame column of lists

Question

Goal: Filter rows based on the values in a column of lists.

Given:

index	pos_order
3192304	['VB', 'DT', 'NN', 'NN', 'NN', 'NN']
1579035	['VB', 'PRP', 'VBP', 'NN', 'RB', 'IN', 'NNS', 'NN']
763020	['VB', 'VBP', 'PRP', 'JJ', 'IN', 'NN']
1289986	['VB', 'NN', 'IN', 'CD', 'CD']
69194	['VB', 'DT', 'JJ', 'NN']
3068116	['VB', 'JJ', 'IN', 'NN', 'NN']
1506722	['VB', 'NN', 'NNS', 'NNP']
3438101	['VB', 'VB', 'IN', 'DT', 'NNS', 'NNS', 'CC', 'NN', 'NN']
1376463	['VB', 'DT', 'NN', 'NN']
1903231	['VB', 'DT', 'PRP', 'VBP', 'JJ', 'IN', 'NNP', 'NNP']
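To experiment with the question, the sample rows can be rebuilt as a DataFrame. This is a sketch that assumes the numbers on the left are the DataFrame's index labels and that pos_order holds real Python lists rather than strings.

```python
import pandas as pd

# Sample rows from the table above; the left-hand numbers become the index labels.
# Assumption: pos_order holds real Python lists, not strings that look like lists.
df = pd.DataFrame(
    {
        "pos_order": [
            ['VB', 'DT', 'NN', 'NN', 'NN', 'NN'],
            ['VB', 'PRP', 'VBP', 'NN', 'RB', 'IN', 'NNS', 'NN'],
            ['VB', 'VBP', 'PRP', 'JJ', 'IN', 'NN'],
            ['VB', 'NN', 'IN', 'CD', 'CD'],
            ['VB', 'DT', 'JJ', 'NN'],
            ['VB', 'JJ', 'IN', 'NN', 'NN'],
            ['VB', 'NN', 'NNS', 'NNP'],
            ['VB', 'VB', 'IN', 'DT', 'NNS', 'NNS', 'CC', 'NN', 'NN'],
            ['VB', 'DT', 'NN', 'NN'],
            ['VB', 'DT', 'PRP', 'VBP', 'JJ', 'IN', 'NNP', 'NNP'],
        ]
    },
    index=[3192304, 1579035, 763020, 1289986, 69194,
           3068116, 1506722, 3438101, 1376463, 1903231],
)
print(df)
```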

I'd like to query this table for the rows where a given pattern is present. For example, if the pattern is ['IN', 'NN'], I should get rows 763020 and 3068116, but not row 3438101. To be clear, the order of the list elements also matters: the pattern's elements must appear in that order and next to each other.

I tried it this way:

def target_phrase(pattern_tested, pattern_to_match):
    if ''.join(map(str, pattern_to_match)) in ''.join(map(str, pattern_tested)):
        print (pattern_tested)
        return True
    else:
        return False

I can run this code on plain lists outside of pandas, but when I try something like:

target_phrase(df.loc[5]['pos_order'], ['IN', 'NN'])

the code fails.

Any clues?
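One thing that may be worth checking, assuming the numbers in the index column are the DataFrame's index labels rather than a regular column (the exact error isn't shown): df.loc selects rows by label, so df.loc[5] raises a KeyError when no row is labelled 5, while df.iloc selects by position. A small sketch with a three-row subset of the sample data:

```python
import pandas as pd

# Three rows from the sample table, keeping their original index labels.
df = pd.DataFrame(
    {"pos_order": [['VB', 'DT', 'NN', 'NN', 'NN', 'NN'],
                   ['VB', 'PRP', 'VBP', 'NN', 'RB', 'IN', 'NNS', 'NN'],
                   ['VB', 'VBP', 'PRP', 'JJ', 'IN', 'NN']]},
    index=[3192304, 1579035, 763020],
)

try:
    df.loc[2]  # label lookup: no row is labelled 2
except KeyError:
    print("KeyError: 2 is not an index label")

print(df.iloc[2]['pos_order'])      # position lookup, third row: ['VB', 'VBP', 'PRP', 'JJ', 'IN', 'NN']
print(df.loc[763020, 'pos_order'])  # label lookup with the real label returns the same list
```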

Answer 1

First, here is a simplified version of target_phrase:

def target_phrase(pattern_tested, pattern_to_match):
    return ''.join(map(str, pattern_to_match)) in ''.join(map(str, pattern_tested))
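One caveat with joining the tags without a separator: tag boundaries disappear, so ['IN', 'NN'] becomes 'INNN', which also occurs inside 'IN' + 'NNS'. In the sample data this matches rows 1579035 and 1903231, which are not among the rows the question expects. A minimal sketch of one possible fix, using a hypothetical variant named target_phrase_separated, joins with a space and pads both ends so a match must start and end on tag boundaries (assuming no tag contains a space):

```python
def target_phrase_joined(pattern_tested, pattern_to_match):
    # Same logic as target_phrase above: tags are joined with no separator.
    return ''.join(map(str, pattern_to_match)) in ''.join(map(str, pattern_tested))


def target_phrase_separated(pattern_tested, pattern_to_match, sep=' '):
    # Hypothetical variant: join with `sep` and pad both ends, so only whole,
    # adjacent tags in the same order can match. Assumes no tag contains `sep`.
    tested = sep + sep.join(map(str, pattern_tested)) + sep
    to_match = sep + sep.join(map(str, pattern_to_match)) + sep
    return to_match in tested


row_1579035 = ['VB', 'PRP', 'VBP', 'NN', 'RB', 'IN', 'NNS', 'NN']
print(target_phrase_joined(row_1579035, ['IN', 'NN']))     # True: 'IN' + 'NNS' gives 'INNNS'
print(target_phrase_separated(row_1579035, ['IN', 'NN']))  # False
```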

Why doesn't the code work? Because target_phrase expects its first argument to be a list, not a pandas DataFrame. The correct syntax is as follows:

df['pattern_matched'] = df.apply(lambda x: target_phrase(x['pos_order'], 
                                                         ['IN', 'NN']), axis=1)

This applies target_phrase row by row (axis=1) and stores the True/False result in a new pattern_matched column.
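To actually keep only the matching rows, the True/False results can be used as a boolean mask. The sketch below uses the answer's simplified target_phrase on three sample rows; calling Series.apply on the single pos_order column is a slight variation on the df.apply(..., axis=1) call above and passes each list directly.

```python
import pandas as pd


def target_phrase(pattern_tested, pattern_to_match):
    # The answer's simplified version of target_phrase.
    return ''.join(map(str, pattern_to_match)) in ''.join(map(str, pattern_tested))


df = pd.DataFrame(
    {"pos_order": [['VB', 'VBP', 'PRP', 'JJ', 'IN', 'NN'],
                   ['VB', 'JJ', 'IN', 'NN', 'NN'],
                   ['VB', 'VB', 'IN', 'DT', 'NNS', 'NNS', 'CC', 'NN', 'NN']]},
    index=[763020, 3068116, 3438101],
)

# Series.apply calls target_phrase once per cell, so no axis argument is needed.
mask = df['pos_order'].apply(lambda tags: target_phrase(tags, ['IN', 'NN']))

# Boolean indexing keeps only the rows where the mask is True.
matches = df[mask]
print(matches.index.tolist())  # [763020, 3068116]
```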

Answer 2

As it turned out, it was a combination of things, which Kate and Serge together helped me figure out.

The way I had it set up, the data types being compared did not match: I was comparing a string to a list. I had to add code to convert the string that looked like a list into an actual list (Serge's contribution). Once that was done, I was able to run the lambda successfully, thanks to Kate.
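The answer does not show the conversion code itself. One common way to turn strings like "['VB', 'DT']" into real lists is ast.literal_eval from the standard library, which parses Python literals without executing arbitrary code. A sketch, assuming the column was loaded as strings (for example from a CSV file):

```python
import ast

import pandas as pd

# Assumption: pos_order was loaded as strings that merely look like lists.
df = pd.DataFrame(
    {"pos_order": ["['VB', 'DT', 'JJ', 'NN']", "['VB', 'JJ', 'IN', 'NN', 'NN']"]},
    index=[69194, 3068116],
)
print(type(df.loc[69194, 'pos_order']))  # <class 'str'>

# Parse each string into a real Python list of tags.
df['pos_order'] = df['pos_order'].apply(ast.literal_eval)
print(type(df.loc[69194, 'pos_order']))  # <class 'list'>
```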

2 answers

Answer 1: score 2, 2021-02-03 18:36:00

Answer 2: score 0, 2021-02-03 22:11:51
