How to Filter a Pandas Dataframe Column of Lists
Goal: To filter rows based on the values in a column of lists.
Given:
index | pos_order |
---|---|
3192304 | ['VB', 'DT', 'NN', 'NN', 'NN', 'NN'] |
1579035 | ['VB', 'PRP', 'VBP', 'NN', 'RB', 'IN', 'NNS', 'NN'] |
763020 | ['VB', 'VBP', 'PRP', 'JJ', 'IN', 'NN'] |
1289986 | ['VB', 'NN', 'IN', 'CD', 'CD'] |
69194 | ['VB', 'DT', 'JJ', 'NN'] |
3068116 | ['VB', 'JJ', 'IN', 'NN', 'NN'] |
1506722 | ['VB', 'NN', 'NNS', 'NNP'] |
3438101 | ['VB', 'VB', 'IN', 'DT', 'NNS', 'NNS', 'CC', 'NN', 'NN'] |
1376463 | ['VB', 'DT', 'NN', 'NN'] |
1903231 | ['VB', 'DT', 'PRP', 'VBP', 'JJ', 'IN', 'NNP', 'NNP'] |
I'd like to find a way to query this table to fetch rows where a given pattern is present. For example, if the pattern is ['IN', 'NN'], I should get rows 763020 and 3068116, but not row 3438101. So to be clear, the order of the list elements also matters.
I tried going about it this way:
def target_phrase(pattern_tested, pattern_to_match):
    if ''.join(map(str, pattern_to_match)) in ''.join(map(str, pattern_tested)):
        print(pattern_tested)
        return True
    else:
        return False
I can run this code using lists outside of pandas, but when I try using something like:
target_phrase(df.loc[5]['pos_order'], ['IN', 'NN'])
the code fails.
Any clue?
First, let me provide a simplified view of target_phrase:
def target_phrase(pattern_tested, pattern_to_match):
    return ''.join(map(str, pattern_to_match)) in ''.join(map(str, pattern_tested))
Why doesn't the code work? Because target_phrase expects the first argument to be a list, not a pandas DataFrame. The correct syntax is as follows:
df['pattern_matched'] = df.apply(
    lambda x: target_phrase(x['pos_order'], ['IN', 'NN']), axis=1)
This call applies target_phrase row-wise and stores the True or False result for each row in the new pattern_matched column.
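One caveat with the string-joining check: joining tags without a separator can match across tag boundaries. In the sample data, row 1579035 contains 'IN' followed by 'NNS', which joins to 'INNNS' and therefore contains 'INNN', even though the pattern ['IN', 'NN'] is not present as a sequence of elements. A minimal sketch of an element-wise alternative follows; the helper name contains_sequence and the three-row sample frame are illustrative assumptions, not part of the original answer.

```python
import pandas as pd


def contains_sequence(tags, pattern):
    """Return True if `pattern` occurs as a contiguous run of elements in `tags`.

    Hypothetical helper: compares list slices element by element, so a tag
    such as 'NNS' can never be mistaken for 'NN'.
    """
    n = len(pattern)
    return any(tags[i:i + n] == pattern for i in range(len(tags) - n + 1))


# Three rows copied from the question's table, keeping the original index labels.
df = pd.DataFrame(
    {
        "pos_order": [
            ["VB", "VBP", "PRP", "JJ", "IN", "NN"],
            ["VB", "PRP", "VBP", "NN", "RB", "IN", "NNS", "NN"],
            ["VB", "VB", "IN", "DT", "NNS", "NNS", "CC", "NN", "NN"],
        ]
    },
    index=[763020, 1579035, 3438101],
)

pattern = ["IN", "NN"]

# Apply the check to the column itself and use the boolean result as a filter.
mask = df["pos_order"].apply(lambda tags: contains_sequence(tags, pattern))
matched = df[mask]

print(matched.index.tolist())
```

Applying the function to the single column with Series.apply avoids axis=1 and returns a boolean mask that can be used directly to select the matching rows.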
As it turned out, it was a combination of things that Kate and Serge together led me to figure out.
As I had things set up, the data types being compared did not match: I was comparing a string to a list. I had to add code to convert the string that looked like a list into an actual list (Serge's contribution). Once that was done, I was able to run the lambda successfully, thanks to Kate.
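The conversion code itself is not shown in the thread. One common way to turn a string that looks like a list into a real list is ast.literal_eval from the standard library; the sketch below assumes that approach, and the to_list helper and two-row sample frame are hypothetical.

```python
import ast

import pandas as pd

# Sample frame where the lists are stored as strings, as can happen after
# reading the column back from a CSV file (an assumption for illustration).
df = pd.DataFrame(
    {"pos_order": ["['VB', 'JJ', 'IN', 'NN', 'NN']", "['VB', 'DT', 'JJ', 'NN']"]},
    index=[3068116, 69194],
)


def to_list(value):
    """Parse a string such as "['VB', 'NN']" into a list; pass lists through unchanged."""
    return ast.literal_eval(value) if isinstance(value, str) else value


# Convert every cell so later comparisons operate on real lists.
df["pos_order"] = df["pos_order"].apply(to_list)

print(type(df.loc[3068116, "pos_order"]))
```

ast.literal_eval only evaluates Python literals, which makes it a safer choice than eval for data read from a file.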