Goal: filter rows based on the values in a column of lists.
Given:
index | pos_order |
---|---|
3192304 | ['VB', 'DT', 'NN', 'NN', 'NN', 'NN'] |
1579035 | ['VB', 'PRP', 'VBP', 'NN', 'RB', 'IN', 'NNS', 'NN'] |
763020 | ['VB', 'VBP', 'PRP', 'JJ', 'IN', 'NN'] |
1289986 | ['VB', 'NN', 'IN', 'CD', 'CD'] |
69194 | ['VB', 'DT', 'JJ', 'NN'] |
3068116 | ['VB', 'JJ', 'IN', 'NN', 'NN'] |
1506722 | ['VB', 'NN', 'NNS', 'NNP'] |
3438101 | ['VB', 'VB', 'IN', 'DT', 'NNS', 'NNS', 'CC', 'NN', 'NN'] |
1376463 | ['VB', 'DT', 'NN', 'NN'] |
1903231 | ['VB', 'DT', 'PRP', 'VBP', 'JJ', 'IN', 'NNP', 'NNP'] |
I'd like to query this table to fetch the rows where a given pattern is present. For example, if the pattern is ['IN', 'NN'], I should get rows 763020 and 3068116, but not row 3438101. To be clear, the order of the list elements also matters: the tags must appear in that order, one directly after the other.
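Because both order and adjacency matter, the check can be done directly on the lists instead of on joined strings. A minimal sketch, assuming each pos_order value is a real Python list of tag strings; contains_sequence is a hypothetical helper name, not part of pandas:

```python
def contains_sequence(tokens, pattern):
    """Return True if `pattern` occurs in `tokens` as a contiguous, in-order run."""
    n, m = len(tokens), len(pattern)
    if m == 0:
        return True
    # Slide a window of length m over tokens and compare it, as a list, to the pattern.
    return any(list(tokens[i:i + m]) == list(pattern) for i in range(n - m + 1))


rows = {
    763020: ['VB', 'VBP', 'PRP', 'JJ', 'IN', 'NN'],
    3068116: ['VB', 'JJ', 'IN', 'NN', 'NN'],
    3438101: ['VB', 'VB', 'IN', 'DT', 'NNS', 'NNS', 'CC', 'NN', 'NN'],
}
for idx, tags in rows.items():
    print(idx, contains_sequence(tags, ['IN', 'NN']))
```

Comparing list slices element by element means a tag like 'NNS' can never be mistaken for 'NN'.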
I tried going about it this way:
def target_phrase(pattern_tested, pattern_to_match):
    if ''.join(map(str, pattern_to_match)) in ''.join(map(str, pattern_tested)):
        print(pattern_tested)
        return True
    else:
        return False
I can run this code on plain lists outside of pandas, but when I try something like:
target_phrase(df.loc[5]['pos_order'], ['IN', 'NN'])
the code fails.
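One thing worth checking in the failing call: .loc selects rows by index label, not by position. Assuming the numbers in the table's index column are the DataFrame's index, there is no label 5, so df.loc[5] raises a KeyError, while .iloc selects by 0-based position. A small sketch with two of the rows reconstructed by hand (the two-row frame is illustrative only):

```python
import pandas as pd

# Illustrative two-row frame; the table's row numbers are used as the index.
df = pd.DataFrame(
    {'pos_order': [['VB', 'DT', 'NN', 'NN', 'NN', 'NN'],
                   ['VB', 'JJ', 'IN', 'NN', 'NN']]},
    index=[3192304, 3068116],
)

try:
    df.loc[5]['pos_order']  # .loc looks up the index *label* 5, which does not exist
except KeyError as exc:
    print('KeyError:', exc)

print(df.iloc[1]['pos_order'])        # .iloc looks up by position (0-based)
print(df.loc[3068116, 'pos_order'])   # .loc with a label that does exist
```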
Any clue?
First, let me provide a simplified version of target_phrase:
def target_phrase(pattern_tested, pattern_to_match):
    return ''.join(map(str, pattern_to_match)) in ''.join(map(str, pattern_tested))
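A caveat with joining the tags without a separator: tag boundaries disappear, so a row can match even when the pattern is not present as whole tags. In row 1579035, 'IN' followed by 'NNS' becomes 'INNNS', which contains 'INNN'. The sketch below reproduces this and shows one possible workaround, target_phrase_sep (a hypothetical name), which pads every tag with a separator, assuming no tag contains a space:

```python
def target_phrase(pattern_tested, pattern_to_match):
    # Tags glued together with no separator, as in the simplified version.
    return ''.join(map(str, pattern_to_match)) in ''.join(map(str, pattern_tested))


row_1579035 = ['VB', 'PRP', 'VBP', 'NN', 'RB', 'IN', 'NNS', 'NN']
# 'IN' + 'NNS' becomes 'INNNS', which contains 'INNN', so this reports a match
# even though 'IN' is never directly followed by 'NN' in this row.
print(target_phrase(row_1579035, ['IN', 'NN']))  # True


def target_phrase_sep(pattern_tested, pattern_to_match, sep=' '):
    # Hypothetical variant: surround every tag with `sep` so only whole tags match.
    haystack = sep + sep.join(map(str, pattern_tested)) + sep
    needle = sep + sep.join(map(str, pattern_to_match)) + sep
    return needle in haystack


print(target_phrase_sep(row_1579035, ['IN', 'NN']))                      # False
print(target_phrase_sep(['VB', 'JJ', 'IN', 'NN', 'NN'], ['IN', 'NN']))   # True
```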
Why doesn't the code work? Because target_phrase expects its first argument to be a single list, not a pandas DataFrame, so it has to be applied to each row. The correct syntax is as follows:
df['pattern_matched'] = df.apply(lambda x: target_phrase(x['pos_order'],
                                                         ['IN', 'NN']), axis=1)
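This applies target_phrase row-wise (axis=1) and stores the boolean result in a new pattern_matched column; df[df['pattern_matched']] then keeps only the matching rows.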
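Putting it together, here is a sketch of the whole filter on the sample table, assuming pos_order holds real lists. It uses Series.apply on the single column, which passes each cell rather than each row, and a boolean mask to keep the matching rows. contains_sequence is a hypothetical helper that compares list slices instead of joined strings:

```python
import pandas as pd


def contains_sequence(tokens, pattern):
    # True if pattern appears in tokens as a contiguous, in-order run of tags.
    m = len(pattern)
    return any(list(tokens[i:i + m]) == list(pattern)
               for i in range(len(tokens) - m + 1))


# The sample table from the question, with its row numbers as the index.
df = pd.DataFrame(
    {'pos_order': [
        ['VB', 'DT', 'NN', 'NN', 'NN', 'NN'],
        ['VB', 'PRP', 'VBP', 'NN', 'RB', 'IN', 'NNS', 'NN'],
        ['VB', 'VBP', 'PRP', 'JJ', 'IN', 'NN'],
        ['VB', 'NN', 'IN', 'CD', 'CD'],
        ['VB', 'DT', 'JJ', 'NN'],
        ['VB', 'JJ', 'IN', 'NN', 'NN'],
        ['VB', 'NN', 'NNS', 'NNP'],
        ['VB', 'VB', 'IN', 'DT', 'NNS', 'NNS', 'CC', 'NN', 'NN'],
        ['VB', 'DT', 'NN', 'NN'],
        ['VB', 'DT', 'PRP', 'VBP', 'JJ', 'IN', 'NNP', 'NNP'],
    ]},
    index=[3192304, 1579035, 763020, 1289986, 69194,
           3068116, 1506722, 3438101, 1376463, 1903231],
)

pattern = ['IN', 'NN']
# Series.apply calls the function once per cell, so no axis argument is needed.
mask = df['pos_order'].apply(lambda tags: contains_sequence(tags, pattern))
matches = df[mask]
print(matches.index.tolist())  # [763020, 3068116]
```

With the string-join check instead, rows 1579035 and 1903231 would also be returned, because 'IN' + 'NNS' and 'IN' + 'NNP' both contain 'INNN'.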
As it turned out, the problem was a combination of things that Kate and Serge together led me to figure out.
The way I had set things up, the data types being compared did not match: I was comparing a string to a list. I had to add code to convert the string that looked like a list into an actual list (Serge's contribution). Once that was done, I was able to run the lambda successfully, thanks to Kate.
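The conversion code itself is not shown in the thread. One common way to turn a string that looks like a list into a real list, offered here as an assumption rather than Serge's exact code, is ast.literal_eval from the standard library, which parses Python literal syntax without executing arbitrary code:

```python
import ast

import pandas as pd

# Assumption: the column came back (e.g. from a CSV file) as strings that only look like lists.
df = pd.DataFrame(
    {'pos_order': ["['VB', 'JJ', 'IN', 'NN', 'NN']", "['VB', 'DT', 'JJ', 'NN']"]},
    index=[3068116, 69194],
)
print(type(df.loc[3068116, 'pos_order']))  # <class 'str'>

# Parse each string into a real Python list; unlike eval, literal_eval only accepts literals.
df['pos_order'] = df['pos_order'].apply(ast.literal_eval)
print(type(df.loc[3068116, 'pos_order']))  # <class 'list'>
```

If the data is loaded from a CSV file, the same conversion can be done at load time with pd.read_csv(..., converters={'pos_order': ast.literal_eval}).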