How to Filter a Pandas Dataframe Column of Lists

Goal: filter rows based on the values in a column of lists.

Given:

index pos_order
3192304 ['VB', 'DT', 'NN', 'NN', 'NN', 'NN']
1579035 ['VB', 'PRP', 'VBP', 'NN', 'RB', 'IN', 'NNS', 'NN']
763020 ['VB', 'VBP', 'PRP', 'JJ', 'IN', 'NN']
1289986 ['VB', 'NN', 'IN', 'CD', 'CD']
69194 ['VB', 'DT', 'JJ', 'NN']
3068116 ['VB', 'JJ', 'IN', 'NN', 'NN']
1506722 ['VB', 'NN', 'NNS', 'NNP']
3438101 ['VB', 'VB', 'IN', 'DT', 'NNS', 'NNS', 'CC', 'NN', 'NN']
1376463 ['VB', 'DT', 'NN', 'NN']
1903231 ['VB', 'DT', 'PRP', 'VBP', 'JJ', 'IN', 'NNP', 'NNP']

I'd like to find a way to query this table to fetch rows where a given pattern is present. For example, if the pattern is ['IN', 'NN'], I should get rows 763020 and 3068116, but not row 3438101. To be clear, the order of the list elements matters, and the elements must be adjacent.

I tried going about it this way:

def target_phrase(pattern_tested, pattern_to_match):
    if ''.join(map(str, pattern_to_match)) in ''.join(map(str, pattern_tested)):
        print (pattern_tested)
        return True
    else:
        return False

I can run this code using lists outside of pandas, but when I try using something like:

target_phrase(df.loc[5]['pos_order'], ['IN', 'NN'])

the code fails.

Any clue?

First, let me provide a simplified view of target_phrase:

def target_phrase(pattern_tested, pattern_to_match):
    return ''.join(map(str, pattern_to_match)) in ''.join(map(str, pattern_tested))
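One caveat with joining on an empty string: adjacent tags run together, so the pattern ['IN', 'NN'] also matches the sequence 'IN', 'NNS', because both produce the substring 'INNN'. A minimal sketch that compares whole elements instead is shown below; contains_sublist is a hypothetical helper name, not part of the original code, and it assumes both arguments are real Python lists:

```python
def contains_sublist(tags, pattern):
    """Return True if `pattern` occurs in `tags` as a contiguous run of whole elements."""
    n = len(pattern)
    # Slide a window of len(pattern) over tags and compare whole elements.
    return any(tags[i:i + n] == pattern for i in range(len(tags) - n + 1))


# Row 1579035 from the question: 'IN' is followed by 'NNS', not 'NN'.
row = ['VB', 'PRP', 'VBP', 'NN', 'RB', 'IN', 'NNS', 'NN']
joined_match = ''.join(['IN', 'NN']) in ''.join(row)   # True: 'INNN' occurs inside 'INNNSNN'
sublist_match = contains_sublist(row, ['IN', 'NN'])    # False: no adjacent 'IN', 'NN' pair
```

Joining with a separator such as a space, and padding both strings with it, would be another way to avoid the false match.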

Why does the code not work? Because target_phrase expects its first argument to be a list, not a pandas object. Note also that df.loc[5] looks up the index label 5 rather than the sixth row. The correct syntax is as follows:

df['pattern_matched'] = df.apply(lambda x: target_phrase(x['pos_order'], 
                                                         ['IN', 'NN']), axis=1)

This call applies target_phrase to each row and stores the True/False result in a new column.
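Since the goal is to filter rows, the boolean result can also be used directly as a mask. The sketch below assumes pandas is installed and that pos_order already holds real Python lists. The three-row frame is a sample taken from the table in the question, and Series.apply is used on the single column because the other columns are not needed:

```python
import pandas as pd


def target_phrase(pattern_tested, pattern_to_match):
    return ''.join(map(str, pattern_to_match)) in ''.join(map(str, pattern_tested))


# Three rows from the question's table, indexed by their original labels.
df = pd.DataFrame(
    {'pos_order': [
        ['VB', 'VBP', 'PRP', 'JJ', 'IN', 'NN'],
        ['VB', 'JJ', 'IN', 'NN', 'NN'],
        ['VB', 'VB', 'IN', 'DT', 'NNS', 'NNS', 'CC', 'NN', 'NN'],
    ]},
    index=[763020, 3068116, 3438101],
)

# Build a boolean mask from the column, then keep only the matching rows.
mask = df['pos_order'].apply(lambda tags: target_phrase(tags, ['IN', 'NN']))
matched = df[mask]
```

Here matched keeps rows 763020 and 3068116 and drops 3438101, which is the result the question asks for.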

As it turned out, it was a combination of things, which Kate and Serge together led me to figure out.

As I had it set up, the data types being compared did not match: I was comparing a string to a list. I had to add code to convert the string that looked like a list into an actual list (Serge's contribution). Once that was done, I was able to run the lambda successfully, thanks to Kate.
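The conversion code itself is not shown here. One common way to turn a string that looks like a list back into a list is ast.literal_eval from the standard library. The sketch below assumes each cell is a valid Python list literal, as is typical after the column has been written to and read back from a CSV file:

```python
import ast

# A cell as it might look after a CSV round trip (assumed sample value).
cell = "['VB', 'JJ', 'IN', 'NN', 'NN']"

# literal_eval parses Python literals only; it does not execute arbitrary code.
tags = ast.literal_eval(cell)

# With a DataFrame, the same conversion would be applied to the whole column:
# df['pos_order'] = df['pos_order'].apply(ast.literal_eval)
```

Once the column holds real lists, the row-wise apply call works as intended.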
