
pandas lookup dataframes with isin and append

I'm looking for a way to dynamically add columns based on lookup DataFrames. Suppose I have this example:

import pandas as pd


df = pd.DataFrame({'col1': ["monkey", "monkye", "ape", "banana", "apple", "aple"], 
                   'col2': ["apple", "banana", "", "banana", "", ""], 
                   'col3': ["monkey", "apple", "pear", "", "apple", "aple"]})

monkey = pd.DataFrame({0: ["monkey", "monkye", "etc..", "etc.."]})
apple = pd.DataFrame({0: ["apple", "aple", "etc..", "etc.."]})
banana = pd.DataFrame({0: ["banana", "bananaa", "etc..", "etc.."]})

dataframes = [banana, apple, monkey]

for dataframe in dataframes:
    df[['a','b','c']] = df[['col1', 'col2', 'col3']].isin(dataframe[0])

print(df)

This prints the following, because df[['a','b','c']] is overwritten on every pass through the loop, so only the last lookup (monkey) is reflected:

     col1    col2    col3      a      b      c
0  monkey   apple  monkey   True  False   True
1  monkye  banana   apple   True  False  False
2     ape            pear  False  False  False
3  banana  banana          False  False  False
4   apple           apple  False  False  False
5    aple            aple  False  False  False
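Part of the problem is that every pass through the loop writes to the same three labels a, b and c. A minimal sketch, using a cut-down copy of the example data (with the "etc.." placeholders dropped) and a made-up `<lookup>_<column>` naming scheme, that keeps every per-cell result by giving each lookup its own column labels:

```python
import pandas as pd

df = pd.DataFrame({'col1': ["monkey", "ape", "banana"],
                   'col2': ["apple", "", "banana"]})
# Spelling lists stand in for the lookup DataFrames (placeholders removed)
lookups = {'banana': ["banana", "bananaa"], 'monkey': ["monkey", "monkye"]}

for name, spellings in lookups.items():
    # isin returns a boolean DataFrame with the same shape as df[['col1', 'col2']]
    hits = df[['col1', 'col2']].isin(spellings)
    # hypothetical naming scheme <lookup>_<column>, so later passes don't overwrite earlier ones
    hits.columns = [f"{name}_{c}" for c in hits.columns]
    df = df.join(hits)

print(df)
```

Collapsing each lookup's group of columns with .any(axis=1), as the answers below do, then yields one column per lookup.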

What I am after is one column each for banana, apple, and monkey, so that it looks like this:

     col1    col2    col3 banana  apple monkey 
0  monkey   apple  monkey  False   True   True
1  monkye  banana   apple   True   True   True
2     ape            pear  False  False  False
3  banana  banana           True  False  False
4   apple           apple  False   True  False
5    aple            aple  False   True  False

I believe you need a list of tuples to define the DataFrames and their names. Then, for the comparison, convert the lookup column to a list and check for at least one True per row with DataFrame.any:

dataframes = [('banana', banana), ('apple',apple), ('monkey',monkey)]

for k, v in dataframes:
    df[k] = df[['col1', 'col2', 'col3']].isin(v[0].tolist()).any(axis=1)
print (df)

     col1    col2    col3  banana  apple  monkey
0  monkey   apple  monkey   False   True    True
1  monkye  banana   apple    True   True    True
2     ape            pear   False  False   False
3  banana  banana            True  False   False
4   apple           apple   False   True   False
5    aple            aple   False   True   False
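The two steps of this answer can be looked at in isolation on a cut-down copy of the data. Converting the lookup column to a list matters: when DataFrame.isin receives a Series (as in the question's loop), it compares each row only with the Series value that has the same index label instead of testing membership. The sketch below shows both behaviours side by side:

```python
import pandas as pd

df = pd.DataFrame({'col1': ["monkey", "monkye", "ape"],
                   'col2': ["apple", "banana", ""],
                   'col3': ["monkey", "apple", "pear"]})
apple = pd.DataFrame({0: ["apple", "aple"]})
cols = ['col1', 'col2', 'col3']

# Step 1: membership test per cell (True where the cell is any apple spelling)
cell_hits = df[cols].isin(apple[0].tolist())
# Step 2: collapse each row to one boolean (True if any cell matched)
row_hits = cell_hits.any(axis=1)

# Passing the Series itself aligns on the index: row i is compared only with
# apple[0][i], so "apple" in row 1 (where index 1 holds "aple") is missed
aligned = df[cols].isin(apple[0])

print(cell_hits)
print(row_hits.tolist())
print(aligned.any(axis=1).tolist())
```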

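If order is not important, use a dictionary (before Python 3.7, dicts do not preserve insertion order, so the new columns may appear in a different order, as in the output below):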

dataframes = {'banana': banana, 'apple':apple, 'monkey':monkey}

for k, v in dataframes.items():
    df[k] = df[['col1', 'col2', 'col3']].isin(v[0].tolist()).any(axis=1)
print (df)

     col1    col2    col3  apple  banana  monkey
0  monkey   apple  monkey   True   False    True
1  monkye  banana   apple   True    True    True
2     ape            pear  False   False   False
3  banana  banana          False    True   False
4   apple           apple   True   False   False
5    aple            aple   True   False   False
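With many lookup tables, an alternative (a sketch, not part of the original answers) is to invert them into a single spelling-to-name mapping, translate every cell once with Series.map, and then test each name. The spelling lists below restate the example lookups with the "etc.." placeholders dropped, and `spelling_to_name` is a hypothetical helper:

```python
import pandas as pd

df = pd.DataFrame({'col1': ["monkey", "monkye", "ape", "banana", "apple", "aple"],
                   'col2': ["apple", "banana", "", "banana", "", ""],
                   'col3': ["monkey", "apple", "pear", "", "apple", "aple"]})
lookups = {'banana': ["banana", "bananaa"],
           'apple': ["apple", "aple"],
           'monkey': ["monkey", "monkye"]}

# Hypothetical helper: every known spelling points to its canonical name
spelling_to_name = {s: name for name, spellings in lookups.items() for s in spellings}

# Translate each cell; unknown values such as "ape" or "" become NaN
canonical = df[['col1', 'col2', 'col3']].apply(lambda col: col.map(spelling_to_name))

for name in lookups:
    # True if any cell in the row was translated to this name
    df[name] = canonical.eq(name).any(axis=1)

print(df)
```

Each cell is translated once instead of once per lookup table. If the same spelling appeared in two lookups, the dict would keep only the last one, so this assumes the spelling lists do not overlap.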

solution 1:

Use np.intersect1d to check whether any of the valid spellings are present in a row. The process is slightly more convenient if dataframes is a dict instead of a list:

import numpy as np

dataframes = {'monkey': monkey, 'banana': banana, 'apple': apple}
df.assign(
  **{k: df.apply(lambda x: np.intersect1d(x.values, v.values).size > 0, axis=1)
    for k, v in dataframes.items()}
)

outputs:

     col1    col2    col3  apple  banana  monkey
0  monkey   apple  monkey   True   False    True
1  monkye  banana   apple   True    True    True
2     ape            pear  False   False   False
3  banana  banana          False    True   False
4   apple           apple   True   False   False
5    aple            aple   True   False   False

df.assign returns a new DataFrame and leaves df unchanged, so assign the result back to df (to overwrite it) or to a different variable.
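For a single row, the test in solution 1 reduces to asking whether the row's values and the valid spellings share at least one element. A small sketch with plain NumPy, where the 2-D `apple_spellings` array is an assumed stand-in for `apple.values` from a one-column DataFrame:

```python
import numpy as np

row = np.array(["monkye", "banana", "apple"], dtype=object)
no_match_row = np.array(["ape", "", "pear"], dtype=object)
# 2-D, shaped like apple.values from a one-column DataFrame
apple_spellings = np.array([["apple"], ["aple"]], dtype=object)

# intersect1d flattens both inputs and returns the sorted unique common values
common = np.intersect1d(row, apple_spellings)
print(common.tolist(), common.size > 0)

no_common = np.intersect1d(no_match_row, apple_spellings)
print(no_common.tolist(), no_common.size > 0)

# A plain-Python equivalent of the same "any shared element" test
print(not set(row).isdisjoint(apple_spellings.ravel()))
```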

solution 2:

Another option is to use regular expressions for the match.

import re

patterns = {'apple': re.compile(r'apple|aple|etc..|etc..'),
            'monkey': re.compile(r'monkey|monkye|etc..|etc..'),
            'banana': re.compile(r'banana|bananaa|etc..|etc..')}

# re.search scans the whole joined row; re.match would only test its start
df.assign(
  **{k: df.apply(lambda x: bool(p.search(' '.join(x.values))), axis=1)
     for k, p in patterns.items()}
)

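The output is the same. Regular expressions give you more flexible matching, but note that re.search also matches substrings (for example, apple inside pineapple), so wrap the alternatives in \b word boundaries, e.g. r'\b(?:apple|aple)\b', if only whole words should count.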
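The patterns above are written by hand. A sketch that builds them from spelling lists instead (assumed here to restate the lookups without the "etc.." placeholders), escapes them with re.escape so characters such as . stay literal, and applies them per column with Series.str.fullmatch (available in pandas 1.1 and later) so each cell must match a spelling exactly:

```python
import re
import pandas as pd

df = pd.DataFrame({'col1': ["monkey", "monkye", "ape", "banana", "apple", "aple"],
                   'col2': ["apple", "banana", "", "banana", "", ""],
                   'col3': ["monkey", "apple", "pear", "", "apple", "aple"]})
# Assumed spelling lists, placeholders removed
lookups = {'banana': ["banana", "bananaa"],
           'apple': ["apple", "aple"],
           'monkey': ["monkey", "monkye"]}

cols = ['col1', 'col2', 'col3']
for name, spellings in lookups.items():
    # e.g. 'apple|aple'; re.escape guards against regex metacharacters in spellings
    pattern = '|'.join(re.escape(s) for s in spellings)
    # fullmatch: the whole cell must be one of the spellings (no substring hits)
    df[name] = df[cols].apply(lambda col: col.str.fullmatch(pattern)).any(axis=1)

print(df)
```

Unlike re.search on a joined row, fullmatch tests each cell on its own, so a spelling cannot match inside a longer word or across a cell boundary.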
