I'm looking for a way to dynamically add columns from lookup DataFrames. Suppose I have this example:
import pandas as pd
df = pd.DataFrame({'col1': ["monkey", "monkye", "ape", "banana", "apple", "aple"],
'col2': ["apple", "banana", "", "banana", "", ""],
'col3': ["monkey", "apple", "pear", "", "apple", "aple"]})
monkey = pd.DataFrame({0: ["monkey", "monkye", "etc..", "etc.."]})
apple = pd.DataFrame({0: ["apple", "aple", "etc..", "etc.."]})
banana = pd.DataFrame({0: ["banana", "bananaa", "etc..", "etc.."]})
dataframes = [banana, apple, monkey]
for dataframe in dataframes:
    df[['a','b','c']] = df[['col1', 'col2', 'col3']].isin(dataframe[0])
print(df)
This prints the following, because df[['a','b','c']] is overwritten on every iteration:
     col1    col2    col3      a      b      c
0  monkey   apple  monkey   True  False   True
1  monkye  banana   apple   True  False  False
2     ape            pear  False  False  False
3  banana  banana          False  False  False
4   apple           apple  False  False  False
5    aple            aple  False  False  False
But what I am after is one column for banana, one for apple, and one for monkey so it will look like this:
     col1    col2    col3  banana  apple  monkey
0  monkey   apple  monkey   False   True    True
1  monkye  banana   apple    True   True    True
2     ape            pear   False  False   False
3  banana  banana            True  False   False
4   apple           apple   False   True   False
5    aple            aple   False   True   False
I believe you need a list of tuples to define the DataFrames and their names. For the comparison, convert the lookup column to a list and check for at least one True per row with DataFrame.any:
dataframes = [('banana', banana), ('apple', apple), ('monkey', monkey)]
for k, v in dataframes:
    df[k] = df[['col1', 'col2', 'col3']].isin(v[0].tolist()).any(axis=1)
print(df)
     col1    col2    col3  banana  apple  monkey
0  monkey   apple  monkey   False   True    True
1  monkye  banana   apple    True   True    True
2     ape            pear   False  False   False
3  banana  banana            True  False   False
4   apple           apple   False   True   False
5    aple            aple   False   True   False
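To see why the `.tolist()` call matters, here is a minimal sketch, assuming the sample data from the question with the "etc.." placeholders left out. When `isin` receives a Series, pandas compares each cell only against the lookup value that has the same index label; when it receives a list, it does a plain membership test. The names `cols`, `by_index` and `by_value` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ["monkey", "monkye", "ape", "banana", "apple", "aple"],
    'col2': ["apple", "banana", "", "banana", "", ""],
    'col3': ["monkey", "apple", "pear", "", "apple", "aple"],
})
# Lookup frame with the accepted spellings in column 0 (placeholders omitted).
banana = pd.DataFrame({0: ["banana", "bananaa"]})
cols = ['col1', 'col2', 'col3']

# Series argument: a cell matches only if it equals the lookup value
# carrying the same index label, so "banana" in rows 1 and 3 is missed.
by_index = df[cols].isin(banana[0]).any(axis=1)

# List argument: membership test that ignores the index entirely.
by_value = df[cols].isin(banana[0].tolist()).any(axis=1)

print(by_index.tolist())
print(by_value.tolist())
```

With this data the index-aligned version should find no banana at all, while the list version flags rows 1 and 3, which is the result the question asks for.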
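If column order is not important, use a dictionary (before Python 3.7 dictionaries did not guarantee insertion order, which is why the new columns below appear in a different order than the keys were written):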
dataframes = {'banana': banana, 'apple': apple, 'monkey': monkey}
for k, v in dataframes.items():
    # pass axis as a keyword; the positional form any(1) is rejected by recent pandas
    df[k] = df[['col1', 'col2', 'col3']].isin(v[0].tolist()).any(axis=1)
print(df)
     col1    col2    col3  apple  banana  monkey
0  monkey   apple  monkey   True   False    True
1  monkye  banana   apple   True    True    True
2     ape            pear  False   False   False
3  banana  banana          False    True   False
4   apple           apple   True   False   False
5    aple            aple   True   False   False
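On Python 3.7 or later a dict keeps its insertion order, so the dictionary variant should give the same column order as the list of tuples. A small sketch to check this, assuming plain lists of spellings instead of lookup DataFrames (the `lookups` dict and `cols` list are hypothetical simplifications, not part of the original code):

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ["monkey", "monkye", "ape", "banana", "apple", "aple"],
    'col2': ["apple", "banana", "", "banana", "", ""],
    'col3': ["monkey", "apple", "pear", "", "apple", "aple"],
})

# Hypothetical simplification: name -> list of accepted spellings.
lookups = {
    'banana': ["banana", "bananaa"],
    'apple': ["apple", "aple"],
    'monkey': ["monkey", "monkye"],
}
cols = ['col1', 'col2', 'col3']  # fixed list, so new columns are never searched

for name, spellings in lookups.items():
    df[name] = df[cols].isin(spellings).any(axis=1)

# New columns follow the order in which the dict keys were written.
print(df.columns.tolist())
```

If you need a specific order on an older interpreter, the list of tuples is the safer choice.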
Solution 1:
Use an intersection to check whether any of the valid spellings is present in a row. The process is slightly more convenient if dataframes is a dict instead of a list:
import numpy as np
dataframes = {'monkey': monkey, 'banana': banana, 'apple': apple}
df.assign(
    **{k: df.apply(lambda x: np.intersect1d(x.values, v.values).size > 0, axis=1)
       for k, v in dataframes.items()}
)
outputs:
     col1    col2    col3  apple  banana  monkey
0  monkey   apple  monkey   True   False    True
1  monkye  banana   apple   True    True    True
2     ape            pear  False   False   False
3  banana  banana          False    True   False
4   apple           apple   True   False   False
5    aple            aple   True   False   False
df.assign returns a new DataFrame, so you can assign the result back to the original variable (to overwrite df) or to a different variable.
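To make the per-row test in solution 1 more concrete, here is a rough sketch of what `np.intersect1d` does for a single row. The two rows are taken from the question's data and the "etc.." placeholders are left out; `row_hit`, `row_miss`, `common_hit` and `common_miss` are illustrative names only.

```python
import numpy as np
import pandas as pd

# Lookup frame with the accepted spellings in column 0 (placeholders omitted).
apple = pd.DataFrame({0: ["apple", "aple"]})

# Two rows as df.apply(..., axis=1) would hand them to the lambda.
row_hit = pd.Series(["monkye", "banana", "apple"], index=['col1', 'col2', 'col3'])
row_miss = pd.Series(["ape", "", "pear"], index=['col1', 'col2', 'col3'])

# intersect1d flattens the 2-D lookup array and returns the sorted,
# unique values that occur in both inputs.
common_hit = np.intersect1d(row_hit.to_numpy(), apple.to_numpy())
common_miss = np.intersect1d(row_miss.to_numpy(), apple.to_numpy())

print(common_hit, common_hit.size > 0)
print(common_miss, common_miss.size > 0)
```

A non-empty intersection means at least one accepted spelling occurs somewhere in the row, which is what `.size > 0` checks.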
Solution 2:
Another option is to use a regex for the match.
import re
patterns = {'apple': re.compile(r'apple|aple|etc..|etc..'),
            'monkey': re.compile(r'monkey|monkye|etc..|etc..'),
            'banana': re.compile(r'banana|bananaa|etc..|etc..')}
df.assign(
    # re.search scans the whole joined row; re.match would only test its start
    **{k: df.apply(lambda x: bool(re.search(p, ' '.join(x.values))), axis=1)
       for k, p in patterns.items()}
)
The output is the same. However, regexes will provide you with a more flexible matching environment.
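If the spellings already live in lists, the patterns can be generated rather than typed by hand. The following is only a sketch under a few assumptions: the `spellings` dict is a hypothetical stand-in for the lookup frames, the "etc.." placeholders are omitted, and word boundaries (`\b`) are added so that a longer word such as "pineapple" would not count as "apple".

```python
import re
import pandas as pd

df = pd.DataFrame({
    'col1': ["monkey", "monkye", "ape", "banana", "apple", "aple"],
    'col2': ["apple", "banana", "", "banana", "", ""],
    'col3': ["monkey", "apple", "pear", "", "apple", "aple"],
})

# Hypothetical stand-in for the lookup frames: name -> accepted spellings.
spellings = {
    'apple': ["apple", "aple"],
    'monkey': ["monkey", "monkye"],
    'banana': ["banana", "bananaa"],
}

# re.escape guards against regex metacharacters in a spelling;
# \b ... \b restricts matches to whole words.
patterns = {
    name: re.compile(r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b')
    for name, words in spellings.items()
}

# Join each row into one string once, then search it with every pattern.
joined = df[['col1', 'col2', 'col3']].apply(' '.join, axis=1)
result = df.assign(
    **{name: joined.map(lambda s: bool(p.search(s))) for name, p in patterns.items()}
)
print(result)
```

Joining the row only once avoids rebuilding the same string for every pattern. `re.search` is used instead of `re.match` because the spelling may sit anywhere in the joined row, not just at its start.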