I have dataframe1 with a column whose cells are comma-separated words, like "are, boy, cat, dog, ear, far, gone".

Dataframe2 has a column whose cells are single words, like "are", "boy", or "gone".

I want to add a column to dataframe1 with a boolean entry indicating whether each cell in dataframe1 contains any of the words in dataframe2. For example,
DF1 = (are, boy, cat, dog, ear, far, gone), (home, guy, tall, egg), (cat, done, roof, grass), etc.
DF2 = (are), (boy), (gone), etc.
New column cell value in dataframe1 = (1), (0), (1), etc.
Assuming these are your inputs:
import pandas as pd
df1 = pd.DataFrame({'A': [
    'are, boy, cat, dog, ear, far, gone',
    'home, guy, tall, egg',
    'cat, done, roof, grass',
]})
df2 = pd.DataFrame({'A': ['are', 'boy', 'gone']})
print('%s\n%s' % (df1, df2))
# A
# 0 are, boy, cat, dog, ear, far, gone
# 1 home, guy, tall, egg
# 2 cat, done, roof, grass
# A
# 0 are
# 1 boy
# 2 gone
We can use Series.str.split() to convert the comma-separated strings to lists:
df1['A'].str.split(r'\s*,\s*')
# 0 [are, boy, cat, dog, ear, far, gone]
# 1 [home, guy, tall, egg]
# 2 [cat, done, roof, grass]
# Name: A, dtype: object
The argument to split, r'\s*,\s*', is a regex pattern that matches a comma along with any surrounding whitespace. The r prefix means the string literal is treated as a raw string.
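As a quick illustration (the sample string here is my own), the pattern tolerates uneven spacing around the commas:

```python
import pandas as pd

# r'\s*,\s*' splits on each comma plus any surrounding whitespace,
# so irregular spacing still yields clean words.
s = pd.Series(['are,boy ,  cat'])
print(s.str.split(r'\s*,\s*')[0])  # ['are', 'boy', 'cat']
```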
Then we apply set.isdisjoint() to check which cells contain none of the values in df2:
df1['A'].str.split(r'\s*,\s*').apply(set(df2['A']).isdisjoint)
# 0 False
# 1 True
# 2 True
# Name: A, dtype: bool
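As an aside (not from the original answer), an equivalent regex-based sketch uses Series.str.contains() with a word-boundary alternation built from df2's values; re.escape guards against any regex metacharacters in the words:

```python
import re
import pandas as pd

df1 = pd.DataFrame({'A': [
    'are, boy, cat, dog, ear, far, gone',
    'home, guy, tall, egg',
    'cat, done, roof, grass',
]})
df2 = pd.DataFrame({'A': ['are', 'boy', 'gone']})

# Build an alternation like \b(?:are|boy|gone)\b; the \b anchors
# prevent partial matches such as 'gone' inside 'begone'.
pattern = r'\b(?:%s)\b' % '|'.join(map(re.escape, df2['A']))
print(df1['A'].str.contains(pattern).tolist())  # [True, False, False]
```

This skips the split entirely, at the cost of building a regex whose size grows with df2.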
Finally, we negate this and assign it to a new column 'B':
df1['B'] = ~df1['A'].str.split(r'\s*,\s*').apply(set(df2['A']).isdisjoint)
print(df1)
# A B
# 0 are, boy, cat, dog, ear, far, gone True
# 1 home, guy, tall, egg False
# 2 cat, done, roof, grass False
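If you want 1/0 instead of True/False, as in the question's example, the boolean column can be cast to int; with the sample data above this gives [1, 0, 0] (the third row contains none of df2's words):

```python
import pandas as pd

df1 = pd.DataFrame({'A': [
    'are, boy, cat, dog, ear, far, gone',
    'home, guy, tall, egg',
    'cat, done, roof, grass',
]})
df2 = pd.DataFrame({'A': ['are', 'boy', 'gone']})

# True where the cell shares at least one word with df2, then cast to 1/0.
words = set(df2['A'])
df1['B'] = (~df1['A'].str.split(r'\s*,\s*').apply(words.isdisjoint)).astype(int)
print(df1['B'].tolist())  # [1, 0, 0]
```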