Pandas: compare all values of a column with a different DataFrame and return the column name (of the other DataFrame) where the value matches
I'm trying to assign a category to each row of a dataset based on matching keywords from another dataset.
With the example below, every value of the new column would be non-sport (the name of the df_TWO column in which the value is found).
df_ONE
Heroes The Punisher
Heroes The Punisher
Heroes Human Torch - 1
Heroes Man Thing
Heroes Medusa
Heroes Mr. Fantastic
Movies-TV Star Wars
Movies-TV Star Wars
df_TWO
sport non_sport gaming
0 baseball movies-tv pokemon
1 basketball music yugioh
2 football people magic
3 hockey history gaming
4 soccer heroes NaN
5 racing NaN NaN
6 boxing NaN NaN
7 golf NaN NaN
8 mma NaN NaN
9 multisport NaN NaN
10 tennis NaN NaN
11 wrestling NaN NaN
12 poker NaN NaN
It would be nice to have this result:
Heroes The Punisher non-sport
Heroes The Punisher non-sport
Heroes Human Torch - 1 non-sport
Heroes Man Thing non-sport
Heroes Medusa non-sport
Heroes Mr. Fantastic non-sport
Movies-TV Star Wars non-sport
Movies-TV Star Wars non-sport
I've tried to adapt the following solutions, but had no luck.
into something like
You need to reshape your second dataframe. You can do this pretty easily with melt.
Here is an example of what the melted df looks like:
col_match genre
0 sport baseball
1 sport basketball
2 sport football
3 sport hockey
4 sport soccer
5 sport racing
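A table like the one above could be produced roughly as follows. This is a minimal sketch on a shortened stand-in for df_TWO (the three-row frame here is made up for illustration); the `dropna` step is an optional addition that removes the NaN padding left over from columns of unequal length.

```python
import numpy as np
import pandas as pd

# Shortened stand-in for df_TWO, for illustration only
df2 = pd.DataFrame({
    'sport': ['baseball', 'basketball', 'football'],
    'non_sport': ['movies-tv', 'heroes', np.nan],
    'gaming': ['pokemon', np.nan, np.nan],
})

# Wide -> long: one row per (column name, keyword) pair
melted = df2.melt(var_name='col_match', value_name='genre')

# Drop the NaN padding so only real keywords remain
melted = melted.dropna(subset=['genre']).reset_index(drop=True)

print(melted)
```

Each keyword now sits in a single `genre` column next to the name of the column it came from, which is the shape a join needs.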
So you can join the melted df to the original on genre. Be sure to lowercase the genre column in the first df, since the keywords in the second df are all lowercase.
import pandas as pd
import numpy as np

# First dataframe: one row per item, with its genre
df = pd.DataFrame({
    'genre': ['Heroes', 'Heroes', 'Heroes', 'Heroes', 'Heroes', 'Heroes', 'Movies-TV', 'Movies-TV'],
    'title': ['The Punisher', 'The Punisher', 'Human Torch - 1', 'Man Thing', 'Medusa', 'Mr. Fantastic', 'Star Wars', 'Star Wars']})

# Second dataframe: one column per category, padded with NaN
df2 = pd.DataFrame({
    'sport': ['baseball', 'basketball', 'football', 'hockey', 'soccer', 'racing', 'boxing', 'golf', 'mma', 'multisport', 'tennis', 'wrestling', 'poker'],
    'non_sport': ['movies-tv', 'music', 'people', 'history', 'heroes', np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    'gaming': ['pokemon', 'yugioh', 'magic', 'gaming', np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]})

# Lowercase the genre so it matches the keywords in df2
df['genre'] = df['genre'].str.lower()

# Melt df2 into (col_match, genre) pairs and join on genre
df.merge(df2.melt(value_vars=df2.columns, var_name='col_match', value_name='genre'), on='genre')
Output
genre title col_match
0 heroes The Punisher non_sport
1 heroes The Punisher non_sport
2 heroes Human Torch - 1 non_sport
3 heroes Man Thing non_sport
4 heroes Medusa non_sport
5 heroes Mr. Fantastic non_sport
6 movies-tv Star Wars non_sport
7 movies-tv Star Wars non_sport
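One thing to keep in mind: the default inner merge drops any row whose genre is not found in the second df. If those rows should be kept, a lookup with `Series.map` is one possible variation. This is only a sketch, not part of the answer above: the small frames and the unmatched 'Comics' row are made up for illustration, and it assumes each keyword appears in only one column of the second df (`map` with a Series needs a unique index).

```python
import numpy as np
import pandas as pd

# Illustrative data; 'Comics' is a made-up genre with no match in df2
df = pd.DataFrame({
    'genre': ['Heroes', 'Movies-TV', 'Comics'],
    'title': ['Medusa', 'Star Wars', 'Example Title'],
})
df2 = pd.DataFrame({
    'sport': ['baseball', 'basketball'],
    'non_sport': ['movies-tv', 'heroes'],
    'gaming': ['pokemon', np.nan],
})

# Build a keyword -> column-name lookup from the melted frame
lookup = (
    df2.melt(var_name='col_match', value_name='genre')
       .dropna(subset=['genre'])
       .set_index('genre')['col_match']
)

# map() keeps every row of df; genres without a match become NaN
df['col_match'] = df['genre'].str.lower().map(lookup)

print(df)
```

This leaves the original casing of `genre` untouched and keeps the row count of the first df unchanged, at the cost of requiring unique keywords.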