Pandas: compare all values of a column against a different DataFrame and return the name of the matching column (in the other DataFrame)

Question

I'm trying to assign a category to each row of a dataset based on matching keywords from another dataset.

I compare df_ONE['columnname'] to every value of df_TWO,
and if a matching value is found, I want the name of the df_TWO column containing that value to become the cell value of a new column in df_ONE.
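The lookup described above amounts to building a keyword-to-column-name mapping from df_TWO and then looking each df_ONE value up in it. A minimal sketch, not taken from the question: it assumes the df_ONE column is called `genre` (a hypothetical name standing in for `columnname`), uses trimmed sample data plus one made-up row with no match, and assumes each keyword appears in only one df_TWO column (if it appears in several, the last column wins).

```python
import numpy as np
import pandas as pd

# Trimmed stand-ins for the question's frames; 'genre' is an assumed column name.
df_ONE = pd.DataFrame({
    "genre": ["Heroes", "Heroes", "Movies-TV", "Comics"],  # "Comics" is a hypothetical unmatched row
    "title": ["The Punisher", "Medusa", "Star Wars", "Some Title"],
})
df_TWO = pd.DataFrame({
    "sport": ["baseball", "soccer", "poker"],
    "non_sport": ["movies-tv", "heroes", np.nan],
    "gaming": ["pokemon", np.nan, np.nan],
})

# Map every non-NaN keyword to the name of the column it sits in.
keyword_to_column = {
    keyword: column
    for column in df_TWO.columns
    for keyword in df_TWO[column].dropna()
}

# df_TWO keywords are lowercase, so lowercase before the lookup;
# values with no matching keyword become NaN.
df_ONE["category"] = df_ONE["genre"].str.lower().map(keyword_to_column)
print(df_ONE)
```

Because `Series.map` with a dict is a hash lookup per row, this stays fast even when df_ONE is large.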

With the example below, every value of the new column would be non_sport (the name of the df_TWO column in which the value is found).

df_ONE

Heroes      The Punisher        
Heroes      The Punisher        
Heroes      Human Torch - 1     
Heroes      Man Thing           
Heroes      Medusa              
Heroes      Mr. Fantastic       
Movies-TV   Star Wars           
Movies-TV   Star Wars

df_TWO

         sport  non_sport   gaming
0     baseball  movies-tv  pokemon
1   basketball      music   yugioh
2     football     people    magic
3       hockey    history   gaming
4       soccer     heroes      NaN
5       racing        NaN      NaN
6       boxing        NaN      NaN
7         golf        NaN      NaN
8          mma        NaN      NaN
9   multisport        NaN      NaN
10      tennis        NaN      NaN
11   wrestling        NaN      NaN
12       poker        NaN      NaN

It would be nice to have this result:

Heroes      The Punisher        non_sport
Heroes      The Punisher        non_sport
Heroes      Human Torch - 1     non_sport
Heroes      Man Thing           non_sport
Heroes      Medusa              non_sport
Heroes      Mr. Fantastic       non_sport
Movies-TV   Star Wars           non_sport
Movies-TV   Star Wars           non_sport

I've tried to adapt the following solutions, but had no luck:

keywords.columns[keywords.eq('heroes').any()]
(keywords == 'pokemon').idxmax(axis=1)[0]

into something like:

df[new_column] = df[category_column].isin(keywords).any()
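For context: `.any()` on the boolean result of `isin(...)` collapses the whole column into a single True/False, so that attempt cannot yield a per-row label. The first expression, `keywords.columns[keywords.eq(value).any()]`, does return the matching column name(s) for one value, so one way to adapt it is to apply it row by row. A sketch under assumptions: `keywords` plays the role of df_TWO, the df_ONE column is named `genre` (hypothetical), and `first_matching_column` is an illustrative helper, not a pandas function. It rescans all of `keywords` for every row, so a merge or dict lookup scales better on large data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: 'keywords' plays the role of df_TWO, 'df' of df_ONE.
keywords = pd.DataFrame({
    "sport": ["baseball", "soccer"],
    "non_sport": ["movies-tv", "heroes"],
    "gaming": ["pokemon", np.nan],
})
df = pd.DataFrame({"genre": ["Heroes", "Movies-TV", "Comics"]})


def first_matching_column(value):
    """Return the first keywords column that contains value (case-insensitive), else NaN."""
    # eq() compares every cell; any() collapses each column to one boolean.
    mask = keywords.eq(value.lower()).any().to_numpy()
    hits = keywords.columns[mask]
    return hits[0] if len(hits) else np.nan


# Apply the single-value lookup to every row (one full scan of keywords per row).
df["category"] = df["genre"].apply(first_matching_column)
print(df)
```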

Answer 1

You need to reshape your second DataFrame, which you can do pretty easily with melt.

Here is an example of what the melted df looks like:

  col_match       genre
0     sport    baseball
1     sport  basketball
2     sport    football
3     sport      hockey
4     sport      soccer
5     sport      racing
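The melted frame shown above can be produced with `DataFrame.melt`. A short sketch with a trimmed df2; the extra `dropna` step is an addition not in the answer, which removes the NaN cells that exist only because df2's columns have different lengths.

```python
import numpy as np
import pandas as pd

# Trimmed version of df2 from the question.
df2 = pd.DataFrame({
    "sport": ["baseball", "basketball", "football"],
    "non_sport": ["movies-tv", "heroes", np.nan],
    "gaming": ["pokemon", np.nan, np.nan],
})

# With no id_vars, melt stacks every column into two columns:
# the original column name (col_match) and the cell value (genre).
melted = df2.melt(var_name="col_match", value_name="genre")

# Drop the NaN padding so it can never take part in a join.
melted = melted.dropna(subset=["genre"]).reset_index(drop=True)
print(melted)
```

melt emits the columns one after another, so all `sport` rows come first, then `non_sport`, then `gaming`.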

You can then merge the original df with the melted df on the genre column. Lowercase the genre column of the first df beforehand: the keywords in df2 are all lowercase, and the match is case-sensitive.

import pandas as pd
import numpy as np

# df_ONE from the question
df = pd.DataFrame({
    'genre': ['Heroes', 'Heroes', 'Heroes', 'Heroes', 'Heroes', 'Heroes', 'Movies-TV', 'Movies-TV'],
    'title': ['The Punisher', 'The Punisher', 'Human Torch - 1', 'Man Thing', 'Medusa', 'Mr. Fantastic', 'Star Wars', 'Star Wars']})

# df_TWO from the question; shorter columns are padded with NaN
df2 = pd.DataFrame({
    'sport': ['baseball', 'basketball', 'football', 'hockey', 'soccer', 'racing', 'boxing', 'golf', 'mma', 'multisport', 'tennis', 'wrestling', 'poker'],
    'non_sport': ['movies-tv', 'music', 'people', 'history', 'heroes', np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    'gaming': ['pokemon', 'yugioh', 'magic', 'gaming', np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]})

# The keywords in df2 are lowercase, so lowercase the join column as well
df['genre'] = df['genre'].str.lower()

# Reshape df2 to long format (col_match, genre), then inner-join on genre
result = df.merge(df2.melt(value_vars=df2.columns, var_name='col_match', value_name='genre'), on='genre')
print(result)

Output

       genre            title  col_match
0     heroes     The Punisher  non_sport
1     heroes     The Punisher  non_sport
2     heroes  Human Torch - 1  non_sport
3     heroes        Man Thing  non_sport
4     heroes           Medusa  non_sport
5     heroes    Mr. Fantastic  non_sport
6  movies-tv        Star Wars  non_sport
7  movies-tv        Star Wars  non_sport
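Two variations on the merge above may be worth considering; this is a sketch, not part of the original answer. First, `merge` defaults to an inner join, so rows whose genre matches no keyword are silently dropped; `how="left"` keeps them with NaN in `col_match`. Second, instead of lowercasing `genre` in place, a temporary lowercase key (named `genre_key` here, a hypothetical name) leaves the original values untouched. Dropping NaN from the melted frame also matters with a left join, because pandas `merge` matches NaN keys against each other.

```python
import numpy as np
import pandas as pd

# Trimmed stand-ins; "Comics" is a hypothetical genre with no keyword.
df = pd.DataFrame({
    "genre": ["Heroes", "Movies-TV", "Comics"],
    "title": ["Medusa", "Star Wars", "Some Title"],
})
df2 = pd.DataFrame({
    "sport": ["baseball", "soccer"],
    "non_sport": ["movies-tv", "heroes"],
    "gaming": ["pokemon", np.nan],
})

# Long-format keyword table without NaN padding, keyed on a lowercase column.
lookup = (
    df2.melt(var_name="col_match", value_name="genre_key")
    .dropna(subset=["genre_key"])
)

result = (
    df.assign(genre_key=df["genre"].str.lower())  # temporary join key
    .merge(lookup, on="genre_key", how="left")    # keep rows with no match
    .drop(columns="genre_key")                    # original 'genre' stays as-is
)
print(result)
```

If a keyword appeared in more than one df2 column, the left join would duplicate the matching df row once per column, the same as the inner join in the answer.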

Answer 1
0 2022-01-07 20:01:05