
Pandas: compare all values of a column with a different DataFrame and return the column name (of that DataFrame) where the value matches

I'm trying to assign a category to each row of a dataset based on matching keywords from another dataset.

  1. I compare df_ONE['columnname'] to every value of df_TWO,
  2. and if a matching value is found, I use the name of the df_TWO column that contains it as the cell value of a new column in df_ONE.

With the example below, every value of the new column would be non_sport (the name of the df_TWO column where the value is found).

df_ONE

Heroes      The Punisher        
Heroes      The Punisher        
Heroes      Human Torch - 1     
Heroes      Man Thing           
Heroes      Medusa              
Heroes      Mr. Fantastic       
Movies-TV   Star Wars           
Movies-TV   Star Wars

df_TWO

         sport  non_sport   gaming
0     baseball  movies-tv  pokemon
1   basketball      music   yugioh
2     football     people    magic
3       hockey    history   gaming
4       soccer     heroes      NaN
5       racing        NaN      NaN
6       boxing        NaN      NaN
7         golf        NaN      NaN
8          mma        NaN      NaN
9   multisport        NaN      NaN
10      tennis        NaN      NaN
11   wrestling        NaN      NaN
12       poker        NaN      NaN

It would be nice to get this result:

Heroes      The Punisher        non_sport
Heroes      The Punisher        non_sport
Heroes      Human Torch - 1     non_sport
Heroes      Man Thing           non_sport
Heroes      Medusa              non_sport
Heroes      Mr. Fantastic       non_sport
Movies-TV   Star Wars           non_sport
Movies-TV   Star Wars           non_sport

I've tried to adapt the following solutions, but had no luck:

  • keywords.columns[keywords.eq('heroes').any()]
  • (keywords == 'pokemon').idxmax(axis=1)[0]

into something like

  • df[new_column] = df[category_column].isin(keywords).any()

You need to reshape your second dataframe. You can do this pretty easily with melt.

Here is an example of what the melted df looks like:

    col_match   genre
0   sport   baseball
1   sport   basketball
2   sport   football
3   sport   hockey
4   sport   soccer
5   sport   racing

You can then merge the original df with the melted df on the genre column. Be sure to lowercase the genre column in the first df beforehand, so that its values match the lowercase keywords.

import pandas as pd
import numpy as np

# First dataframe: one row per item, with its genre.
df = pd.DataFrame({
    'genre': ['Heroes',  'Heroes',  'Heroes',  'Heroes',  'Heroes',  'Heroes',  'Movies-TV',  'Movies-TV'],
    'title': ['The Punisher',  'The Punisher',  'Human Torch - 1',  'Man Thing',  'Medusa',  'Mr. Fantastic',  'Star Wars',  'Star Wars']})

# Second dataframe: each column name is a category, each cell is a keyword.
df2 = pd.DataFrame({
    'sport': ['baseball',  'basketball',  'football',  'hockey',  'soccer',  'racing',  'boxing',  'golf',  'mma',  'multisport',  'tennis',  'wrestling',  'poker'],
    'non_sport': ['movies-tv',  'music',  'people',  'history',  'heroes',  np.nan,  np.nan,  np.nan,  np.nan, np.nan,  np.nan,  np.nan,  np.nan],
    'gaming': ['pokemon',  'yugioh',  'magic',  'gaming',  np.nan,  np.nan,  np.nan,  np.nan,  np.nan,  np.nan,  np.nan,  np.nan,  np.nan]})

# Lowercase the genre so it matches the keywords in df2.
df['genre'] = df['genre'].str.lower()

# Melt df2 into (col_match, genre) pairs, then join them to df on genre.
result = df.merge(df2.melt(value_vars=df2.columns, var_name='col_match', value_name='genre'), on='genre')
print(result)

Output

       genre            title  col_match
0     heroes     The Punisher  non_sport
1     heroes     The Punisher  non_sport
2     heroes  Human Torch - 1  non_sport
3     heroes        Man Thing  non_sport
4     heroes           Medusa  non_sport
5     heroes    Mr. Fantastic  non_sport
6  movies-tv        Star Wars  non_sport
7  movies-tv        Star Wars  non_sport
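
Note that merge defaults to an inner join, so any row whose genre is not listed in the keyword table would be dropped from the result. If such rows should be kept, one possible variant (a sketch, not the method used above) builds a keyword-to-column lookup and applies it with Series.map, which leaves NaN where nothing matches. The 'Comics' row and the name lookup are made up for illustration, and the sketch assumes each keyword appears in only one column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'genre': ['Heroes', 'Movies-TV', 'Comics'],  # 'Comics' is a made-up non-matching value
    'title': ['Medusa', 'Star Wars', 'Unknown'],
})
df2 = pd.DataFrame({
    'sport': ['baseball', 'basketball'],
    'non_sport': ['movies-tv', 'heroes'],
    'gaming': ['pokemon', np.nan],
})

# Long form of the keyword table, without the NaN padding.
melted = df2.melt(var_name='col_match', value_name='genre').dropna(subset=['genre'])

# Series indexed by keyword whose values are the originating column names.
# Assumption: keywords are unique, otherwise map() cannot use this index.
lookup = melted.set_index('genre')['col_match']

# Map the lowercased genre through the lookup; unmatched genres become NaN.
df['col_match'] = df['genre'].str.lower().map(lookup)
print(df)
```

Here the original capitalisation of genre is left untouched, since the lowercasing is applied only for the lookup. Passing how='left' to merge should give a similar keep-all-rows result while staying with the merge approach.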


