Find a value in a Pandas df row and return the column name

Question

There's probably an easy way to do this, but I hit a wall on this one.

I have a dataframe with text as the row data. I'm trying to add new columns to the dataframe based on existing column names.

import pandas as pd

test_data = {
    'Col1' : ['Boy', 'Boy', 'Boy', 'Boy', 'Boy'],
    'Col2' : ['Girl', 'Girl', 'Girl', 'Baseball', 'Girl'],
    'Col3' : ['Baseball', 'Baseball', 'Baseball', 'Lizard', 'Baseball'],
    'Col4' : ['Lizard', 'Hockey', 'Lizard', 'Girl', 'Hockey']
}

df = pd.DataFrame(test_data, columns = ['Col1', 'Col2', 'Col3', 'Col4'])
print(df)

  Col1      Col2      Col3    Col4
0  Boy      Girl  Baseball  Lizard
1  Boy      Girl  Baseball  Hockey
2  Boy      Girl  Baseball  Lizard
3  Boy  Baseball    Lizard    Girl
4  Boy      Girl  Baseball  Hockey

# Create new columns with locations
for col in ['Boy', 'Girl', 'Lizard', 'Baseball', 'Hockey']:
    df[col] = 99
    
print(df)

  Col1      Col2      Col3    Col4  Boy  Girl  Lizard  Baseball  Hockey
0  Boy      Girl  Baseball  Lizard   99    99      99        99      99
1  Boy      Girl  Baseball  Hockey   99    99      99        99      99
2  Boy      Girl  Baseball  Lizard   99    99      99        99      99
3  Boy  Baseball    Lizard    Girl   99    99      99        99      99
4  Boy      Girl  Baseball  Hockey   99    99      99        99      99

What I'd like to get is the result below. If it matters, a string can appear at most once per row, and it may not appear at all. I found a method using argsort, but that doesn't help with strings. Thanks very much.

answers = {
    'Boy' : ['Col1', 'Col1', 'Col1', 'Col1', 'Col1'],
    'Girl' : ['Col2', 'Col2', 'Col2', 'Col4', 'Col2'],
    'Lizard' : ['Col4', 0, 'Col4', 'Col3', 0],
    'Baseball' : ['Col3', 'Col3', 'Col3', 'Col2', 'Col3'],
    'Hockey' : [ 0, 'Col4', 0, 0, 'Col4']
}
df_answers = pd.DataFrame(answers, columns = ['Boy', 'Girl', 'Lizard', 'Baseball', 'Hockey'])
print(df_answers)

    Boy  Girl Lizard Baseball Hockey
0  Col1  Col2   Col4     Col3      0
1  Col1  Col2      0     Col3   Col4
2  Col1  Col2   Col4     Col3      0
3  Col1  Col4   Col3     Col2      0
4  Col1  Col2      0     Col3   Col4

Answer 1

Let's do:

s = df.stack().reset_index(name='var')
s.pivot(index='level_0', columns='var', values='level_1').rename_axis(index=None, columns=None)

Details:

.stack the dataframe and reset_index:

    level_0 level_1       var
0         0    Col1       Boy
1         0    Col2      Girl
2         0    Col3  Baseball
3         0    Col4    Lizard
4         1    Col1       Boy
5         1    Col2      Girl
6         1    Col3  Baseball
7         1    Col4    Hockey
8         2    Col1       Boy
9         2    Col2      Girl
10        2    Col3  Baseball
11        2    Col4    Lizard
12        3    Col1       Boy
13        3    Col2  Baseball
14        3    Col3    Lizard
15        3    Col4      Girl
16        4    Col1       Boy
17        4    Col2      Girl
18        4    Col3  Baseball
19        4    Col4    Hockey
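
In case it helps to see where the level_0 and level_1 names come from, here is a minimal sketch of just the stacking step. It rebuilds the question's data so it runs on its own; the variable name stacked is only for illustration. The stacked Series has an unnamed two-level index (row label, column name), so reset_index falls back to the default names level_0 and level_1, while name='var' labels the values.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    'Col1': ['Boy', 'Boy', 'Boy', 'Boy', 'Boy'],
    'Col2': ['Girl', 'Girl', 'Girl', 'Baseball', 'Girl'],
    'Col3': ['Baseball', 'Baseball', 'Baseball', 'Lizard', 'Baseball'],
    'Col4': ['Lizard', 'Hockey', 'Lizard', 'Girl', 'Hockey'],
})

# stack() moves the column labels into an inner index level:
# one entry per (row label, column name) pair
stacked = df.stack()

# The two index levels are unnamed, so they become 'level_0' and 'level_1';
# name='var' names the column that holds the cell values
s = stacked.reset_index(name='var')

print(s.columns.tolist())  # ['level_0', 'level_1', 'var']
print(len(s))              # 20 (5 rows x 4 columns)
```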

.pivot the above stacked frame to reshape it into a new dataframe with level_0 as its index, var as its columns, and level_1 as its values:

  Baseball   Boy  Girl Hockey Lizard
0     Col3  Col1  Col2    NaN   Col4
1     Col3  Col1  Col2   Col4    NaN
2     Col3  Col1  Col2    NaN   Col4
3     Col2  Col1  Col4    NaN   Col3
4     Col3  Col1  Col2   Col4    NaN
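
The pivoted frame has NaN where a string is missing and its columns come out sorted alphabetically, while the question asked for 0 and a specific column order. One possible way to close that gap is sketched below; the reindex, astype(object), fillna and join steps are my additions rather than part of the answer above, and the names wanted, out and result are only for illustration.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    'Col1': ['Boy', 'Boy', 'Boy', 'Boy', 'Boy'],
    'Col2': ['Girl', 'Girl', 'Girl', 'Baseball', 'Girl'],
    'Col3': ['Baseball', 'Baseball', 'Baseball', 'Lizard', 'Baseball'],
    'Col4': ['Lizard', 'Hockey', 'Lizard', 'Girl', 'Hockey'],
})

# Column order requested in the question
wanted = ['Boy', 'Girl', 'Lizard', 'Baseball', 'Hockey']

s = df.stack().reset_index(name='var')
out = (
    s.pivot(index='level_0', columns='var', values='level_1')
     .rename_axis(index=None, columns=None)
     .reindex(columns=wanted)   # put the columns in the requested order
     .astype(object)            # allow mixing strings with the integer 0
     .fillna(0)                 # missing strings become 0 instead of NaN
)

# Attach the lookup columns to the original frame (the indexes line up)
result = df.join(out)
print(result)
```

Note that pivot raises an error if the same string appears twice in one row, which the question rules out.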
