Find value in Pandas df row and return the column name
There's probably an easy way to do this, but I've hit a wall on this one.
I have a dataframe with text as the row data. I'm trying to add new columns to the dataframe based on existing column names.
import pandas as pd

test_data = {
    'Col1': ['Boy', 'Boy', 'Boy', 'Boy', 'Boy'],
    'Col2': ['Girl', 'Girl', 'Girl', 'Baseball', 'Girl'],
    'Col3': ['Baseball', 'Baseball', 'Baseball', 'Lizard', 'Baseball'],
    'Col4': ['Lizard', 'Hockey', 'Lizard', 'Girl', 'Hockey'],
}
df = pd.DataFrame(test_data, columns=['Col1', 'Col2', 'Col3', 'Col4'])
print(df)
   Col1      Col2      Col3    Col4
0   Boy      Girl  Baseball  Lizard
1   Boy      Girl  Baseball  Hockey
2   Boy      Girl  Baseball  Lizard
3   Boy  Baseball    Lizard    Girl
4   Boy      Girl  Baseball  Hockey
# Create new columns with locations (99 is only a placeholder value)
for col in ['Boy', 'Girl', 'Lizard', 'Baseball', 'Hockey']:
    df[col] = 99
print(df)
   Col1      Col2      Col3    Col4  Boy  Girl  Lizard  Baseball  Hockey
0   Boy      Girl  Baseball  Lizard   99    99      99        99      99
1   Boy      Girl  Baseball  Hockey   99    99      99        99      99
2   Boy      Girl  Baseball  Lizard   99    99      99        99      99
3   Boy  Baseball    Lizard    Girl   99    99      99        99      99
4   Boy      Girl  Baseball  Hockey   99    99      99        99      99
What I'd like it to do is shown below. If it matters, a string can appear at most once per row, and it may not appear at all. I found a method using argsort, but that doesn't help with strings. Thanks very much.
answers = {
    'Boy': ['Col1', 'Col1', 'Col1', 'Col1', 'Col1'],
    'Girl': ['Col2', 'Col2', 'Col2', 'Col4', 'Col2'],
    'Lizard': ['Col4', 0, 'Col4', 'Col3', 0],
    'Baseball': ['Col3', 'Col3', 'Col3', 'Col2', 'Col3'],
    'Hockey': [0, 'Col4', 0, 0, 'Col4'],
}
df_answers = pd.DataFrame(answers, columns=['Boy', 'Girl', 'Lizard', 'Baseball', 'Hockey'])
print(df_answers)
    Boy  Girl  Lizard  Baseball  Hockey
0  Col1  Col2    Col4      Col3       0
1  Col1  Col2       0      Col3    Col4
2  Col1  Col2    Col4      Col3       0
3  Col1  Col4    Col3      Col2       0
4  Col1  Col2       0      Col3    Col4
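Let's do (here df is the original four-column frame, before the placeholder columns are added):
s = df.stack().reset_index(name='var')
# DataFrame.pivot takes keyword-only arguments in pandas 2.0 and later
s.pivot(index='level_0', columns='var', values='level_1').rename_axis(index=None, columns=None)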
Details:
First, .stack the dataframe and reset_index:
    level_0  level_1       var
 0        0     Col1       Boy
 1        0     Col2      Girl
 2        0     Col3  Baseball
 3        0     Col4    Lizard
 4        1     Col1       Boy
 5        1     Col2      Girl
 6        1     Col3  Baseball
 7        1     Col4    Hockey
 8        2     Col1       Boy
 9        2     Col2      Girl
10        2     Col3  Baseball
11        2     Col4    Lizard
12        3     Col1       Boy
13        3     Col2  Baseball
14        3     Col3    Lizard
15        3     Col4      Girl
16        4     Col1       Boy
17        4     Col2      Girl
18        4     Col3  Baseball
19        4     Col4    Hockey
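The pivot step relies on the point made in the question that a string appears at most once per row, because pivot raises a ValueError when the same (row, value) pair occurs twice. A minimal sketch of how that could be checked on the long frame first; dup_df, long, clashes and pivot_ok are hypothetical names, and the two-row frame is invented purely to trigger the failure:

```python
import pandas as pd

# Hypothetical frame in which 'Boy' occurs twice in row 0.
dup_df = pd.DataFrame({'Col1': ['Boy', 'Boy'], 'Col2': ['Boy', 'Girl']})
long = dup_df.stack().reset_index(name='var')

# Rows of the long frame whose (row label, value) pair is repeated.
clashes = long[long.duplicated(subset=['level_0', 'var'], keep=False)]
print(clashes)

# With such a clash present, pivot cannot build a unique index/column grid.
try:
    long.pivot(index='level_0', columns='var', values='level_1')
    pivot_ok = True
except ValueError:
    pivot_ok = False
print(pivot_ok)
```

If clashes comes back empty for your own data, the pivot should be safe to run as written.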
Then .pivot the above stacked frame to reshape it into a new dataframe having its index as level_0, columns as var and values as level_1:
   Baseball   Boy  Girl  Hockey  Lizard
0      Col3  Col1  Col2     NaN    Col4
1      Col3  Col1  Col2    Col4     NaN
2      Col3  Col1  Col2     NaN    Col4
3      Col2  Col1  Col4     NaN    Col3
4      Col3  Col1  Col2    Col4     NaN
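This result differs from the layout asked for in the question in two small ways: the columns come out in alphabetical order, and missing strings show up as NaN rather than 0. A minimal end-to-end sketch that could close that gap, assuming the original four-column frame; wanted, wide and result are hypothetical names, and the astype(object) call is a precaution for pandas versions that infer a dedicated string dtype:

```python
import pandas as pd

test_data = {
    'Col1': ['Boy', 'Boy', 'Boy', 'Boy', 'Boy'],
    'Col2': ['Girl', 'Girl', 'Girl', 'Baseball', 'Girl'],
    'Col3': ['Baseball', 'Baseball', 'Baseball', 'Lizard', 'Baseball'],
    'Col4': ['Lizard', 'Hockey', 'Lizard', 'Girl', 'Hockey'],
}
df = pd.DataFrame(test_data)

# Long format: one row per (original row, source column, value).
s = df.stack().reset_index(name='var')

# Wide format: one column per distinct value, holding the source column name.
wide = s.pivot(index='level_0', columns='var', values='level_1')
wide = wide.rename_axis(index=None, columns=None)

# Column order requested in the question (an assumption about what is wanted).
wanted = ['Boy', 'Girl', 'Lizard', 'Baseball', 'Hockey']

# Reorder the columns, then replace missing entries with the integer 0.
# Casting to object first lets strings and the integer 0 share a column.
result = wide.reindex(columns=wanted).astype(object).fillna(0)
print(result)
```

The new columns can then be attached to the original frame with df.join(result), since both share the same row index.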