Find value in Pandas df row and return the column name

There's probably an easy way to do this, but I hit a wall on this one.

I have a dataframe with text as the row data. I'm trying to add new columns to the dataframe, one per distinct text value, each holding the name of the existing column where that value appears.

import pandas as pd

test_data = {
    'Col1' : ['Boy', 'Boy', 'Boy', 'Boy', 'Boy'],
    'Col2' : ['Girl', 'Girl', 'Girl', 'Baseball', 'Girl'],
    'Col3' : ['Baseball', 'Baseball', 'Baseball', 'Lizard', 'Baseball'],
    'Col4' : ['Lizard', 'Hockey', 'Lizard', 'Girl', 'Hockey']
}

df = pd.DataFrame(test_data, columns = ['Col1', 'Col2', 'Col3', 'Col4'])
print(df)

  Col1      Col2      Col3    Col4
0  Boy      Girl  Baseball  Lizard
1  Boy      Girl  Baseball  Hockey
2  Boy      Girl  Baseball  Lizard
3  Boy  Baseball    Lizard    Girl
4  Boy      Girl  Baseball  Hockey

# Create new columns with locations
for col in ['Boy', 'Girl', 'Lizard', 'Baseball', 'Hockey']:
    df[col] = 99
    
print(df)

  Col1      Col2      Col3    Col4  Boy  Girl  Lizard  Baseball  Hockey
0  Boy      Girl  Baseball  Lizard   99    99      99        99      99
1  Boy      Girl  Baseball  Hockey   99    99      99        99      99
2  Boy      Girl  Baseball  Lizard   99    99      99        99      99
3  Boy  Baseball    Lizard    Girl   99    99      99        99      99
4  Boy      Girl  Baseball  Hockey   99    99      99        99      99

What I'd like it to do is the below. If it matters, a string can appear at most once per row, and may not appear at all. I found a method using argsort, but that doesn't help with strings. Thanks very much.

answers = {
    'Boy' : ['Col1', 'Col1', 'Col1', 'Col1', 'Col1'],
    'Girl' : ['Col2', 'Col2', 'Col2', 'Col4', 'Col2'],
    'Lizard' : ['Col4', 0, 'Col4', 'Col3', 0],
    'Baseball' : ['Col3', 'Col3', 'Col3', 'Col2', 'Col3'],
    'Hockey' : [ 0, 'Col4', 0, 0, 'Col4']
}
df_answers = pd.DataFrame(answers, columns = ['Boy', 'Girl', 'Lizard', 'Baseball', 'Hockey'])
print(df_answers)

    Boy  Girl Lizard Baseball Hockey
0  Col1  Col2   Col4     Col3      0
1  Col1  Col2      0     Col3   Col4
2  Col1  Col2   Col4     Col3      0
3  Col1  Col4   Col3     Col2      0
4  Col1  Col2      0     Col3   Col4

Let's do:

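# df is the original four-column frame (before the placeholder columns are added)
s = df.stack().reset_index(name='var')
# pivot takes keyword arguments; positional arguments are rejected in pandas 2.0+
s.pivot(index='level_0', columns='var', values='level_1').rename_axis(index=None, columns=None)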

Details:

.stack the dataframe and reset_index:

    level_0 level_1       var
0         0    Col1       Boy
1         0    Col2      Girl
2         0    Col3  Baseball
3         0    Col4    Lizard
4         1    Col1       Boy
5         1    Col2      Girl
6         1    Col3  Baseball
7         1    Col4    Hockey
8         2    Col1       Boy
9         2    Col2      Girl
10        2    Col3  Baseball
11        2    Col4    Lizard
12        3    Col1       Boy
13        3    Col2  Baseball
14        3    Col3    Lizard
15        3    Col4      Girl
16        4    Col1       Boy
17        4    Col2      Girl
18        4    Col3  Baseball
19        4    Col4    Hockey
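
To make the stacking step above easier to follow, here is a minimal sketch on a tiny two-row frame. The frame `small` and the name `long_df` are made up for illustration and are not part of the original answer. It shows that `stack` yields a Series keyed by (row label, column name), and that `reset_index(name='var')` turns those two unnamed index levels into the `level_0` and `level_1` columns seen in the table.

```python
import pandas as pd

# Tiny frame invented for illustration only
small = pd.DataFrame({'Col1': ['Boy', 'Boy'], 'Col2': ['Girl', 'Baseball']})

# stack() moves the column labels into an inner index level,
# giving a Series indexed by (row label, column name)
stacked = small.stack()
print(stacked.index.tolist())

# reset_index() turns both unnamed index levels into columns called
# level_0 and level_1; name='var' labels the column holding the values
long_df = stacked.reset_index(name='var')
print(list(long_df.columns))
```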

.pivot the stacked frame above to reshape it into a new dataframe with level_0 as its index, var as its columns, and level_1 as its values:

  Baseball   Boy  Girl Hockey Lizard
0     Col3  Col1  Col2    NaN   Col4
1     Col3  Col1  Col2   Col4    NaN
2     Col3  Col1  Col2    NaN   Col4
3     Col2  Col1  Col4    NaN   Col3
4     Col3  Col1  Col2   Col4    NaN
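
The pivoted result above has its columns in alphabetical order and uses NaN for missing values, while the question asks for a specific column order and 0 as the placeholder. The following is a sketch of one way to bridge that gap, not part of the original answer. The list `wanted` and the `astype(object)` step are assumptions added here; the cast is there so that strings and the integer 0 can share a column regardless of the pandas version's default string dtype.

```python
import pandas as pd

# Same sample data as in the question
test_data = {
    'Col1': ['Boy', 'Boy', 'Boy', 'Boy', 'Boy'],
    'Col2': ['Girl', 'Girl', 'Girl', 'Baseball', 'Girl'],
    'Col3': ['Baseball', 'Baseball', 'Baseball', 'Lizard', 'Baseball'],
    'Col4': ['Lizard', 'Hockey', 'Lizard', 'Girl', 'Hockey'],
}
df = pd.DataFrame(test_data)

# Column order requested in the question (assumed to be known up front)
wanted = ['Boy', 'Girl', 'Lizard', 'Baseball', 'Hockey']

s = df.stack().reset_index(name='var')
result = (
    s.pivot(index='level_0', columns='var', values='level_1')
     .reindex(columns=wanted)   # question's column order; unseen values become all-NaN columns
     .astype(object)            # let strings and the integer 0 coexist
     .fillna(0)                 # the question uses 0 for "not found"
     .rename_axis(index=None, columns=None)
)
print(result)
```

With the sample data this should line up with the df_answers layout from the question. Because pivot raises an error on duplicate index/column pairs, the approach relies on the stated guarantee that a string appears at most once per row.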
