How to restructure a dataframe to convert column values to row values based on a condition

Question

I have a dataframe with 5 columns and want to convert two of them (Chemo and Surgery) into rows in the Diagnosis column wherever their value is greater than 0, carrying the individual's ID and the age at diagnosis over to each new row.

Here is my dataframe:

import pandas as pd

data = [['A-1', 'Birth', '0', '0', '0'], ['A-1', 'Lung cancer', '25', '25','25'],['A-1', 'Death', '50', '0','0'],['A-2', 'Birth', '0', '0','0'], ['A-2','Brain cancer', '12', '12','0'],['A-2', 'Skin cancer', '20','20','20'], ['A-2', 'Current age', '23', '0','0'],['A-3', 'Birth','0','0','0'], ['A-3', 'Brain cancer', '30', '0','30'], ['A-3', 'Lung cancer', '33', '33', '0'], ['A-3', 'Current age', '35', '0','0']]

df = pd.DataFrame(data, columns=["ID", "Diagnosis", "Age at Diagnosis", "Chemo", "Surgery"])
print(df)

I have tried to select the rows where Chemo/Surgery is greater than 0, but when I tried to add them as rows, it didn't work.

This is what I want the end result to be:

     ID     Diagnosis Age at Diagnosis
0   A-1         Birth                0
1   A-1   Lung cancer               25
2   A-1         Chemo               25
3   A-1       Surgery               25
4   A-1         Death               50
5   A-2         Birth                0
6   A-2  Brain cancer               12
7   A-2         Chemo               12
8   A-2   Skin cancer               20
9   A-2         Chemo               20
10  A-2       Surgery               20
11  A-2   Current age               23
12  A-3         Birth                0
13  A-3  Brain cancer               30
14  A-3       Surgery               30
15  A-3   Lung cancer               33
16  A-3         Chemo               33
17  A-3   Current age               35

This is one of the things I have tried:

chem = "Chemo"
try_df = (df[chem] > 1)
nd = df[try_df]
df["Diagnosis"] = df[chem]
print(df)

Answer 1

We can melt the two columns Chemo and Surgery, drop all the zeros, and concat the result back onto the original dataframe:

# melt the two columns
new_df = df[['ID', 'Chemo', 'Surgery']].melt(id_vars='ID', 
                                             value_name='Age at Diagnosis',
                                             var_name='Diagnosis')
# filter out the zeros
new_df = new_df[new_df['Age at Diagnosis'].ne('0')]

# concat with the original dataframe, ignoring the extra columns
new_df = pd.concat((df,new_df), sort=False, join='inner')

# sort values
new_df.sort_values(['ID','Age at Diagnosis'])

Output:

    ID      Diagnosis   Age at Diagnosis
0   A-1     Birth           0
1   A-1     Lung cancer     25
1   A-1     Chemo           25
12  A-1     Surgery         25
2   A-1     Death           50
3   A-2     Birth           0
4   A-2     Brain cancer    12
4   A-2     Chemo           12
5   A-2     Skin cancer     20
5   A-2     Chemo           20
16  A-2     Surgery         20
6   A-2     Current age     23
7   A-3     Birth           0
8   A-3     Brain cancer    30
19  A-3     Surgery         30
9   A-3     Lung cancer     33
9   A-3     Chemo           33
10  A-3     Current age     35
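
Note that the ages above are still strings, so the sort is lexicographic and the index keeps the labels from the melt. A minimal sketch of the same melt-and-concat idea, assuming all ages are whole numbers stored as strings, casts them to integers first and resets the index; the names `num_cols`, `treatments` and `result` are mine, not from the question:

```python
import pandas as pd

data = [['A-1', 'Birth', '0', '0', '0'], ['A-1', 'Lung cancer', '25', '25', '25'],
        ['A-1', 'Death', '50', '0', '0'], ['A-2', 'Birth', '0', '0', '0'],
        ['A-2', 'Brain cancer', '12', '12', '0'], ['A-2', 'Skin cancer', '20', '20', '20'],
        ['A-2', 'Current age', '23', '0', '0'], ['A-3', 'Birth', '0', '0', '0'],
        ['A-3', 'Brain cancer', '30', '0', '30'], ['A-3', 'Lung cancer', '33', '33', '0'],
        ['A-3', 'Current age', '35', '0', '0']]
df = pd.DataFrame(data, columns=["ID", "Diagnosis", "Age at Diagnosis", "Chemo", "Surgery"])

# Assumption: every age is a whole number stored as a string
num_cols = ["Age at Diagnosis", "Chemo", "Surgery"]
df[num_cols] = df[num_cols].astype(int)

# One row per (ID, treatment); the treatment name lands in "Diagnosis"
treatments = df[["ID", "Chemo", "Surgery"]].melt(
    id_vars="ID", var_name="Diagnosis", value_name="Age at Diagnosis")
# Keep only the treatments that actually happened
treatments = treatments[treatments["Age at Diagnosis"] > 0]

# join='inner' keeps only the columns both frames share
result = (
    pd.concat([df, treatments], join="inner", sort=False)
    .sort_values(["ID", "Age at Diagnosis"], kind="mergesort")
    .reset_index(drop=True)
)
print(result)
```

Because the original rows come first in the concat and ties keep their incoming order in the sort, each diagnosis stays ahead of its Chemo and Surgery rows.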

Answer 2

This attempt is pretty verbose and takes a few steps. We can't do a simple pivot or index/column stacking because we need to modify one column with partial results from another. This requires splitting and appending.

First, convert your dataframe into dtypes we can work with.

import pandas as pd

data = [['A-1', 'Birth', '0', '0', '0'], ['A-1', 'Lung cancer', '25', '25','25'],['A-1', 'Death', '50', '0','0'],['A-2', 'Birth', '0', '0','0'], ['A-2','Brain cancer', '12', '12','0'],['A-2', 'Skin cancer', '20','20','20'], ['A-2', 'Current age', '23', '0','0'],['A-3', 'Birth','0','0','0'], ['A-3', 'Brain cancer', '30', '0','30'], ['A-3', 'Lung cancer', '33', '33', '0'], ['A-3', 'Current age', '35', '0','0']]
df = pd.DataFrame(data, columns=["ID", "Diagnosis", "Age at Diagnosis", "Chemo", "Surgery"])

df[["Age at Diagnosis", "Chemo", "Surgery"]] = df[["Age at Diagnosis", "Chemo", "Surgery"]].astype(int)
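
To see why the conversion matters, here is a small illustration with made-up ages (the series `ages` is hypothetical, not part of the question's data): strings are compared character by character, so they sort differently from numbers.

```python
import pandas as pd

# Hypothetical ages stored as strings, as in the question's dataframe
ages = pd.Series(["5", "12", "33"])

as_text = ages.sort_values().tolist()                 # lexicographic order
as_numbers = ages.astype(int).sort_values().tolist()  # numeric order

print(as_text)     # ['12', '33', '5']
print(as_numbers)  # [5, 12, 33]
```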

Now we split the dataframe into pieces.

# I like making a copy or resetting an index so that 
# pandas is not operating off a slice
df_chemo = df[df.Chemo > 0].copy()
df_surgery = df[df.Surgery > 0].copy()

# drop columns you don't need
df_chemo.drop(["Chemo", "Surgery"], axis=1, inplace=True)
df_surgery.drop(["Chemo", "Surgery"], axis=1, inplace=True)
df.drop(["Chemo", "Surgery"], axis=1, inplace=True)

# Set Chemo and Surgery Diagnosis
df_chemo.Diagnosis = "Chemo"
df_surgery.Diagnosis = "Surgery"

Then append everything together. You can do this because the columns match.

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df_new = pd.concat([df, df_chemo, df_surgery])
# make it look pretty
df_new.sort_values(["ID", "Age at Diagnosis"]).reset_index(drop=True)
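
If there are more treatment columns than these two, the split-and-append steps can be wrapped in a loop. This is only a sketch: the helper name `treatments_to_rows` and its parameters are my own invention, and it assumes the numeric columns have already been cast to integers as shown earlier.

```python
import pandas as pd

def treatments_to_rows(df, treatment_cols, id_col="ID",
                       label_col="Diagnosis", age_col="Age at Diagnosis"):
    """Hypothetical helper: add one row per treatment whose value is > 0."""
    base = df.drop(columns=treatment_cols)
    pieces = [base]
    for col in treatment_cols:
        # Rows where this treatment was given, relabelled with the column name
        piece = base[df[col] > 0].copy()
        piece[label_col] = col
        pieces.append(piece)
    # Original rows come first, so each diagnosis stays ahead of its treatments
    return (pd.concat(pieces)
              .sort_values([id_col, age_col], kind="mergesort")
              .reset_index(drop=True))

data = [['A-1', 'Birth', '0', '0', '0'], ['A-1', 'Lung cancer', '25', '25', '25'],
        ['A-1', 'Death', '50', '0', '0'], ['A-2', 'Birth', '0', '0', '0'],
        ['A-2', 'Brain cancer', '12', '12', '0'], ['A-2', 'Skin cancer', '20', '20', '20'],
        ['A-2', 'Current age', '23', '0', '0'], ['A-3', 'Birth', '0', '0', '0'],
        ['A-3', 'Brain cancer', '30', '0', '30'], ['A-3', 'Lung cancer', '33', '33', '0'],
        ['A-3', 'Current age', '35', '0', '0']]
frame = pd.DataFrame(data, columns=["ID", "Diagnosis", "Age at Diagnosis", "Chemo", "Surgery"])
num_cols = ["Age at Diagnosis", "Chemo", "Surgery"]
frame[num_cols] = frame[num_cols].astype(int)

out = treatments_to_rows(frame, ["Chemo", "Surgery"])
print(out)
```

Unlike the in-place drops above, this version leaves the input dataframe untouched, which makes it easier to rerun.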

Answer 1
3 2019-06-04 21:31:33

Answer 2
1 2019-06-04 21:04:35