How to restructure dataframe to convert column values to row values based on condition

I have a dataframe with 5 columns. I want to turn 2 of them (Chemo and Surgery) into rows in the Diagnosis column wherever their value is greater than 0, carrying the individual's ID and the age at diagnosis over to each new row.

Here is my dataframe:

import pandas as pd

data = [['A-1', 'Birth', '0', '0', '0'], ['A-1', 'Lung cancer', '25', '25','25'],['A-1', 'Death', '50', '0','0'],['A-2', 'Birth', '0', '0','0'], ['A-2','Brain cancer', '12', '12','0'],['A-2', 'Skin cancer', '20','20','20'], ['A-2', 'Current age', '23', '0','0'],['A-3', 'Birth','0','0','0'], ['A-3', 'Brain cancer', '30', '0','30'], ['A-3', 'Lung cancer', '33', '33', '0'], ['A-3', 'Current age', '35', '0','0']]

df = pd.DataFrame(data, columns=["ID", "Diagnosis", "Age at Diagnosis", "Chemo", "Surgery"])
print(df)

I have tried to select the rows where Chemo/Surgery is greater than 0, but when I try to add them back as new rows, it doesn't work.

This is what I want the end result to be:

     ID     Diagnosis Age at Diagnosis
0   A-1         Birth                0
1   A-1   Lung cancer               25
2   A-1         Chemo               25
3   A-1       Surgery               25
4   A-1         Death               50
5   A-2         Birth                0
6   A-2  Brain cancer               12
7   A-2         Chemo               12
8   A-2   Skin cancer               20
9   A-2         Chemo               20
10  A-2       Surgery               20
11  A-2   Current age               23
12  A-3         Birth                0
13  A-3  Brain cancer               30
14  A-3       Surgery               30
15  A-3   Lung cancer               33
16  A-3         Chemo               33
17  A-3   Current age               35

This is one of the things I have tried:

chem = "Chemo"
try_df = (df[chem] > 1)
nd = df[try_df]
df["Diagnosis"] = df[chem]
print(df)

We can melt the two columns Chemo and Surgery, drop all the zeros, and concat the result back onto the original dataframe:

# melt the two columns
new_df = df[['ID', 'Chemo', 'Surgery']].melt(id_vars='ID', 
                                             value_name='Age at Diagnosis',
                                             var_name='Diagnosis')
# filter out the zeros
new_df = new_df[new_df['Age at Diagnosis'].ne('0')]

# concat with the original dataframe, ignoring the extra columns
new_df = pd.concat((df,new_df), sort=False, join='inner')

# sort values
new_df.sort_values(['ID','Age at Diagnosis'])

Output:

    ID      Diagnosis   Age at Diagnosis
0   A-1     Birth           0
1   A-1     Lung cancer     25
1   A-1     Chemo           25
12  A-1     Surgery         25
2   A-1     Death           50
3   A-2     Birth           0
4   A-2     Brain cancer    12
4   A-2     Chemo           12
5   A-2     Skin cancer     20
5   A-2     Chemo           20
16  A-2     Surgery         20
6   A-2     Current age     23
7   A-3     Birth           0
8   A-3     Brain cancer    30
19  A-3     Surgery         30
9   A-3     Lung cancer     33
9   A-3     Chemo           33
10  A-3     Current age     35
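
One caveat with the snippet above: the ages are still strings, so the sort is lexicographic. That happens to work for this sample, but '5' would sort after '12'. Here is a minimal sketch of the same melt idea with the ages cast to integers first. The helper name `melt_treatments` and the temporary `_rank` column are my own (hypothetical) additions, used only to make the tie-break between a diagnosis and its treatment rows explicit. It assumes the sample data from the question.

```python
import pandas as pd

data = [
    ["A-1", "Birth", "0", "0", "0"], ["A-1", "Lung cancer", "25", "25", "25"],
    ["A-1", "Death", "50", "0", "0"], ["A-2", "Birth", "0", "0", "0"],
    ["A-2", "Brain cancer", "12", "12", "0"], ["A-2", "Skin cancer", "20", "20", "20"],
    ["A-2", "Current age", "23", "0", "0"], ["A-3", "Birth", "0", "0", "0"],
    ["A-3", "Brain cancer", "30", "0", "30"], ["A-3", "Lung cancer", "33", "33", "0"],
    ["A-3", "Current age", "35", "0", "0"],
]
df = pd.DataFrame(data, columns=["ID", "Diagnosis", "Age at Diagnosis", "Chemo", "Surgery"])


def melt_treatments(frame, treatments=("Chemo", "Surgery")):
    """Hypothetical helper: turn non-zero treatment columns into extra rows."""
    treatments = list(treatments)
    age = "Age at Diagnosis"
    frame = frame.copy()
    # Cast to int so filtering and sorting are numeric, not string-based.
    frame[[age] + treatments] = frame[[age] + treatments].astype(int)

    # Original rows get rank 0 so they sort ahead of their treatment rows.
    base = frame[["ID", "Diagnosis", age]].assign(_rank=0)

    # Subset before melting so value_name does not clash with an existing column.
    melted = frame[["ID"] + treatments].melt(
        id_vars="ID", var_name="Diagnosis", value_name=age
    )
    rank = {name: i + 1 for i, name in enumerate(treatments)}
    melted = melted[melted[age] > 0].assign(
        _rank=lambda d: d["Diagnosis"].map(rank)
    )

    combined = pd.concat([base, melted], ignore_index=True)
    return (
        combined.sort_values(["ID", age, "_rank"])
        .drop(columns="_rank")
        .reset_index(drop=True)
    )


result = melt_treatments(df)
print(result)
```

The explicit rank column means the order of rows sharing the same ID and age does not depend on how the sort treats ties.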

This attempt is pretty verbose and takes a few steps. We can't do a simple pivot or index/column stacking because we need to modify one column with partial results from another. This requires splitting and appending.

First, convert the columns to dtypes we can work with.

data = [['A-1', 'Birth', '0', '0', '0'], ['A-1', 'Lung cancer', '25', '25','25'],['A-1', 'Death', '50', '0','0'],['A-2', 'Birth', '0', '0','0'], ['A-2','Brain cancer', '12', '12','0'],['A-2', 'Skin cancer', '20','20','20'], ['A-2', 'Current age', '23', '0','0'],['A-3', 'Birth','0','0','0'], ['A-3', 'Brain cancer', '30', '0','30'], ['A-3', 'Lung cancer', '33', '33', '0'], ['A-3', 'Current age', '35', '0','0']]
df = pd.DataFrame(data, columns=["ID", "Diagnosis", "Age at Diagnosis", "Chemo", "Surgery"])

df[["Age at Diagnosis", "Chemo", "Surgery"]] = df[["Age at Diagnosis", "Chemo", "Surgery"]].astype(int)

Now split the dataframe into pieces.

# I like making a copy or resetting an index so that 
# pandas is not operating off a slice
df_chemo = df[df.Chemo > 0].copy()
df_surgery = df[df.Surgery > 0].copy()

# drop columns you don't need
df_chemo.drop(["Chemo", "Surgery"], axis=1, inplace=True)
df_surgery.drop(["Chemo", "Surgery"], axis=1, inplace=True)
df.drop(["Chemo", "Surgery"], axis=1, inplace=True)

# Set Chemo and Surgery Diagnosis
df_chemo.Diagnosis = "Chemo"
df_surgery.Diagnosis = "Surgery"

Then append everything together. You can do this because the columns match.

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df_new = pd.concat([df, df_chemo, df_surgery])
# make it look pretty
df_new = df_new.sort_values(["ID", "Age at Diagnosis"]).reset_index(drop=True)
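
If there are more than two treatment columns, the split-and-append steps can be written as a loop. The following is a rough sketch under a couple of assumptions: the `treatments` list is my own name, and the input rows are assumed to already be in the order you want (as they are in the question). Each piece keeps its original row label, so a stable sort on the index puts every treatment row right after the diagnosis it came from.

```python
import pandas as pd

data = [
    ["A-1", "Birth", "0", "0", "0"], ["A-1", "Lung cancer", "25", "25", "25"],
    ["A-1", "Death", "50", "0", "0"], ["A-2", "Birth", "0", "0", "0"],
    ["A-2", "Brain cancer", "12", "12", "0"], ["A-2", "Skin cancer", "20", "20", "20"],
    ["A-2", "Current age", "23", "0", "0"], ["A-3", "Birth", "0", "0", "0"],
    ["A-3", "Brain cancer", "30", "0", "30"], ["A-3", "Lung cancer", "33", "33", "0"],
    ["A-3", "Current age", "35", "0", "0"],
]
df = pd.DataFrame(data, columns=["ID", "Diagnosis", "Age at Diagnosis", "Chemo", "Surgery"])

num_cols = ["Age at Diagnosis", "Chemo", "Surgery"]
df[num_cols] = df[num_cols].astype(int)

# Assumed list of treatment columns; extend it if the data has more.
treatments = ["Chemo", "Surgery"]

# Start with the original rows, minus the treatment columns.
pieces = [df.drop(columns=treatments)]
for name in treatments:
    # Rows where this treatment happened, relabelled as the treatment itself.
    piece = df.loc[df[name] > 0, ["ID", "Diagnosis", "Age at Diagnosis"]].copy()
    piece["Diagnosis"] = name
    pieces.append(piece)

# mergesort is stable, so rows sharing an index label stay in concat order:
# original diagnosis first, then treatments in the order listed above.
df_new = pd.concat(pieces).sort_index(kind="mergesort").reset_index(drop=True)
print(df_new)
```

Sorting on the original index rather than on ID and age means the result does not depend on the ages being unique or already sorted within an ID.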
