How can I split my dataframe into two rows where two columns have specific values?
I have a dataframe which contains information by column such as:
Month Year Cost_1 Cost_2
1 2017 100 0
2 2017 0 100
3 2017 140 30
and I am looking to transpose this data so that it takes the form:
Month Year Cost_1 Cost_2 Type
1 2017 100 0 Cost_1
2 2017 0 100 Cost_2
3 2017 140 0 Cost_1
3 2017 0 30 Cost_2
My initial thought was to use df.loc[df.Cost_1 > 0, "Type"] = "Cost_1", but this wouldn't deal with the rows which have both Cost_1 and Cost_2 and need a new row adding. Should I split the data so that each row has only Cost_1 or Cost_2 first and then use .loc to create the Type column, or is there a smarter way to do this?
Edit:
The problem is actually more complicated than I first thought. Each column has an associated partner: Cost_1 has Count_1, Cost_2 has Count_2, etc.
Year Month BDADExclIncurred_Capped_count BDADExclIncurred_Capped_mean BDTPDIncurred_Capped_count BDTPDIncurred_Capped_mean
0 2015 5 0 NaN 60 900
1 2015 10 0 NaN 0 NaN
2 2015 12 0 NaN 0 NaN
3 2016 1 60 2000 0 NaN
4 2016 1 100 1500 20 600
This is how my data looks beforehand, with many columns broken up into count:mean pairs. I want to keep each pair together, but if a row has two populated count:mean pairs I want it to be split into two rows, each with only one populated count:mean pair. Then I wish to create a new column called "Type" which tells me which count:mean pair is associated with that row.
Year Month BDADExclIncurred_Capped_count BDADExclIncurred_Capped_mean BDTPDIncurred_Capped_count BDTPDIncurred_Capped_mean Type
0 2015 5 0 NaN 60 900 TPD
1 2015 10 0 NaN 0 NaN
2 2015 12 0 NaN 0 NaN
3 2016 1 60 2000 0 NaN AD
4 2016 1 100 1500 0 0 AD
5 2016 1 0 0 20 600 TPD
As shown in this example, a new row is created. Index 4 from the previous dataframe is now split into both index 4 and index 5.
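A rough sketch of the split described in this edit, not a definitive solution. It assumes that a pair counts as "populated" when its count is greater than zero, and the `pairs` mapping from the Type labels (AD, TPD) to column prefixes is a hypothetical helper introduced here for illustration. Rows with no populated pair are kept with an empty Type, as in the example above.

```python
import numpy as np
import pandas as pd

# Sample data copied from the "before" table in the question
df = pd.DataFrame({
    "Year": [2015, 2015, 2015, 2016, 2016],
    "Month": [5, 10, 12, 1, 1],
    "BDADExclIncurred_Capped_count": [0, 0, 0, 60, 100],
    "BDADExclIncurred_Capped_mean": [np.nan, np.nan, np.nan, 2000, 1500],
    "BDTPDIncurred_Capped_count": [60, 0, 0, 0, 20],
    "BDTPDIncurred_Capped_mean": [900, np.nan, np.nan, np.nan, 600],
})

# Hypothetical mapping: Type label -> prefix shared by a count:mean pair
pairs = {"AD": "BDADExclIncurred_Capped", "TPD": "BDTPDIncurred_Capped"}

pieces = []
for label, prefix in pairs.items():
    # Rows where this pair is populated (assumption: count > 0)
    part = df[df[f"{prefix}_count"] > 0].copy()
    # Zero out any other populated pair so each row carries a single pair
    for other in pairs.values():
        if other != prefix:
            populated = part[f"{other}_count"] > 0
            part.loc[populated, [f"{other}_count", f"{other}_mean"]] = 0
    part["Type"] = label
    pieces.append(part)

# Rows with no populated pair are kept once, with an empty Type
count_cols = [f"{prefix}_count" for prefix in pairs.values()]
empty = df[~df[count_cols].gt(0).any(axis=1)].copy()
empty["Type"] = ""
pieces.append(empty)

# A stable sort on the original index keeps split rows next to each other
result = (
    pd.concat(pieces)
    .sort_index(kind="mergesort")
    .reset_index(drop=True)
)
print(result)
```

With more than two pairs, only the `pairs` mapping should need extending; a row with three populated pairs would then become three rows.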
Assuming either only Cost_1 or Cost_2 is greater than zero, as your example suggests, here's a simple approach to populate Type with Cost_1 and Cost_2 in one step:
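import pandas as pd

c = ['Cost_1', 'Cost_2']
# For each row, join the names of the cost columns that are greater than zero
counts = df[c].gt(0).dot(df[c].columns + ',').str.rstrip(',').str.split(',')
# Spread the matched names over two new columns
counts_df = pd.DataFrame(counts.tolist(), columns=['Count_1', 'Count_2'])
df.assign(**counts_df)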
Month Year Cost_1 Count_1 Cost_2 Count_2
0 1 2017 100 Cost_1 0 0
1 2 2017 0 Cost_2 100 0
2 3 2017 140 Cost_1 30 Cost_2
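For the row-splitting layout shown in the question, where month 3 becomes two rows, one possible sketch uses `melt`. This is an alternative to the snippet above rather than part of it, and it assumes that a cost of zero means "no row needed". The names `long` and `result` are illustrative only.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Month": [1, 2, 3],
    "Year": [2017, 2017, 2017],
    "Cost_1": [100, 0, 140],
    "Cost_2": [0, 100, 30],
})
cost_cols = ["Cost_1", "Cost_2"]

# Long format: one row per (Month, Year, cost column)
long = df.melt(id_vars=["Month", "Year"], value_vars=cost_cols,
               var_name="Type", value_name="Cost")

# Assumption: rows whose cost is zero are not wanted in the output
long = long[long["Cost"] > 0].copy()

# Rebuild the wide cost columns: the row's own Type keeps its value,
# every other cost column is set to zero
for col in cost_cols:
    long[col] = long["Cost"].where(long["Type"] == col, 0)

result = (
    long.sort_values(["Year", "Month", "Type"])
        .loc[:, ["Month", "Year", *cost_cols, "Type"]]
        .reset_index(drop=True)
)
print(result)
```

Each original row with both costs populated ends up as two rows, one per Type, which is the shape requested in the question.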