Make new dataframe from existing dataframe columns
I have a DataFrame, Dataframe1, from which I need to build a new DataFrame, Dataframe2, as shown below. Column n1 should take its value from the Status column of Dataframe1 where Dataframe1.Name = A, and similarly column n2 should take its value from the Status column where Dataframe1.Name = B. Also, Timestamp and id will have unique values. Can anybody please help?
Input Dataframe1:

id | Timestamp | Name | Status
---|---|---|---
1 | 02:15:00 | A | FALSE
1 | 02:15:00 | B | TRUE
2 | 03:00:00 | A | TRUE
2 | 03:00:00 | B | FALSE
Output Dataframe2:

id | Timestamp | n1 | n2
---|---|---|---
1 | 02:15:00 | FALSE | TRUE
2 | 03:00:00 | TRUE | FALSE
What you are trying to do is pivot the data and give the resulting columns specific names. If you rename the A and B values to n1 and n2, all that remains is to call the pandas.pivot_table function. Because its default aggregation function is mean, string values don't work out of the box, so you have to provide your own aggregation function. Since every row is unique in our situation, the aggregation function can simply return the value of that row.
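# Rename the Name values so they become the new column labels (the data uses 'B', not 'b').
dataframe1['Name'] = dataframe1['Name'].replace({'A': 'n1', 'B': 'n2'})
# Each (id, Timestamp, Name) group holds a single row, so the identity lambda just returns its value.
dataframe2 = dataframe1.pivot_table(index=['id', 'Timestamp'],
                                    columns='Name',
                                    values='Status',
                                    aggfunc=lambda x: x).reset_index()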
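For anyone who wants to try this without the original data, here is a minimal self-contained sketch. It rebuilds the question's input by hand and assumes Status holds booleans (the question does not say whether they are booleans or strings). It also swaps the identity lambda for the built-in `'first'` aggregation, which is a substitution on my part rather than what the answer above uses; with one row per group both should pick the same value.

```python
import pandas as pd

# Rebuild the question's input; Status is assumed to be boolean here.
dataframe1 = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "Timestamp": ["02:15:00", "02:15:00", "03:00:00", "03:00:00"],
    "Name": ["A", "B", "A", "B"],
    "Status": [False, True, True, False],
})

# Map the Name values to the desired column labels on a copy,
# leaving the original frame untouched.
renamed = dataframe1.assign(
    Name=dataframe1["Name"].replace({"A": "n1", "B": "n2"})
)

# Every (id, Timestamp, Name) group has exactly one row,
# so "first" simply returns that row's Status.
dataframe2 = (
    renamed.pivot_table(index=["id", "Timestamp"], columns="Name",
                        values="Status", aggfunc="first")
           .reset_index()
           .rename_axis(None, axis=1)  # drop the "Name" label left on the columns axis
)
print(dataframe2)
```

If a (id, Timestamp, Name) combination could ever repeat, `'first'` would silently keep only one of the rows, so it is worth checking for duplicates before relying on it.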
You can use pandas.pivot_table:
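# aggfunc='first' keeps the original Status value; the default (mean) would average it instead.
df2 = df.pivot_table(index=['id', 'Timestamp'], columns='Name', values='Status', aggfunc='first').reset_index().set_index('id')
# The columns are now Timestamp, A, B; relabel the last two.
df2.columns = ['Timestamp', 'n1', 'n2']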
Output:
>>> df2
   Timestamp     n1     n2
id
1   02:15:00  FALSE   TRUE
2   03:00:00   TRUE  FALSE
Use pivot_table and then adjust the resulting header.
import pandas as pd
df = pd.read_excel('test.xls', index_col = False)
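df2 = (df.pivot_table(index=['id', 'Timestamp'], columns='Name',
                      values='Status', aggfunc='first')  # 'first' keeps Status as-is; the default mean would not
         .reset_index()
         .rename_axis(None, axis=1)  # drop the leftover 'Name' label on the columns axis
         .rename(columns={'A': 'n1', 'B': 'n2'}))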
print(df2)
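Since each (id, Timestamp, Name) combination appears only once in the question's data, a possible alternative is `DataFrame.pivot`, which reshapes without aggregating at all. This is a different technique from the `pivot_table` answers above, shown only as a sketch: the input frame is rebuilt by hand with Status assumed to be boolean, and passing a list as `index` assumes a reasonably recent pandas (1.1 or later, as far as I know).

```python
import pandas as pd

# Hand-built copy of the question's input; Status is assumed to be boolean.
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "Timestamp": ["02:15:00", "02:15:00", "03:00:00", "03:00:00"],
    "Name": ["A", "B", "A", "B"],
    "Status": [False, True, True, False],
})

# pivot() only reshapes; it raises ValueError if an
# (id, Timestamp, Name) combination appears more than once.
df2 = (
    df.pivot(index=["id", "Timestamp"], columns="Name", values="Status")
      .rename(columns={"A": "n1", "B": "n2"})  # relabel the pivoted columns
      .rename_axis(None, axis=1)               # drop the "Name" label on the columns axis
      .reset_index()                           # turn id and Timestamp back into columns
)
print(df2)
```

The trade-off is that `pivot` fails loudly on duplicate keys, whereas `pivot_table` resolves them through its aggregation function; which behaviour is preferable depends on how much you trust the uniqueness of the input.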