Make new dataframe from existing dataframe columns

I have a dataframe, Dataframe1, from which I need to build a new dataframe, Dataframe2, as shown below. Column n1 should take its value from the Status column of Dataframe1 where Dataframe1.Name = A, and similarly column n2 should take its value from Status where Dataframe1.Name = B. Also, Timestamp and id will have unique values. Can anybody please help?

Input Dataframe1:

id  Timestamp  Name  Status
1   02:15:00   A     FALSE
1   02:15:00   B     TRUE
2   03:00:00   A     TRUE
2   03:00:00   B     FALSE

Output Dataframe2:

id  Timestamp  n1     n2
1   02:15:00   FALSE  TRUE
2   03:00:00   TRUE   FALSE

What you are trying to do is pivot the data under custom column names. If you rename the A and B values to n1 and n2, all that is left is to call the pandas.pivot_table function. Because its default aggregation function is the mean, strings don't work out of the box, so you have to provide your own aggregation function. Since every row is unique in our situation, the aggregation function can simply return the value of that row.

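# Rename the Name values so they become the new column headers.
dataframe1['Name'] = dataframe1['Name'].replace({'A': 'n1', 'B': 'n2'})
# Each (id, Timestamp, Name) group holds one row, so 'first' returns that row's value.
dataframe1.pivot_table(index=['id', 'Timestamp'],
                       columns='Name',
                       values='Status',
                       aggfunc='first').reset_index()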
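
For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the approach above. It assumes the Status column holds the strings "FALSE" and "TRUE" (the question does not state the dtype), and the variable names renamed and dataframe2 are only illustrative.

```python
import pandas as pd

# Sample data copied from the question; Status is ASSUMED to hold strings.
dataframe1 = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "Timestamp": ["02:15:00", "02:15:00", "03:00:00", "03:00:00"],
    "Name": ["A", "B", "A", "B"],
    "Status": ["FALSE", "TRUE", "TRUE", "FALSE"],
})

# Rename the Name values on a copy so they become the desired column headers.
renamed = dataframe1.assign(
    Name=dataframe1["Name"].replace({"A": "n1", "B": "n2"})
)

# Every (id, Timestamp, Name) group has exactly one row,
# so "first" simply returns that row's Status value.
dataframe2 = (
    renamed.pivot_table(index=["id", "Timestamp"], columns="Name",
                        values="Status", aggfunc="first")
    .reset_index()
    .rename_axis(None, axis=1)  # drop the leftover "Name" label on the columns
)

print(dataframe2)
```

Using assign here leaves the original dataframe1 untouched, which may or may not matter in your case.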

You can use pandas.pivot_table:

# aggfunc='first' keeps the original Status value instead of averaging it.
df2 = df.pivot_table(index=['id','Timestamp'], columns='Name', values='Status', aggfunc='first').reset_index().set_index('id')
df2.columns = ['Timestamp','n1','n2']

Output:

>>> df2
    Timestamp   n1     n2
id  
1   02:15:00    FALSE   TRUE
2   03:00:00    TRUE    FALSE

Use pivot_table and then adjust the resulting header.

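import pandas as pd

df = pd.read_excel('test.xls', index_col = False)

# Pivot Name into columns, keeping each row's own Status value,
# then flatten the header and rename A/B to n1/n2.
df2 = (
    df.pivot_table(index = ['id', 'Timestamp'], columns = 'Name',
                   values = 'Status', aggfunc = 'first')
      .reset_index()
      .rename_axis(None, axis=1)
      .rename(columns = {'A': 'n1', 'B': 'n2'})
)

print(df2)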
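
Since the question says each id and Timestamp pair appears once per Name, the same reshaping can also be sketched without any aggregation function, using set_index followed by unstack. This is only an alternative illustration, not what the answers above use. It assumes Status holds strings and builds the sample data inline instead of reading test.xls.

```python
import pandas as pd

# Sample data copied from the question; Status is ASSUMED to hold strings.
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "Timestamp": ["02:15:00", "02:15:00", "03:00:00", "03:00:00"],
    "Name": ["A", "B", "A", "B"],
    "Status": ["FALSE", "TRUE", "TRUE", "FALSE"],
})

df2 = (
    df.set_index(["id", "Timestamp", "Name"])["Status"]
    .unstack("Name")                           # Name values become columns
    .rename(columns={"A": "n1", "B": "n2"})    # apply the requested headers
    .rename_axis(None, axis=1)                 # drop the "Name" label on the columns
    .reset_index()                             # turn id and Timestamp back into columns
)

print(df2)
```

One design difference: unstack raises a ValueError if the same (id, Timestamp, Name) combination occurs more than once, whereas pivot_table would quietly aggregate the duplicates.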
