简体   繁体   中英

Pivoting tables with duplicated data in pandas

This question is closely related to the one I asked here , but it is not same.

My question is, suppose I have stacked data and I want pivot this table by Date and Name. My question is, when there are ambiguous values (more than one value for the same Date and Name) how can I choose between them ?

Name Date         Value 
a    2016-05-01   5
b    2016-05-01   7
c    2016-05-01   7
c    2016-05-01   8
a    2016-06-01   4
b    2016-06-01   3
c    2016-06-01   2
d    2016-06-01   2

Ideally I would like to "choose" the final value based on another column. For example suppose my dataset is:

Name Date         Value   PublishDate
a    2016-05-01   5       2016-04-01    
b    2016-05-01   7       2016-04-01
c    2016-05-01   7       2016-03-01
c    2016-05-01   8       2016-04-01
a    2016-06-01   4       2016-04-01
b    2016-06-01   3       2016-04-01
c    2016-06-01   2       2016-04-01
d    2016-06-01   2       2016-04-01

Then, between the [c,2016-05-01] duplicates, I'd like the one that was published last to stay (PublishDate=2016-04-01) and the other one to be discarded.

Is is possible ?

Ultimately I need this output, so that I can perform matrix operations on it.

Data/Name    a   b   c   d
2016-05-01   5   7   8   NaN
2016-06-01   4   3   2   2

I took a look at this question , but it is not quite the same, since I want one value to be discarded based on kind of a "rule".

But that question gives the insight that tere might be some use of the pandas.pivot_table function to help acomplish it. But I couldn't find it.

Thanks !

I would sort by PublishDate , groupby what you want using the last aggregate function. It will return the last row (latest publish date for you) for each 'Name'.

df = df.sort_index(by='PublishDate').copy()
df.reset_index(inplace=True)
grouped_df = df.groupby('Name').last()

Based on @dmb 's answer I came up with the following way of coping with the issues that I've stated on the comments:

df = df.sort_values(by='PublishDate').copy()
df = df.pivot_table(index = 'Date',columns = 'Stock',values = 'Value', aggfunc = 'last')

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM