How to transform dataframe columns
I am trying to transform data pulled from an external API. So far, my dataframe looks like this:
Country Date Team Rating
United Kingdom 11/8/2019 Team A 95
United Kingdom 2/20/2019 Team B 90
United Kingdom 9/22/2017 Team A 90
United Kingdom 6/28/2016 Team B 90
United Kingdom 6/27/2016 Team C 90
United Kingdom 6/24/2016 Team A 95
United Kingdom 6/12/2015 Team C 100
United Kingdom 6/13/2014 Team C 100
United Kingdom 4/19/2013 Team B 95
United Kingdom 2/22/2013 Team A 95
United Kingdom 12/13/2012 Team C 100
United Kingdom 3/14/2012 Team B 100
United Kingdom 2/13/2012 Team A 100
United Kingdom 10/26/2010 Team C 100
United Kingdom 5/21/2009 Team C 100
United Kingdom 9/21/2000 Team B 100
United Kingdom 9/21/2000 Team B 100
United Kingdom 8/10/1994 Team B 100
United Kingdom 6/26/1989 Team C 100
United Kingdom 4/28/1978 Team C 100
United Kingdom 3/31/1978 Team A 100
I would like it to look like this, but I am struggling to work out how (I am still new to dataframes):
Country Date Team A Team B Team C
United Kingdom 11/8/2019 95 90 90
United Kingdom 2/20/2019 90 90 90
United Kingdom 9/22/2017 90 90 90
United Kingdom 6/28/2016 95 90 90
United Kingdom 6/27/2016 95 95 90
United Kingdom 6/24/2016 95 95 100
United Kingdom 6/12/2015 95 95 100
United Kingdom 6/13/2014 95 95 100
United Kingdom 4/19/2013 95 95 100
United Kingdom 2/22/2013 95 100 100
United Kingdom 12/13/2012 100 100 100
United Kingdom 3/14/2012 100 100 100
United Kingdom 2/13/2012 100 100 100
United Kingdom 10/26/2010 100 100 100
United Kingdom 5/21/2009 100 100 100
United Kingdom 9/21/2000 100 100 100
United Kingdom 9/21/2000 100 100 100
United Kingdom 8/10/1994 100 100 100
United Kingdom 6/26/1989 100 100 100
United Kingdom 4/28/1978 100 100 100
United Kingdom 3/31/1978 100 100 100
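So basically I want the Country and Date columns to stay the same, but instead of having only one team per row, I want all teams to appear as columns. Where a team's rating was not updated, I want its previous value to be used rather than a blank.
For example, for 11/8/2019 you can see in my original df that only Team A's rating changed. For the Team B and Team C columns, where there was no update, I want them to carry their previous values.
Does anyone have any suggestions?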
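First, if you need to sort by date, I would suggest using the YYYYMMDD string representation of the dates (e.g. 20191108 for the first record) or an actual datetime data type. The American M/D/YYYY notation is confusing and does not sort chronologically as text.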
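To illustrate the point about sorting, here is a minimal sketch that compares sorting the raw strings with sorting parsed dates. The four sample dates are taken from the question; the variable names are only illustrative.

```python
import pandas as pd

dates = pd.Series(['11/8/2019', '2/20/2019', '9/22/2017', '12/13/2012'])

# Sorting the raw strings compares character by character, not chronologically.
sorted_as_text = dates.sort_values().tolist()

# Parsing with an explicit format yields real timestamps that sort correctly.
parsed = pd.to_datetime(dates, format='%m/%d/%Y')
sorted_as_dates = parsed.sort_values().dt.strftime('%Y%m%d').tolist()

print(sorted_as_text)   # ['11/8/2019', '12/13/2012', '2/20/2019', '9/22/2017']
print(sorted_as_dates)  # ['20121213', '20170922', '20190220', '20191108']
```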
In any case, to solve your issue I would advise using the pandas pivot function first, followed by a fill-NaN function (i.e. fillna) with the backfill (i.e. bfill) method.
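EDIT: If you want to keep the Country column, using it as a multi-index together with the Date column did not seem to work with pivot. What you can do instead is keep the original df and join its Date and Country columns with the pivoted df (in the code that follows, the join happens on the shared row index).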
import pandas as pd
import datetime as dt
# Create DataFrame similar to example
df = pd.DataFrame(data={'Date': ['11/8/2019','2/20/2019','9/22/2017','6/28/2016','6/27/2016','6/24/2016','6/12/2015','6/13/2014'],
'Team': ['Team A','Team B','Team A','Team B','Team C','Team A','Team C','Team C'],
'Rating': [95,90,90,90,90,95,100,100]})
# Convert strings to datetimes
df['Date'] = df['Date'].map(lambda x: dt.datetime.strptime(x, '%m/%d/%Y'))
df['Country'] = 'United Kingdom'
# Pivot DataFrame
dfp = df.pivot(columns='Team', values='Rating')
# Join with Country from original df
dfp = df[['Date', 'Country']].join(dfp)
# sort descending on Date
dfp.sort_values(by='Date', ascending=False, inplace=True)
# dfp is:
# Date Country Team A Team B Team C
# 2019-11-08 United Kingdom 95.0 NaN NaN
# 2019-02-20 United Kingdom NaN 90.0 NaN
# 2017-09-22 United Kingdom 90.0 NaN NaN
# ...
# Fill NaN values using the "next" row value
# (bfill() replaces fillna(method='bfill'), which is deprecated in recent pandas)
dfp = dfp.bfill()
# dfp is:
# Date Country Team A Team B Team C
# 2019-11-08 United Kingdom 95.0 90.0 90.0
# 2019-02-20 United Kingdom 90.0 90.0 90.0
# 2017-09-22 United Kingdom 90.0 90.0 90.0
# ...
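As a possible alternative to the join, more recent pandas releases (1.1 and later, as far as I am aware) accept a list for the index argument of pivot, so Country and Date can be kept together directly. The following is only a sketch on a four-row subset of the question's data; note that pivot raises an error if the same Country/Date/Team combination occurs more than once, which the full data set does contain.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['United Kingdom'] * 4,
    'Date': pd.to_datetime(['11/8/2019', '2/20/2019', '9/22/2017', '6/27/2016'],
                           format='%m/%d/%Y'),
    'Team': ['Team A', 'Team B', 'Team A', 'Team C'],
    'Rating': [95, 90, 90, 90],
})

dfp = (
    df.pivot(index=['Country', 'Date'], columns='Team', values='Rating')
      .sort_index(ascending=False)   # newest date first
      .bfill()                       # take the value from the next (older) row
      .reset_index()                 # turn Country and Date back into columns
)
print(dfp)
```

Rows older than a team's first rating stay NaN, because there is no earlier value to pull from.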
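Basically, what you need is:
# assumes data['Date'] has already been converted to datetime so that it sorts chronologically
data.pivot_table(index=['Country', 'Date'], columns='Team', values='Rating').reset_index()\
.sort_values(['Country', 'Date'], ascending=False).bfill()
This creates a pivot_table, sorts the rows into the descending order you have, and fills each missing value with the last existing one.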
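For reference, here is a self-contained sketch of that approach on the first six rows of the question's data. The name wide is only illustrative, and the Date column is parsed first so that the descending sort is chronological.

```python
import pandas as pd

data = pd.DataFrame({
    'Country': ['United Kingdom'] * 6,
    'Date': ['11/8/2019', '2/20/2019', '9/22/2017',
             '6/28/2016', '6/27/2016', '6/24/2016'],
    'Team': ['Team A', 'Team B', 'Team A', 'Team B', 'Team C', 'Team A'],
    'Rating': [95, 90, 90, 90, 90, 95],
})
data['Date'] = pd.to_datetime(data['Date'], format='%m/%d/%Y')

wide = (
    data.pivot_table(index=['Country', 'Date'], columns='Team', values='Rating')
        .reset_index()
        .sort_values(['Country', 'Date'], ascending=False)
        .bfill()   # fill each gap from the next (older) row
)
print(wide)
```

Two caveats: pivot_table averages duplicate Country/Date/Team entries by default (aggfunc is the mean), and with more than one country a plain bfill could pull values across country boundaries, so grouping by Country before filling may be safer in that case.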