
How to transform dataframe columns

I am trying to transform data pulled from an external API. So far, my dataframe looks like this:

Country         Date        Team    Rating
United Kingdom  11/8/2019   Team A  95
United Kingdom  2/20/2019   Team B  90
United Kingdom  9/22/2017   Team A  90
United Kingdom  6/28/2016   Team B  90
United Kingdom  6/27/2016   Team C  90
United Kingdom  6/24/2016   Team A  95
United Kingdom  6/12/2015   Team C  100
United Kingdom  6/13/2014   Team C  100
United Kingdom  4/19/2013   Team B  95
United Kingdom  2/22/2013   Team A  95
United Kingdom  12/13/2012  Team C  100
United Kingdom  3/14/2012   Team B  100
United Kingdom  2/13/2012   Team A  100
United Kingdom  10/26/2010  Team C  100
United Kingdom  5/21/2009   Team C  100
United Kingdom  9/21/2000   Team B  100
United Kingdom  9/21/2000   Team B  100
United Kingdom  8/10/1994   Team B  100
United Kingdom  6/26/1989   Team C  100
United Kingdom  4/28/1978   Team C  100
United Kingdom  3/31/1978   Team A  100

I would like it to look like this, but I am struggling to figure out how (I am still new to dataframes):

Country         Date        Team A  Team B  Team C
United Kingdom  11/8/2019   95      90      90
United Kingdom  2/20/2019   90      90      90
United Kingdom  9/22/2017   90      90      90
United Kingdom  6/28/2016   95      90      90
United Kingdom  6/27/2016   95      95      90
United Kingdom  6/24/2016   95      95      100
United Kingdom  6/12/2015   95      95      100
United Kingdom  6/13/2014   95      95      100
United Kingdom  4/19/2013   95      95      100
United Kingdom  2/22/2013   95      100     100
United Kingdom  12/13/2012  100     100     100
United Kingdom  3/14/2012   100     100     100
United Kingdom  2/13/2012   100     100     100
United Kingdom  10/26/2010  100     100     100
United Kingdom  5/21/2009   100     100     100
United Kingdom  9/21/2000   100     100     100
United Kingdom  9/21/2000   100     100     100
United Kingdom  8/10/1994   100     100     100
United Kingdom  6/26/1989   100     100     100
United Kingdom  4/28/1978   100     100     100
United Kingdom  3/31/1978   100     100     100

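So basically I want the Country and Date columns to stay the same, but instead of having only one team per row, I want all teams shown as columns. Where a team's rating was not updated, I want its previous value to be used rather than a blank.

For example, for 11/8/2019 you can see in my original df that only Team A's rating changed. For the Team B and Team C columns, I want the previous values to be used when there is no update.

Does anyone have any suggestions?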

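First of all, if you need to sort on dates, I would suggest using either a YYYYMMDD string representation of the date (e.g. 20191108 for the first record) or an actual datetime data type. The US-style notation is confusing and does not sort easily.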
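To illustrate the sorting problem, here is a small sketch using a few dates from the question (the variable names are only illustrative). Sorting the raw M/D/YYYY strings compares them character by character, while parsing them with pd.to_datetime first gives a chronological order.

```python
import pandas as pd

# A few dates from the question, in US-style M/D/YYYY notation
dates = pd.Series(['11/8/2019', '2/20/2019', '9/22/2017', '6/28/2016'])

# Sorting the raw strings compares them character by character,
# so '11/8/2019' comes first because '1' < '2' < '6' < '9'
as_strings = dates.sort_values().tolist()

# Parsing to real datetimes makes the sort chronological
parsed = pd.to_datetime(dates, format='%m/%d/%Y')
as_datetimes = parsed.sort_values().dt.strftime('%Y%m%d').tolist()

print(as_strings)    # ['11/8/2019', '2/20/2019', '6/28/2016', '9/22/2017']
print(as_datetimes)  # ['20160628', '20170922', '20190220', '20191108']
```

The YYYYMMDD strings in the second list would also sort correctly as plain text, which is why either representation works.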

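In any case, to solve your issue I would advise using the pandas pivot function first, and then filling the NaN values (i.e. fillna) with the backfill (i.e. bfill) method.

Edit: if you want to keep the Country column, it seems that using it together with the Date column as a MultiIndex does not work with pivot. What you can do instead is keep the original df and join it with the new, pivoted df (the code below joins on the row index that both frames share).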

import pandas as pd

# Create a DataFrame similar to the example
df = pd.DataFrame(data={'Date': ['11/8/2019', '2/20/2019', '9/22/2017', '6/28/2016', '6/27/2016', '6/24/2016', '6/12/2015', '6/13/2014'],
                        'Team': ['Team A', 'Team B', 'Team A', 'Team B', 'Team C', 'Team A', 'Team C', 'Team C'],
                        'Rating': [95, 90, 90, 90, 90, 95, 100, 100]})

# Convert the date strings to datetimes
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')
df['Country'] = 'United Kingdom'

# Pivot the DataFrame: one column per team, rows keep the original index
dfp = df.pivot(columns='Team', values='Rating')

# Join with Date and Country from the original df (the join is on the index)
dfp = df[['Date', 'Country']].join(dfp)

# Sort descending on Date
dfp = dfp.sort_values(by='Date', ascending=False)

# dfp is:
# Date        Country         Team A  Team B  Team C
# 2019-11-08  United Kingdom  95.0     NaN     NaN
# 2019-02-20  United Kingdom   NaN    90.0     NaN
# 2017-09-22  United Kingdom  90.0     NaN     NaN
# ...

# Fill NaN values using the "next" row value (the previous date, given the descending sort).
# bfill() is equivalent to fillna(method='bfill'), which is deprecated in recent pandas versions.
dfp = dfp.bfill()

# dfp is:
# Date        Country         Team A  Team B  Team C
# 2019-11-08  United Kingdom    95.0    90.0    90.0
# 2019-02-20  United Kingdom    90.0    90.0    90.0
# 2017-09-22  United Kingdom    90.0    90.0    90.0
# ...

Basically, what you need is:

# Assumes data['Date'] holds datetimes, so that the sort is chronological
data.pivot_table(index=['Country', 'Date'], columns='Team', values='Rating').reset_index()\
    .sort_values(['Country', 'Date'], ascending=False).bfill()

This creates a pivot_table, sorts the rows in the descending order your data uses, and fills each missing value with the last existing one.
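As a rough, self-contained sketch of the pivot_table approach, the snippet below runs it on the first six rows of the question's data. The names data and wide are illustrative. It assumes the Date column is parsed to datetimes first, and it uses bfill() rather than fillna(method='bfill'), since the latter is deprecated in recent pandas versions. Note that pivot_table aggregates duplicate Country/Date/Team combinations with the mean by default.

```python
import pandas as pd

# First six rows of the question's data (a subset, for illustration only)
data = pd.DataFrame({
    'Country': ['United Kingdom'] * 6,
    'Date': ['11/8/2019', '2/20/2019', '9/22/2017',
             '6/28/2016', '6/27/2016', '6/24/2016'],
    'Team': ['Team A', 'Team B', 'Team A', 'Team B', 'Team C', 'Team A'],
    'Rating': [95, 90, 90, 90, 90, 95],
})

# Parse the dates so that sorting is chronological, not lexicographic
data['Date'] = pd.to_datetime(data['Date'], format='%m/%d/%Y')

wide = (
    data.pivot_table(index=['Country', 'Date'], columns='Team', values='Rating')
        .reset_index()
        .sort_values(['Country', 'Date'], ascending=False)
        .bfill()                      # take the value from the next (older) row
        .reset_index(drop=True)
)
wide.columns.name = None  # drop the leftover 'Team' label on the column index

print(wide)
```

The oldest rows stay NaN for any team that has no earlier rating to fall back on. With more than one country, a plain bfill() could pull values across the country boundary; filling within each Country group would be one way to avoid that.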

