Reshaping table in pandas

I want to reshape a table in pandas. I have a table of the form:

date | country |state | population | num_cars
1    | c1      | s1   | 1          | 1
2    | c1      | s1   | 1          | 1
1    | c1      | s2   | 1          | 1
.
2    | c2      | s2   | 1          | 2
2    | c2      | s2   | 1          | 2

I want to turn it into this shape:

date | c1_population | c1_s1_population | c1_s2_population... | c2_s1_population | c1_num_cars | c2_11_num_cars...

To explain, the initial data has population and car counts by country and state for a date range. Now I want to convert it into a set of time-series columns, one per level (country, country-state).

How do I do this?

As the source data sample, I used a DataFrame with 2 hypothetical countries, 3 states each:

    date country state  population  num_cars
0   1990     Xxx   Aaa         100        15
1   2010     Xxx   Aaa         120        18
2   1990     Xxx   Bbb          80         9
3   2010     Xxx   Bbb          88        11
4   1990     Xxx   Ccc          75         6
5   2010     Xxx   Ccc          82         8
6   1990     Yyy   Ggg          40         5
7   2010     Yyy   Ggg          50         6
8   1990     Yyy   Hhh          30         3
9   2010     Yyy   Hhh          38         4
10  1990     Yyy   Jjj          29         3
11  2010     Yyy   Jjj          35         4
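
For anyone who wants to reproduce the steps, the sample above can be built roughly like this. It is only a sketch of one way to construct it; the values are copied from the table, and the variable name df matches the one used in the rest of the answer.

```python
import pandas as pd

# Sample data copied from the table above: 2 countries, 3 states each, 2 dates
df = pd.DataFrame({
    "date": [1990, 2010] * 6,
    "country": ["Xxx"] * 6 + ["Yyy"] * 6,
    "state": ["Aaa", "Aaa", "Bbb", "Bbb", "Ccc", "Ccc",
              "Ggg", "Ggg", "Hhh", "Hhh", "Jjj", "Jjj"],
    "population": [100, 120, 80, 88, 75, 82, 40, 50, 30, 38, 29, 35],
    "num_cars": [15, 18, 9, 11, 6, 8, 5, 6, 3, 4, 3, 4],
})
print(df)
```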

To solve your problem, start by defining a reformatting function:

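def reformat(grp, col):
    # Take the requested column; it is indexed by (country, state)
    pop = grp[col]
    # Name the Series after the group's date (the grouping key)
    pop.name = grp.date.iloc[0]
    return pop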

From a group of rows (grp), it takes the column named col, sets the Series name to the date from the first row (which equals the grouping key), and returns it.
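
To see what the function returns for a single group, here is a small sketch that calls it by hand on the rows for one date. The four-row frame named small is a hypothetical subset of the sample data, and the .copy() is my addition so that renaming does not touch the original frame.

```python
import pandas as pd

def reformat(grp, col):
    pop = grp[col].copy()          # .copy() is an addition, not in the answer
    pop.name = grp.date.iloc[0]    # name the Series after the group's date
    return pop

# Hypothetical subset of the sample data, indexed by (country, state)
small = pd.DataFrame({
    "date": [1990, 1990, 2010, 2010],
    "country": ["Xxx", "Yyy", "Xxx", "Yyy"],
    "state": ["Aaa", "Ggg", "Aaa", "Ggg"],
    "population": [100, 40, 120, 50],
}).set_index(["country", "state"])

# Select the rows of one date by hand, as a stand-in for one group
one_group = small[small["date"] == 1990]
row = reformat(one_group, "population")
print(row)
```

Each such Series becomes one row of the final table, with the (country, state) pairs turning into columns.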

As the initial step, set country and state as the index of df and group by date:

gr = df.set_index(['country', 'state']).groupby('date')

Then compute 2 DataFrames by applying the above function to each group, once for each column of interest:

df1 = gr.apply(reformat, col='population')
df2 = gr.apply(reformat, col='num_cars')

Having the two partial results, merge them on their indices:

pd.merge(df1, df2, left_index=True, right_index=True,
    suffixes=('_pop', '_cars'))

The result is:

country Xxx_pop         Yyy_pop         Xxx_cars         Yyy_cars        
state       Aaa Bbb Ccc     Ggg Hhh Jjj      Aaa Bbb Ccc      Ggg Hhh Jjj
date                                                                     
1990        100  80  75      40  30  29       15   9   6        5   3   3
2010        120  88  82      50  38  35       18  11   8        6   4   4

As you can see, the top level of the MultiIndex on columns combines the country with the measure ("Xxx_pop", "Xxx_cars" and so on). The other level contains state names.
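
If flat column names in the style of the question (c1_s1_population, c1_num_cars) are preferred over a MultiIndex, the following sketch shows one possible way to get them. Note that it uses DataFrame.pivot and a groupby sum instead of the groupby/apply approach above, and it assumes that the country-level figures are simply the sums over that country's states; the question does not say so explicitly. The names by_state, by_country and result are mine.

```python
import pandas as pd

df = pd.DataFrame({
    "date": [1990, 2010] * 6,
    "country": ["Xxx"] * 6 + ["Yyy"] * 6,
    "state": ["Aaa", "Aaa", "Bbb", "Bbb", "Ccc", "Ccc",
              "Ggg", "Ggg", "Hhh", "Hhh", "Jjj", "Jjj"],
    "population": [100, 120, 80, 88, 75, 82, 40, 50, 30, 38, 29, 35],
    "num_cars": [15, 18, 9, 11, 6, 8, 5, 6, 3, 4, 3, 4],
})
measures = ["population", "num_cars"]

# Country-state level: one column per (measure, country, state)
by_state = df.pivot(index="date", columns=["country", "state"], values=measures)
by_state.columns = [f"{country}_{state}_{measure}"
                    for measure, country, state in by_state.columns]

# Country level (assumption: a country's value is the sum over its states)
by_country = df.groupby(["date", "country"])[measures].sum().unstack("country")
by_country.columns = [f"{country}_{measure}"
                      for measure, country in by_country.columns]

# Both pieces are indexed by date, so they can be joined side by side
result = by_country.join(by_state)
print(result)
```

Passing a list to the columns argument of pivot needs a reasonably recent pandas (1.1 or later, as far as I know).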

To trace how this solution works, execute each step separately and inspect its result.
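
As a starting point for such an inspection, this sketch prints the groups produced by the grouping step. It uses a hypothetical four-row subset of the sample data to keep the output short; the dictionary named groups is only there for convenience.

```python
import pandas as pd

# Hypothetical subset of the sample data
df = pd.DataFrame({
    "date": [1990, 2010, 1990, 2010],
    "country": ["Xxx", "Xxx", "Yyy", "Yyy"],
    "state": ["Aaa", "Aaa", "Ggg", "Ggg"],
    "population": [100, 120, 40, 50],
    "num_cars": [15, 18, 5, 6],
})

gr = df.set_index(["country", "state"]).groupby("date")

# Collect the groups so each one can be looked at separately
groups = {date: grp for date, grp in gr}
for date, grp in groups.items():
    print("date:", date)
    print(grp)
```

Each group holds all rows for one date, indexed by (country, state), which is what reformat receives as grp.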
