Reshaping table in pandas
I want to reshape a table in pandas. I have a table of the form:
date | country | state | population | num_cars
1 | c1 | s1 | 1 | 1
2 | c1 | s1 | 1 | 1
1 | c1 | s2 | 1 | 1
.
2 | c2 | s2 | 1 | 2
2 | c2 | s2 | 1 | 2
I want to turn it into this shape:
date | c1_population | c1_s1_population | c1_s2_population ... | c2_s1_population | c1_num_cars | c2_11_num_cars ...
To explain, the initial data has population and car counts by country and state for a date range. Now I want to convert it into a set of time-series columns, one for each level (country, country-state).
How do I do this?
As the source data sample, I used a DataFrame with 2 hypothetical countries, 3 states each:
    date country state  population  num_cars
0   1990     Xxx   Aaa         100        15
1   2010     Xxx   Aaa         120        18
2   1990     Xxx   Bbb          80         9
3   2010     Xxx   Bbb          88        11
4   1990     Xxx   Ccc          75         6
5   2010     Xxx   Ccc          82         8
6   1990     Yyy   Ggg          40         5
7   2010     Yyy   Ggg          50         6
8   1990     Yyy   Hhh          30         3
9   2010     Yyy   Hhh          38         4
10  1990     Yyy   Jjj          29         3
11  2010     Yyy   Jjj          35         4
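For anyone who wants to follow along, the sample table can be rebuilt with a few lines. This is only a sketch: the values are copied from the table above, and the variable name df is the one used in the rest of this answer.

```python
import pandas as pd

# Sample data copied from the table above: 2 countries, 3 states each, 2 dates.
df = pd.DataFrame({
    'date':       [1990, 2010] * 6,
    'country':    ['Xxx'] * 6 + ['Yyy'] * 6,
    'state':      ['Aaa', 'Aaa', 'Bbb', 'Bbb', 'Ccc', 'Ccc',
                   'Ggg', 'Ggg', 'Hhh', 'Hhh', 'Jjj', 'Jjj'],
    'population': [100, 120, 80, 88, 75, 82, 40, 50, 30, 38, 29, 35],
    'num_cars':   [15, 18, 9, 11, 6, 8, 5, 6, 3, 4, 3, 4],
})
print(df)
```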
To solve your problem, start by defining a reformatting function:
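def reformat(grp, col):
    # Take the requested column from this group's rows.
    pop = grp[col]
    # Name the Series after the group's date (the grouping key).
    pop.name = grp.date.iloc[0]
    return pop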
From a group of rows (grp), it takes the column named col, sets the Series name to the date from the first row (the grouping key), and returns it.
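To see what the function hands back, it can be called on a single group by hand. The sketch below builds a stand-in for the 1990 group directly (an assumption for illustration: a frame indexed by country and state, with date still present as a column) rather than pulling it out of the groupby object.

```python
import pandas as pd

def reformat(grp, col):
    pop = grp[col]
    pop.name = grp.date.iloc[0]
    return pop

# Hypothetical stand-in for one group: the 1990 rows of the sample data,
# indexed by country and state.
grp = pd.DataFrame({
    'date':       [1990] * 6,
    'country':    ['Xxx', 'Xxx', 'Xxx', 'Yyy', 'Yyy', 'Yyy'],
    'state':      ['Aaa', 'Bbb', 'Ccc', 'Ggg', 'Hhh', 'Jjj'],
    'population': [100, 80, 75, 40, 30, 29],
    'num_cars':   [15, 9, 6, 5, 3, 3],
}).set_index(['country', 'state'])

row = reformat(grp, 'population')
print(row.name)   # the date shared by all rows of the group
print(row)        # one value per (country, state) pair
```

Each such Series becomes one row of the final table, with the (country, state) pairs turning into columns.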
As the initial step, move country and state into the index and group df by date:
gr = df.set_index(['country', 'state']).groupby('date')
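A quick way to check what this line produces is to look at the number and size of the groups. The sketch below assumes the sample data shown earlier; note that the grouping key is date, while country and state only move into the index.

```python
import pandas as pd

# Sample data from the answer (2 countries, 3 states each, 2 dates).
df = pd.DataFrame({
    'date':       [1990, 2010] * 6,
    'country':    ['Xxx'] * 6 + ['Yyy'] * 6,
    'state':      ['Aaa', 'Aaa', 'Bbb', 'Bbb', 'Ccc', 'Ccc',
                   'Ggg', 'Ggg', 'Hhh', 'Hhh', 'Jjj', 'Jjj'],
    'population': [100, 120, 80, 88, 75, 82, 40, 50, 30, 38, 29, 35],
    'num_cars':   [15, 18, 9, 11, 6, 8, 5, 6, 3, 4, 3, 4],
})

gr = df.set_index(['country', 'state']).groupby('date')

print(gr.ngroups)   # one group per distinct date
print(gr.size())    # number of (country, state) rows in each group
```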
Then compute 2 DataFrames by applying the above function to each group, once for each column of interest:
df1 = gr.apply(reformat, col='population')
df2 = gr.apply(reformat, col='num_cars')
Having these two partial results, merge them on their indices:
pd.merge(df1, df2, left_index=True, right_index=True,
         suffixes=('_pop', '_cars'))
The result is:
country Xxx_pop         Yyy_pop         Xxx_cars         Yyy_cars
state       Aaa Bbb Ccc     Ggg Hhh Jjj      Aaa Bbb Ccc      Ggg Hhh Jjj
date
1990        100  80  75      40  30  29       15   9   6        5   3   3
2010        120  88  82      50  38  35       18  11   8        6   4   4
As you can see, the top level of the MultiIndex on columns is "Country / population" and "Country / car No". The other level contains state names.
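If flat column names like c1_s1_population from the question are preferred, one possible alternative (not the method used in this answer) is to unstack country and state and then join the column levels into strings. The sketch below also adds country-level columns by summing over states; treating the country level as a sum is an assumption, since the question does not say how that level should be computed. The names wide, totals and result are made up for illustration.

```python
import pandas as pd

# Sample data from the answer (2 countries, 3 states each, 2 dates).
df = pd.DataFrame({
    'date':       [1990, 2010] * 6,
    'country':    ['Xxx'] * 6 + ['Yyy'] * 6,
    'state':      ['Aaa', 'Aaa', 'Bbb', 'Bbb', 'Ccc', 'Ccc',
                   'Ggg', 'Ggg', 'Hhh', 'Hhh', 'Jjj', 'Jjj'],
    'population': [100, 120, 80, 88, 75, 82, 40, 50, 30, 38, 29, 35],
    'num_cars':   [15, 18, 9, 11, 6, 8, 5, 6, 3, 4, 3, 4],
})

# Country-state level: one row per date, one column per
# (value column, country, state) combination.
wide = df.set_index(['date', 'country', 'state']).unstack(['country', 'state'])
wide.columns = [f'{c}_{s}_{v}' for v, c, s in wide.columns]

# Country level: sum over states (an assumption, see the note above).
totals = (df.groupby(['date', 'country'])[['population', 'num_cars']]
            .sum()
            .unstack('country'))
totals.columns = [f'{c}_{v}' for v, c in totals.columns]

# Both levels side by side, indexed by date.
result = pd.concat([totals, wide], axis=1)
print(result)
```

Each column of result is then a time series indexed by date, for example result['Xxx_Aaa_population'] or result['Yyy_num_cars'].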
To trace how this solution works, execute each step separately and inspect its result.