Create new variables from row for each existing variable in pandas dataframe
I have a dataframe which looks like:
0 target_year ID v1 v2
1 2000 1 0.3 1
2 2000 2 1.2 4
...
10 2001 1 3 2
11 2001 2 2 2
And I would like the following output:
0 ID v1_1 v2_1 v1_2 v2_2
1 1 0.3 1 3 2
2 2 1.2 4 2 2
Do you have any idea how to do that?
You could use pd.pivot_table, with the GroupBy.cumcount of ID as the columns.
Then we can use a list comprehension with f-strings to merge the MultiIndex header into a single level:
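import pandas as pd

# 1-based counter of each ID's occurrences (1 for its first row, 2 for its second, ...)
cols = df.groupby('ID').ID.cumcount() + 1
df_piv = pd.pivot_table(data=df[['v1', 'v2']],
                        index=df.ID,
                        columns=cols)
# flatten the MultiIndex header, e.g. ('v1', 1) -> 'v1_1'
df_piv.columns = [f'{i}_{j}' for i, j in df_piv.columns]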
v1_1 v1_2 v2_1 v2_2
ID
1 0.3 3.0 1 2
2 1.2 2.0 4 2
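To make the role of the counter concrete, here is a minimal self-contained sketch of the pivot_table approach. It assumes a sample frame built from only the four rows visible in the question (the elided rows are not reconstructed), and the names `counter`, `name` and `k` are illustrative. Note that pivot_table aggregates with the mean by default, so integer columns may come back as floats.

```python
import pandas as pd

# Assumed sample: only the four rows shown in the question.
df = pd.DataFrame({
    'target_year': [2000, 2000, 2001, 2001],
    'ID': [1, 2, 1, 2],
    'v1': [0.3, 1.2, 3.0, 2.0],
    'v2': [1, 4, 2, 2],
})

# cumcount numbers the rows within each ID group starting at 0; +1 makes it 1-based.
counter = df.groupby('ID').cumcount() + 1
print(counter.tolist())

# The counter becomes the second level of the column header.
df_piv = pd.pivot_table(data=df[['v1', 'v2']], index=df['ID'], columns=counter)

# Flatten ('v1', 1) -> 'v1_1', and so on.
df_piv.columns = [f'{name}_{k}' for name, k in df_piv.columns]
print(df_piv.reset_index())
```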
Use GroupBy.cumcount for a counter column, reshape with DataFrame.set_index and DataFrame.unstack, and finally flatten the columns with a list comprehension and f-strings:
g = df.groupby('ID').ID.cumcount() + 1
df = df.drop('target_year', axis=1).set_index(['ID', g]).unstack()
df.columns = [f'{a}_{b}' for a, b in df.columns]
df = df.reset_index()
print(df)
ID v1_1 v1_2 v2_1 v2_2
0 1 0.3 3.0 1 2
1 2 1.2 2.0 4 2
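The result above groups the columns by variable (v1_1, v1_2, v2_1, v2_2), while the question asks for them grouped by counter (v1_1, v2_1, v1_2, v2_2). One possible way to get that order, sketched here on an assumed sample of the four visible rows, is to swap the two column levels and sort before flattening. The name `wide` is illustrative.

```python
import pandas as pd

# Assumed sample: only the four rows shown in the question.
df = pd.DataFrame({
    'target_year': [2000, 2000, 2001, 2001],
    'ID': [1, 2, 1, 2],
    'v1': [0.3, 1.2, 3.0, 2.0],
    'v2': [1, 4, 2, 2],
})

g = df.groupby('ID').cumcount() + 1
wide = df.drop(columns='target_year').set_index(['ID', g]).unstack()

# Put the counter on the outer column level and sort, so that
# all variables for counter 1 come before those for counter 2.
wide = wide.swaplevel(axis=1).sort_index(axis=1)

# After the swap each column key is (counter, variable name).
wide.columns = [f'{name}_{k}' for k, name in wide.columns]
wide = wide.reset_index()
print(wide)
```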
If your data cover only two years, you can also use merge:
cols = ['ID', 'v1', 'v2']
df[df.target_year.eq(2000)][cols].merge(df[df.target_year.eq(2001)][cols],
                                        on='ID',
                                        suffixes=['_1', '_2'])
Output:
ID v1_1 v2_1 v1_2 v2_2
0 1 0.3 1 3.0 2
1 2 1.2 4 2.0 2
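One thing to keep in mind with the merge approach: merge defaults to an inner join, so an ID that appears in only one of the two years is dropped. The sketch below uses hypothetical data (ID 3 is invented for illustration and has no 2001 row) to compare the default with how='outer', which keeps such IDs and fills the missing side with NaN.

```python
import pandas as pd

# Hypothetical data: ID 3 has a row for 2000 but none for 2001.
df = pd.DataFrame({
    'target_year': [2000, 2000, 2000, 2001, 2001],
    'ID': [1, 2, 3, 1, 2],
    'v1': [0.3, 1.2, 0.7, 3.0, 2.0],
    'v2': [1, 4, 5, 2, 2],
})

cols = ['ID', 'v1', 'v2']
first = df.loc[df.target_year.eq(2000), cols]
second = df.loc[df.target_year.eq(2001), cols]

# Default how='inner': only IDs present in both years survive.
inner = first.merge(second, on='ID', suffixes=('_1', '_2'))

# how='outer': every ID is kept; missing values become NaN.
outer = first.merge(second, on='ID', how='outer', suffixes=('_1', '_2'))

print(inner)
print(outer)
```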