Create new variables from row for each existing variable in pandas dataframe

I have a dataframe which looks like this:

0  target_year ID   v1  v2  
1  2000         1  0.3   1
2  2000         2  1.2   4
...
10 2001         1    3   2
11 2001         2    2   2

And I would like the following output:

0   ID   v1_1  v2_1  v1_2  v2_2  
1    1    0.3     1     3     2 
2    2    1.2     4     2     2

Do you have any idea how to do that?

You could use pd.pivot_table, passing the GroupBy.cumcount of ID as the columns.

Then we can use a list comprehension with f-strings to merge the MultiIndex header into a single level:

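import pandas as pd

# Occurrence number of each row within its ID group: 1, 2, ...
cols = df.groupby('ID').ID.cumcount() + 1
df_piv = pd.pivot_table(data=df[['v1', 'v2']],
                        index=df.ID,
                        columns=cols)
# Flatten the MultiIndex header, e.g. ('v1', 1) -> 'v1_1'
df_piv.columns = [f'{i}_{j}' for i, j in df_piv.columns]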


     v1_1  v1_2  v2_1  v2_2
ID                        
1    0.3   3.0     1     2
2    1.2   2.0     4     2
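
For readers who want to run the pivot approach end to end, here is a minimal sketch. It assumes the data consists only of the four rows shown in the question, stores the counter in a column named `n` (a hypothetical name chosen for illustration), and passes `aggfunc='first'` instead of relying on the default mean aggregation of `pd.pivot_table`. It is one possible variant of the answer above, not the only way to write it.

```python
import pandas as pd

# Sample data assumed from the four rows visible in the question.
df = pd.DataFrame({
    'target_year': [2000, 2000, 2001, 2001],
    'ID': [1, 2, 1, 2],
    'v1': [0.3, 1.2, 3.0, 2.0],
    'v2': [1, 4, 2, 2],
})

# Number the rows inside each ID group in order of appearance: 1, 2, ...
# 'n' is a hypothetical helper column name.
counted = df.assign(n=df.groupby('ID').cumcount() + 1)

# One row per ID, one column per (variable, occurrence) pair.
# aggfunc='first' keeps the single value in each cell as it is.
df_piv = pd.pivot_table(counted, index='ID', columns='n',
                        values=['v1', 'v2'], aggfunc='first')

# Flatten the two-level header: ('v1', 1) becomes 'v1_1'.
df_piv.columns = [f'{var}_{n}' for var, n in df_piv.columns]
print(df_piv)
```

Because the counter comes from `cumcount`, each (ID, n) pair appears only once here, so the choice of aggregation function does not change the values; it only matters if the input contains duplicate pairs.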

Use GroupBy.cumcount to build a counter column, reshape with DataFrame.set_index and DataFrame.unstack, and finally flatten the column header with a list comprehension and f-strings:

g = df.groupby('ID').ID.cumcount() + 1

df = df.drop('target_year', axis=1).set_index(['ID', g]).unstack()
df.columns = [f'{a}_{b}' for a, b in df.columns]
df = df.reset_index()
print(df)
   ID  v1_1  v1_2  v2_1  v2_2
0   1   0.3   3.0     1     2
1   2   1.2   2.0     4     2
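
The result above groups the columns by variable (v1_1, v1_2, v2_1, v2_2), while the question asks for them grouped by occurrence (v1_1, v2_1, v1_2, v2_2). A possible way to get that order, sketched here on sample data assumed from the question, is to swap the two header levels and sort them before flattening. The name `wide` is only illustrative.

```python
import pandas as pd

# Sample data assumed from the four rows visible in the question.
df = pd.DataFrame({
    'target_year': [2000, 2000, 2001, 2001],
    'ID': [1, 2, 1, 2],
    'v1': [0.3, 1.2, 3.0, 2.0],
    'v2': [1, 4, 2, 2],
})

# Counter of occurrences per ID: 1 for the first row, 2 for the second, ...
g = df.groupby('ID').cumcount() + 1

# Header after unstack is (variable, counter).
wide = df.drop(columns='target_year').set_index(['ID', g]).unstack()

# Swap to (counter, variable) and sort, so all variables of occurrence 1
# come first, then all variables of occurrence 2.
wide = wide.swaplevel(axis=1).sort_index(axis=1)

# Flatten back to 'variable_counter' names and restore ID as a column.
wide.columns = [f'{var}_{n}' for n, var in wide.columns]
wide = wide.reset_index()
print(wide)
```

Note that sorting orders the variables alphabetically within each occurrence, which happens to match v1, v2 here; with other variable names an explicit column list may be the safer choice.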

If your data covers only two years, you can also use merge:

cols = ['ID','v1', 'v2']
df[df.target_year.eq(2000)][cols].merge(df[df.target_year.eq(2001)][cols],
                                 on='ID',
                                 suffixes=['_1','_2'])

Output:

    ID  v1_1    v2_1    v1_2    v2_2
0   1   0.3     1       3.0     2
1   2   1.2     4       2.0     2
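
One thing to keep in mind with the merge approach is that `DataFrame.merge` performs an inner join by default, so an ID that is missing in one of the two years disappears from the result. The sketch below illustrates this with a hypothetical extra row (ID 3, present only in 2000) that is not part of the original question, and shows `how='outer'` as one possible way to keep such rows.

```python
import pandas as pd

# Data from the question plus one hypothetical row: ID 3 has no 2001 record.
df = pd.DataFrame({
    'target_year': [2000, 2000, 2000, 2001, 2001],
    'ID': [1, 2, 3, 1, 2],
    'v1': [0.3, 1.2, 0.5, 3.0, 2.0],
    'v2': [1, 4, 7, 2, 2],
})

cols = ['ID', 'v1', 'v2']
first = df.loc[df.target_year.eq(2000), cols]
second = df.loc[df.target_year.eq(2001), cols]

# Default inner join: only IDs present in both years survive.
inner = first.merge(second, on='ID', suffixes=('_1', '_2'))

# Outer join: every ID is kept, missing values are filled with NaN.
outer = first.merge(second, on='ID', how='outer', suffixes=('_1', '_2'))

print(inner)
print(outer)
```

With the outer join, the columns that receive NaN are converted to float, so v2_2 is no longer an integer column in that case.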

