Convert rows to wide columns based on duplicated IDs in another column in pandas

Question

My question is similar to this, this, and this question.

But I still cannot resolve it.

I have a dataframe with duplicated IDs:

ID  Publication_type
1   Journal          
1   Clinical study   
1   Guideline        
2   Journal          
2   Letter

I want to make it wide, but I do not know how many publication types I will have: maybe 2, maybe 20. Thus, I do not know how many wide columns I will need. The number of wide columns for Publication_type must not be more than the maximum number of types per ID.

Expected output:

 ID Publication_type1 Publication_type2 Publication_type3    etc
 1  Journal           Clinical study    Guideline
 2  Journal           Letter            NaN

For now I do not need to put the same publication type into the same column. I do not need all articles in the same column. Thanks!

Answer 1

You can group by ID, aggregate via list, and then create a new DataFrame from the results:

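import numpy as np
import pandas as pd

col = 'Publication_type'
grouped = df.groupby('ID')[col].agg(list)  # one list of types per ID, sorted by ID
new_df = pd.DataFrame(grouped.tolist()).replace({None: np.nan})  # pad short rows with NaN
new_df.columns = [f'{col}{i}' for i in new_df.columns + 1]  # 0-based positions -> 1-based names
new_df['ID'] = grouped.index  # take IDs from the groupby result so they line up with the rows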

Output:

>>> new_df
  Publication_type1 Publication_type2 Publication_type3  ID
0           Journal    Clinical study         Guideline   1
1           Journal            Letter               NaN   2
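
If you would rather keep ID as the key the whole way through, another option is to number the rows within each ID with groupby(...).cumcount() and then pivot on that number. This is only a sketch of that alternative, not part of the approach above: the helper column name position is my own choice, and the sample frame is copied from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2],
    'Publication_type': ['Journal', 'Clinical study', 'Guideline', 'Journal', 'Letter'],
})

col = 'Publication_type'

# Number each row within its ID: 1, 2, 3 for ID 1 and 1, 2 for ID 2.
# 'position' is a hypothetical helper name used only for this example.
position = df.groupby('ID').cumcount() + 1

# Pivot so that every position becomes its own column.
# IDs with fewer types than the widest ID get NaN in the extra columns.
wide = (
    df.assign(position=position)
      .pivot(index='ID', columns='position', values=col)
)

# Rename the numeric column labels to Publication_type1, Publication_type2, ...
wide.columns = [f'{col}{i}' for i in wide.columns]
wide = wide.reset_index()

print(wide)
```

The number of columns is determined by the ID with the most publication types, so it does not need to be known in advance.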

Answer 1: accepted, 2021-12-22 18:11:32
