Pandas: Pivot row into column
The following is a minimal example of my data:
Id name class_cd class_name
0 1 A abc1 dog
1 1 A def2 canine
2 1 A ghi1 safe
3 2 B abc1 cat
4 2 B def2 tabby
It can be reproduced with:
import pandas as pd

df = pd.DataFrame({
    'Id': [1, 1, 1, 2, 2],
    'name': ['A', 'A', 'A', 'B', 'B'],
    'class_cd': ['abc1', 'def2', 'ghi1', 'abc1', 'def2'],
    'class_name': ['dog', 'canine', 'safe', 'cat', 'tabby']
})
I want the distinct values of class_cd to become new columns, each holding the associated class_name, so that the result contains one row per Id.
Expected outcome:
Id name abc1 def2 ghi1
0 1 A dog canine safe
1 2 B cat tabby
How can I achieve this with Pandas?
You can try:
(df.pivot(index=['Id', 'name'], columns='class_cd', values='class_name')
.fillna('')
.reset_index())
class_cd Id name abc1 def2 ghi1
0 1 A dog canine safe
1 2 B cat tabby
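The output above still shows class_cd in the header row: pivot names the new column axis after the columns argument, and reset_index keeps that name. If you want the result to match the expected outcome exactly, one minimal sketch (assuming pandas 1.1 or later, where pivot accepts a list for index) is to clear the name with rename_axis:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Id': [1, 1, 1, 2, 2],
    'name': ['A', 'A', 'A', 'B', 'B'],
    'class_cd': ['abc1', 'def2', 'ghi1', 'abc1', 'def2'],
    'class_name': ['dog', 'canine', 'safe', 'cat', 'tabby'],
})

result = (
    df.pivot(index=['Id', 'name'], columns='class_cd', values='class_name')
      .fillna('')               # Id 2 has no ghi1 row; show an empty string instead of NaN
      .reset_index()            # move Id and name back into ordinary columns
      .rename_axis(columns=None)  # drop the leftover "class_cd" label on the column axis
)
print(result)
```

This only changes the column-axis label; the data is identical to the answer above.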
This is a job for pivot.
You tell it which column to expand into new columns (columns) and which column supplies the values to put in them (values). The unique values of the specified index (here, the unique Id/name combinations) become the rows of the result.
>>> df.pivot(index=['Id','name'], columns='class_cd', values='class_name')
class_cd abc1 def2 ghi1
Id name
1 A dog canine safe
2 B cat tabby NaN
Then you can call reset_index() to turn the Id/name MultiIndex back into ordinary columns:
class_cd Id name abc1 def2 ghi1
0 1 A dog canine safe
1 2 B cat tabby NaN
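Because pivot only reshapes and never aggregates, it needs each (Id, name, class_cd) combination to occur at most once; on a duplicate it raises a ValueError rather than choosing which class_name to keep. If your real data may contain duplicates, pivot_table with an aggfunc is one way around it. The sketch below uses a small hypothetical frame, dup, with two abc1 rows for Id 1 and keeps the first value via aggfunc='first'; that rule is an assumption, so pick another aggregation if it does not fit your data.

```python
import pandas as pd

# Hypothetical variant of the question's data: Id 1 has two 'abc1' rows
dup = pd.DataFrame({
    'Id': [1, 1, 1, 2],
    'name': ['A', 'A', 'A', 'B'],
    'class_cd': ['abc1', 'abc1', 'def2', 'abc1'],
    'class_name': ['dog', 'puppy', 'canine', 'cat'],
})

# pivot() refuses duplicate (index, column) pairs
try:
    dup.pivot(index=['Id', 'name'], columns='class_cd', values='class_name')
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table() aggregates the duplicates; 'first' keeps the first value seen per group
wide = (
    dup.pivot_table(index=['Id', 'name'], columns='class_cd',
                    values='class_name', aggfunc='first')
       .reset_index()
)
print(pivot_failed)
print(wide)
```

Missing combinations (Id 2 has no def2 row) still come back as NaN, just as with pivot.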
Here is a way to do it with pandas and SQL, using the pandasql package.
1. Import pandasql:
!pip install pandasql
from pandasql import sqldf
# Helper: run a SQL query against the DataFrames in the global namespace
pysqldf = lambda q: sqldf(q, globals())
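2. Split the dataframe by class_cd:
# One sub-frame per class_cd value (note: the data uses 'ghi1', not 'ghi3')
df1=df[df['class_cd']=='abc1']
df2=df[df['class_cd']=='def2']
df3=df[df['class_cd']=='ghi1']
3. Join the sub-frames back together on name:
query="""
select tt1.Id, tt1.name, tt1.abc1, tt1.def2, t3.class_name as 'ghi1'
from
-- inner join: a name with no 'def2' row would be dropped here
(select t1.Id, t1.name, t1.class_name as 'abc1', t2.class_name as 'def2'
from df1 as t1
join df2 as t2
on t1.name=t2.name) as tt1
-- left join: keeps B even though it has no 'ghi1' row
left join df3 as t3
on tt1.name = t3.name
"""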
4. Run the query and print the outcome:
df_result=pysqldf(query)
print(df_result)
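The self-join above needs one sub-frame and one join per class_cd value, and its inner join would drop any Id that has no def2 row. A different, swapped-in technique is conditional aggregation: one MAX(CASE ...) per code, grouped by Id and name. The sketch below assumes you would rather not install pandasql; it loads the frame into an in-memory SQLite database with the standard-library sqlite3 module and pandas' to_sql/read_sql_query. The table name df is an arbitrary choice.

```python
import sqlite3

import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Id': [1, 1, 1, 2, 2],
    'name': ['A', 'A', 'A', 'B', 'B'],
    'class_cd': ['abc1', 'def2', 'ghi1', 'abc1', 'def2'],
    'class_name': ['dog', 'canine', 'safe', 'cat', 'tabby'],
})

# In-memory SQLite database holding one table named "df"
conn = sqlite3.connect(':memory:')
df.to_sql('df', conn, index=False)

# Each (Id, name) group has at most one row per class_cd, so MAX just picks
# that row's class_name, or returns NULL when the code is missing
query = """
SELECT Id, name,
       MAX(CASE WHEN class_cd = 'abc1' THEN class_name END) AS abc1,
       MAX(CASE WHEN class_cd = 'def2' THEN class_name END) AS def2,
       MAX(CASE WHEN class_cd = 'ghi1' THEN class_name END) AS ghi1
FROM df
GROUP BY Id, name
ORDER BY Id
"""
df_result = pd.read_sql_query(query, conn)
conn.close()
print(df_result)
```

Missing combinations come back as NULL (None in pandas), the SQL counterpart of the NaN in the pivot answers. The CASE list is still written out per class_cd value, so new codes need a new line in the query.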
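Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.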