Pandas: Pivot row into column

The following is a minimal example of my data:

   Id name class_cd class_name
0   1    A     abc1        dog
1   1    A     def2     canine
2   1    A     ghi1       safe
3   2    B     abc1        cat
4   2    B     def2      tabby

It can be reproduced with:

import pandas as pd

df = pd.DataFrame({
    'Id': [1, 1, 1, 2, 2],
    'name': ['A', 'A', 'A', 'B', 'B'],
    'class_cd': ['abc1', 'def2', 'ghi1', 'abc1', 'def2'],
    'class_name': ['dog', 'canine', 'safe', 'cat', 'tabby']
})

I want the distinct values of class_cd to become new columns, where each value is the associated class_name, so that the result contains one row for each Id.

Expected outcome:

    Id  name    abc1    def2    ghi1
0   1      A     dog  canine    safe
1   2      B     cat   tabby    

How could one achieve this with Pandas?

You can try:

(df.pivot(index=['Id', 'name'], columns='class_cd', values='class_name')
 .fillna('')
 .reset_index())

class_cd  Id name abc1    def2  ghi1
0          1    A  dog  canine  safe
1          2    B  cat   tabby   

This is a job for pivot.

You tell it which column to expand into new columns, and which values to put in those new columns. It uses the unique values of the specified index to create the rows of the result.

>>> df.pivot(index=['Id','name'], columns='class_cd', values='class_name')
class_cd abc1    def2  ghi1
Id name
1  A      dog  canine  safe
2  B      cat   tabby   NaN

Then, you can call reset_index() to flatten the multi-index into columns.

class_cd  Id name abc1    def2  ghi1
0          1    A  dog  canine  safe
1          2    B  cat   tabby   NaN
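The two steps above can be chained. The leftover class_cd label in the header is the name of the column axis, and it can be cleared with rename_axis. A minimal sketch, assuming the df from the question and a pandas version that accepts a list for index in pivot (1.1 or later); the name wide is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Id': [1, 1, 1, 2, 2],
    'name': ['A', 'A', 'A', 'B', 'B'],
    'class_cd': ['abc1', 'def2', 'ghi1', 'abc1', 'def2'],
    'class_name': ['dog', 'canine', 'safe', 'cat', 'tabby'],
})

wide = (
    df.pivot(index=['Id', 'name'], columns='class_cd', values='class_name')
      .reset_index()               # move Id and name back into ordinary columns
      .rename_axis(None, axis=1)   # drop the 'class_cd' label from the column axis
)
print(wide)
```

The missing ghi1 entry for Id 2 stays NaN here; append .fillna('') if an empty string is preferred.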

As an alternative, using crosstab:

dfx = pd.crosstab([df['Id'], df['name']], df['class_cd'],
                  values=df['class_name'], aggfunc=','.join)

Output:

          abc1    def2  ghi1
Id name                    
1  A      dog  canine  safe
2  B      cat   tabby   NaN
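One practical difference from pivot: the aggfunc in the crosstab call decides what happens when the same (Id, name, class_cd) combination appears more than once, whereas pivot raises a ValueError in that case. The sketch below adds a hypothetical duplicate row ('kitten', which is not in the original data) to show this, and uses pivot_table with the same aggfunc as one possible alternative.

```python
import pandas as pd

df = pd.DataFrame({
    'Id': [1, 1, 1, 2, 2],
    'name': ['A', 'A', 'A', 'B', 'B'],
    'class_cd': ['abc1', 'def2', 'ghi1', 'abc1', 'def2'],
    'class_name': ['dog', 'canine', 'safe', 'cat', 'tabby'],
})

# Hypothetical extra row that duplicates the (2, 'B', 'abc1') combination.
extra = pd.DataFrame({'Id': [2], 'name': ['B'],
                      'class_cd': ['abc1'], 'class_name': ['kitten']})
dup = pd.concat([df, extra], ignore_index=True)

# pivot cannot reshape when index/column pairs are not unique.
try:
    dup.pivot(index=['Id', 'name'], columns='class_cd', values='class_name')
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table aggregates the duplicates instead, here by joining with commas.
wide = dup.pivot_table(index=['Id', 'name'], columns='class_cd',
                       values='class_name', aggfunc=','.join)
print(pivot_failed)
print(wide)
```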

I would do it with pandas and SQL.

1. Import pandasql

!pip install pandasql

from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())

2. Split the DataFrame by class_cd

df1 = df[df['class_cd'] == 'abc1']
df2 = df[df['class_cd'] == 'def2']
df3 = df[df['class_cd'] == 'ghi1']

3. Use SQL to join the three tables

query = """
select tt1.Id, tt1.name, tt1.abc1, tt1.def2, t3.class_name as 'ghi1'
from
(select t1.Id, t1.name, t1.class_name as 'abc1', t2.class_name as 'def2'
from df1 as t1
join df2 as t2
on t1.name = t2.name) as tt1

left join df3 as t3
on tt1.name = t3.name
"""

4. Outcome

df_result=pysqldf(query)
print(df_result)
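The join-based query needs one more subquery for every additional class_cd value. A common SQL alternative is conditional aggregation, which builds all the columns in a single GROUP BY. The sketch below is an assumption-laden variant, not the answer's method: it uses the standard-library sqlite3 module instead of pandasql, loads the question's df into an in-memory table named df, and still hard-codes the three class_cd values.

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({
    'Id': [1, 1, 1, 2, 2],
    'name': ['A', 'A', 'A', 'B', 'B'],
    'class_cd': ['abc1', 'def2', 'ghi1', 'abc1', 'def2'],
    'class_name': ['dog', 'canine', 'safe', 'cat', 'tabby'],
})

# Copy the DataFrame into an in-memory SQLite table called "df".
conn = sqlite3.connect(':memory:')
df.to_sql('df', conn, index=False)

# Each CASE keeps class_name only for one class_cd; MAX picks the
# non-NULL value within each (Id, name) group.
query = """
SELECT Id,
       name,
       MAX(CASE WHEN class_cd = 'abc1' THEN class_name END) AS abc1,
       MAX(CASE WHEN class_cd = 'def2' THEN class_name END) AS def2,
       MAX(CASE WHEN class_cd = 'ghi1' THEN class_name END) AS ghi1
FROM df
GROUP BY Id, name
ORDER BY Id
"""
df_result = pd.read_sql_query(query, conn)
conn.close()
print(df_result)
```

Groups without a given class_cd come back as NULL, which pandas shows as None.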
