Python Pandas Reshape Data Frame

This seems to be very basic knowledge, but I'm stuck despite having some theoretical background in data processing (via other software). It's worth mentioning that I'm new to Python and the pandas library.

So, I've got a data frame: (screenshot)

My task is to turn the values of the 'Series Name' column into separate columns (transform from long to wide). I've spent ages trying different methods, but got only errors.

For example:

mydata = mydata.pivot(index=['Country', 'Year'], columns='Series Name', values='Value')

And I got an error:

... a lot of text... ValueError: Length of passed values is 2487175, index implies 2

Could anybody guide me through that process, please? Thanks.

This is for the code 'mydata = mydata.pivot(index=['Country', 'Year'], columns='Series Name', values='Value')'. Error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-8169d6d374c7> in <module>
----> 1 mydata = mydata.pivot(index=['Country', 'Year'], columns='Series Name', values='Value')

~/anaconda3_501/lib/python3.6/site-packages/pandas/core/frame.py in pivot(self, index, columns, values)
   5192         """
   5193         from pandas.core.reshape.reshape import pivot
-> 5194         return pivot(self, index=index, columns=columns, values=values)
   5195 
   5196     _shared_docs['pivot_table'] = """

~/anaconda3_501/lib/python3.6/site-packages/pandas/core/reshape/reshape.py in pivot(self, index, columns, values)
    412         else:
    413             indexed = self._constructor_sliced(self[values].values,
--> 414                                                index=index)
    415     return indexed.unstack(columns)
    416 

~/anaconda3_501/lib/python3.6/site-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    260                             'Length of passed values is {val}, '
    261                             'index implies {ind}'
--> 262                             .format(val=len(data), ind=len(index)))
    263                 except TypeError:
    264                     pass

ValueError: Length of passed values is 2487175, index implies 2

Maybe try:

mydata = mydata.pivot_table(index=['Country', 'Year'], columns='Series Name', values='Value', aggfunc='sum')

(That is, if you want to sum your Value.) It seems that you need to aggregate your data explicitly in some way. It would be good, though, if you could share the full error message.
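To make the suggestion concrete, here is a minimal sketch on made-up data. The column names follow the question, but the values are hypothetical, and the sketch assumes that summing repeated rows is what you actually want.

```python
import pandas as pd

# Hypothetical toy data in the same long layout as the question.
# Note that ("A", 2000, "GDP") appears twice, so some aggregation is required.
mydata = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B"],
    "Year": [2000, 2000, 2000, 2000, 2000],
    "Series Name": ["GDP", "GDP", "Pop", "GDP", "Pop"],
    "Value": [1.0, 2.0, 10.0, 5.0, 20.0],
})

# Long to wide: one row per (Country, Year), one column per series.
# aggfunc="sum" adds up the values of rows that share the same key.
wide = mydata.pivot_table(
    index=["Country", "Year"],
    columns="Series Name",
    values="Value",
    aggfunc="sum",
)
print(wide)
```

If repeated rows should not be added together, another aggregation such as "first" or "mean" can be passed instead.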

I managed to reproduce your error. Like I said, you need to provide an aggregating function:

import pandas as pd

df = pd.DataFrame({"a": list("xyzpqr"), "b": list("abbbaa"), "c": [4, 3, 6, 2, 7, 5], "d": list("pqqppp")})

df2 = df.pivot(index=["b", "d"], columns="a", values="c")
# ValueError: Length of passed values is 6, index implies 2

df2 = df.pivot_table(index=["b", "d"], columns="a", values="c", aggfunc=set)
# Works fine. You need an aggregation function: e.g. list/set to collect
# all/unique values, or e.g. sum/max to do some numeric operation.
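Whether an aggregation is really needed depends on whether any combination of the key columns repeats. A rough sketch of how one might check that, reusing the toy frame above: if nothing repeats, `set_index` followed by `unstack` reshapes without aggregating. This is an alternative I am swapping in for illustration, not something from the original post. As far as I can tell, the error in the traceback comes from an older pandas `pivot` not accepting a list for `index`, but treat that as an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "a": list("xyzpqr"),
    "b": list("abbbaa"),
    "c": [4, 3, 6, 2, 7, 5],
    "d": list("pqqppp"),
})

# Columns that together should identify one cell of the wide table.
keys = ["b", "d", "a"]

# True if any (b, d, a) combination occurs more than once.
has_duplicates = df.duplicated(subset=keys).any()

if has_duplicates:
    # Repeated keys: an explicit aggregation is unavoidable.
    wide = df.pivot_table(index=["b", "d"], columns="a", values="c", aggfunc="sum")
else:
    # Unique keys: move them into the index and unstack "a" into columns.
    wide = df.set_index(keys)["c"].unstack("a")

print(wide)
```

In this toy frame every value of "a" is unique, so the `else` branch runs and the cells without data come out as NaN.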

Nearly there. The resulting table is: (image)

How can I put 'Country' and 'Year' on the same level as the other column names so that I can export the table to Excel normally? If I export it as it is now, 'Country' and 'Year' are not included in the table.
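One possible way to do this, sketched on hypothetical data: after `pivot_table`, 'Country' and 'Year' live in the row index, and `reset_index()` turns them back into ordinary columns. The frame contents and the output file name below are made up for illustration.

```python
import pandas as pd

# Hypothetical long-format data with the column names from the question.
mydata = pd.DataFrame({
    "Country": ["A", "A", "B", "B"],
    "Year": [2000, 2000, 2000, 2000],
    "Series Name": ["GDP", "Pop", "GDP", "Pop"],
    "Value": [1.0, 10.0, 5.0, 20.0],
})

wide = mydata.pivot_table(
    index=["Country", "Year"],
    columns="Series Name",
    values="Value",
    aggfunc="sum",
)

# Move the index levels back into regular columns.
flat = wide.reset_index()

# Drop the leftover "Series Name" label on the columns axis.
flat.columns.name = None

print(flat)

# Export without the now-meaningless integer index.
# "output.xlsx" is a placeholder name; an Excel writer such as openpyxl must be installed.
# flat.to_excel("output.xlsx", index=False)
```

The resulting frame has 'Country', 'Year' and each series name side by side as plain column headers.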
