Pandas - Reshape / Transform Dataframe with Multiple Columns into a Single Column of values

I have a pandas DataFrame with years as columns and countries as row names:

Country       | 1960 | 1961 | 1962 | 1963
-----------------------------------------
United States | 1000 | 2000 | 3000 | 4000
-----------------------------------------
Argentina     | 1000 | 2000 | 3000 | 4000
-----------------------------------------

I want to transform it into:

Country       | Year | Value
-----------------------------
United States | 1960 | 1000
United States | 1961 | 2000
United States | 1962 | 3000
United States | 1963 | 4000
Argentina     | 1960 | 1000
Argentina     | 1961 | 2000
Argentina     | 1962 | 3000
Argentina     | 1963 | 4000

I'm not sure which split, sort, or grouping operations I need to perform to achieve this.

Thanks!

You can use the stack() method:

>>> import pandas as pd
>>> df = pd.DataFrame({"country": ["United States", "Argentina"],
...                    1960: [1000, 1000],
...                    1961: [2000, 2000],
...                    1962: [3000, 3000],
...                    1963: [4000, 4000]})
>>> df
   1960  1961        country  1963  1962
0  1000  2000  United States  4000  3000
1  1000  2000      Argentina  4000  3000
>>> df.set_index("country").stack()
country
United States  1960    1000
               1961    2000
               1963    4000
               1962    3000
Argentina      1960    1000
               1961    2000
               1963    4000
               1962    3000
dtype: int64
>>> df.set_index("country").stack().reset_index()
         country  level_1     0
0  United States     1960  1000
1  United States     1961  2000
2  United States     1963  4000
3  United States     1962  3000
4      Argentina     1960  1000
5      Argentina     1961  2000
6      Argentina     1963  4000
7      Argentina     1962  3000
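The column names level_1 and 0 come from the unnamed year level and the unnamed Series. A minimal sketch, assuming a recent pandas (1.x or later), that names both index levels before resetting the index, so the result comes out directly with the Country / Year / Value columns the question asks for:

```python
import pandas as pd

# Same wide data as in the question; the year column labels are integers.
df = pd.DataFrame({"Country": ["United States", "Argentina"],
                   1960: [1000, 1000],
                   1961: [2000, 2000],
                   1962: [3000, 3000],
                   1963: [4000, 4000]})

long_df = (
    df.set_index("Country")               # countries become the row index
      .stack()                            # Series indexed by (Country, year)
      .rename_axis(["Country", "Year"])   # name both index levels
      .reset_index(name="Value")          # levels -> columns, values -> "Value"
)
print(long_df)
```

Naming the levels up front avoids reassigning df.columns by position afterwards.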

Hope this helps.

Just to give a complete example:

In [1]: import pandas as pd
   ...: df = pd.DataFrame([['United States', 1000, 2000, 3000, 4000],
   ...:                    ['Argentina', 1000, 2000, 3000, 4000]],
   ...:                   columns=['Country', 1960, 1961, 1962, 1963])

In [2]: df.set_index('Country', inplace=True)
In [3]: df = df.stack().reset_index()
In [4]: df.columns = ['Country', 'Year', 'Value']

This yields

         Country  Year  Value
0  United States  1960   1000
1  United States  1961   2000
2  United States  1962   3000
3  United States  1963   4000
4      Argentina  1960   1000
5      Argentina  1961   2000
6      Argentina  1962   3000
7      Argentina  1963   4000
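An alternative that avoids the intermediate MultiIndex is DataFrame.melt, which unpivots the year columns directly. This is a different technique from the stack() approach used in the answers here. The following is a minimal sketch that assumes the same df built in In [1], before set_index is called. melt emits rows grouped by year instead of by country, so the sketch sorts afterwards. That sort orders countries alphabetically, putting Argentina before United States:

```python
import pandas as pd

df = pd.DataFrame([['United States', 1000, 2000, 3000, 4000],
                   ['Argentina', 1000, 2000, 3000, 4000]],
                  columns=['Country', 1960, 1961, 1962, 1963])

# Every column not listed in id_vars is unpivoted into (Year, Value) pairs.
long_df = df.melt(id_vars='Country', var_name='Year', value_name='Value')

# melt orders rows by year first; sort by Country, then Year, to group per
# country (countries end up in alphabetical order).
long_df = long_df.sort_values(['Country', 'Year']).reset_index(drop=True)
print(long_df)
```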

To get rid of the default integer index and use the Country column as the index instead, you can use

In [3]: df = df.stack().reset_index(1)
In [4]: df.columns = ['Year', 'Value']

which gives

               Year  Value
Country                   
United States  1960   1000
United States  1961   2000
United States  1962   3000
United States  1963   4000
Argentina      1960   1000
Argentina      1961   2000
Argentina      1962   3000
Argentina      1963   4000
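If you would rather not overwrite df.columns positionally, here is a sketch that names the index levels and moves only the year level out of the index. It assumes df already has Country as its index, as after In [2]:

```python
import pandas as pd

df = pd.DataFrame([['United States', 1000, 2000, 3000, 4000],
                   ['Argentina', 1000, 2000, 3000, 4000]],
                  columns=['Country', 1960, 1961, 1962, 1963])
df = df.set_index('Country')

# Name both index levels, then move only "Year" into a column;
# name= sets the label of the column holding the stacked values.
result = (df.stack()
            .rename_axis(['Country', 'Year'])
            .reset_index(level='Year', name='Value'))
print(result)
```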

This isn't exactly what you want, but calling df.stack() gives you the following:

0  Country    United States
    1960               1000
    1961               2000
    1962               3000
    1963               4000
1  Country        Argentina
    1960               1000
    1961               2000
    1962               3000
    1963               4000
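The country name shows up among the numbers because Country is still an ordinary column when stack() runs, so it is stacked like a year column. A short sketch shows the consequence for the result's dtype. This is why the other answers call set_index('Country') first:

```python
import pandas as pd

df = pd.DataFrame([['United States', 1000, 2000, 3000, 4000],
                   ['Argentina', 1000, 2000, 3000, 4000]],
                  columns=['Country', 1960, 1961, 1962, 1963])

# Country is still a regular column, so it is stacked alongside the years.
mixed = df.stack()
print(mixed.dtype)  # object: strings and integers share one Series

# Moving Country into the index first keeps the stacked values numeric.
clean = df.set_index('Country').stack()
print(clean.dtype)
```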


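Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.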

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM