
Python/Pandas transpose

I have data in the following format, with several measure columns for each of a number of months, as shown below.

Cust_No Measure1_month1 Measure1_month2 .... Measure1_month72  Measure2_month_1 Measure2_month_2....so on 
1       10             20             .... 500              40               50 
2       20             40             .... 800              70               150             ....    

I would like to produce the following two formats.

Format 1:

+-------------+----------+---------+-------+
| CustNum     | Measure  |   Value | Month |
+-------------+----------+---------+-------+
| 1           | Measure1 | 10      | 1     |
| 1           | Measure1 | 20      | 2     |
| 1           | Measure1 | 30      | 3     |
| 1           | Measure1 | 70      | 4     |
| 1           | Measure1 | 40      | 5     |
| .           | .        | .       | .     |
| .           | .        | .       | .     |
| 1           | Measure1 | 700     | 72    |
| 1           | Measure2 | 30      | 1     |
| 1           | Measure2 | 40      | 2     |
| 1           | Measure2 | 80      | 3     |
| 1           | Measure2 | 90      | 4     |
| 1           | Measure2 | 100     | 5     |
| .           | .        | .       | .     |
| .           | .        | .       | .     |
| .           | .        | .       | .     |
| 1           | Measure2 | 50      | 72    |
+-------------+----------+---------+-------+

And so on for each customer number.

Format 2:

+---------+---------+----------+----------+
| CustNum |   Month | Measure1 | Measure2 |
+---------+---------+----------+----------+
| 1       | 1       | 10       | 30       |
| 1       | 2       | 20       | 40       |
| 1       | 3       | 30       | 80       |
| 1       | 4       | 70       | 90       |
| 1       | 5       | 40       | 100      |
| .       | .       | .        | .        |
| .       | .       | .        | .        |
| 1       | 72      | 700      | 50       |
+---------+---------+----------+----------+

And so on for each customer number.

Could you help me solve this?

Thanks

Setup

import pandas as pd

dct = {'Cust_No': {0: 1, 1: 2},
 'Measure1_month1': {0: 10, 1: 20},
 'Measure1_month2': {0: 20, 1: 40},
 'Measure1_month72': {0: 500, 1: 800},
 'Measure2_month_1': {0: 40, 1: 70},
 'Measure2_month_2': {0: 50, 1: 150}}

df = pd.DataFrame(dct)

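It takes a fair amount of wrangling, but the general approach is: split the columns into a MultiIndex, then stack. The second format you want is a pivot of the first.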


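d = df.set_index('Cust_No')
# Normalize 'month_1' to 'month1', then split 'Measure1_month1' into two column levels
d.columns = d.columns.str.replace('month_', 'month', regex=False).str.split('_', expand=True)

# Stack both column levels into the row index to get the long format
u = d.stack([0, 1]).rename_axis(
      ['Cust_No', 'Measure', 'Month']).to_frame('Value').reset_index()

# Keep only the digits of the month label (the result is still a string)
f1 = u.assign(Month=u.Month.str.extract(r'(\d+)')[0])

# Format 2 is a pivot of format 1; missing combinations are filled with 0
f2 = f1.pivot_table(
       index=['Cust_No', 'Month'], columns='Measure', values='Value', fill_value=0)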

Output

>>> f1                                                   
   Cust_No   Measure Month  Value  
0        1  Measure1     1   10.0  
1        1  Measure1     2   20.0  
2        1  Measure1    72  500.0  
3        1  Measure2     1   40.0  
4        1  Measure2     2   50.0  
5        2  Measure1     1   20.0  
6        2  Measure1     2   40.0  
7        2  Measure1    72  800.0  
8        2  Measure2     1   70.0  
9        2  Measure2     2  150.0  

>>> f2                                               
Measure        Measure1  Measure2  
Cust_No Month                      
1       1            10        40  
        2            20        50  
        72          500         0  
2       1            20        70  
        2            40       150  
        72          800         0  
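As an alternative to the MultiIndex-and-stack route, the same reshaping can be sketched with melt plus a regular expression on the column names. This is only one possible variant, not the method used in the answer above; the names long_df, fmt1 and fmt2 are made up for illustration, and the pattern assumes every measure column looks like Measure<k>_month<n> or Measure<k>_month_<n>.

```python
import pandas as pd

# Same small sample as in the setup above (only a few of the 72 months).
df = pd.DataFrame({
    'Cust_No': [1, 2],
    'Measure1_month1': [10, 20],
    'Measure1_month2': [20, 40],
    'Measure1_month72': [500, 800],
    'Measure2_month_1': [40, 70],
    'Measure2_month_2': [50, 150],
})

# Wide -> long: one row per (customer, original column name).
long_df = df.melt(id_vars='Cust_No', var_name='column', value_name='Value')

# Pull the measure name and the month number out of the column name.
# The optional underscore tolerates both 'month1' and 'month_1'.
parts = long_df['column'].str.extract(
    r'^(?P<Measure>[^_]+)_month_?(?P<Month>\d+)$')
long_df['Measure'] = parts['Measure']
long_df['Month'] = parts['Month'].astype(int)

# Format 1: one row per customer, measure and month.
fmt1 = (long_df[['Cust_No', 'Measure', 'Month', 'Value']]
        .sort_values(['Cust_No', 'Measure', 'Month'])
        .reset_index(drop=True))

# Format 2: one column per measure; combinations absent from the
# input (here Measure2 for month 72) are left as NaN.
fmt2 = (fmt1.pivot(index=['Cust_No', 'Month'], columns='Measure', values='Value')
            .reset_index()
            .rename_axis(None, axis=1))
```

Because Month is converted to an integer here, the rows sort numerically, and the missing Measure2 value for month 72 is left as NaN rather than filled with 0 as in f2 above.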

Given an input DataFrame df:

import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randint(20,500,(2,144)), 
             columns = pd.MultiIndex.from_product([['Measure1','Measure2'], [f'Month{i}' for i in range(1,73)]]),
             index=[1,2]).rename_axis('Cust_no').reset_index()
df.columns = df.columns.map('_'.join).str.strip('_')
df

Output:

   Cust_no  Measure1_Month1  Measure1_Month2  ...  Measure2_Month70  Measure2_Month71  Measure2_Month72
0        1              385              402  ...               153               380               129
1        2              106               66  ...               363               361               173

[2 rows x 145 columns]

Format 1:

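df = df.set_index('Cust_no')
# Split 'Measure1_Month1' into a two-level column index (Measure, Month)
df.columns = pd.MultiIndex.from_arrays(zip(*df.columns.str.split('_')), names=['Measure', 'Month'])
# Stack both column levels into rows
df_format1 = df.stack([0,1]).rename('Value').reset_index()
# Keep only the digits of 'MonthN'; the result is still a string
df_format1['Month'] = df_format1['Month'].str.extract(r'(\d+)', expand=False)
df_format1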

Output:

    Cust_no   Measure Month  Value
0          1  Measure1     1    385
1          1  Measure1    10    143
2          1  Measure1    11     77
3          1  Measure1    12    234
4          1  Measure1    13    245
..       ...       ...   ...    ...
283        2  Measure2    70    363
284        2  Measure2    71    361
285        2  Measure2    72    173
286        2  Measure2     8     65
287        2  Measure2     9    461

[288 rows x 4 columns]

Format 2:

df_format2 = (df_format1.set_index(['Cust_no','Month','Measure'])['Value']
                        .unstack().reset_index().rename_axis(None, axis=1))
df_format2

Output:

     Cust_no Month  Measure1  Measure2
0          1     1       385        90
1          1    10       143       379
2          1    11        77       479
3          1    12       234       458
4          1    13       245       475
..       ...   ...       ...       ...
139        2    70       108       363
140        2    71       258       361
141        2    72       235       173
142        2     8       453        65
143        2     9       276       461

[144 rows x 4 columns]
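In the outputs above, month 10 follows month 1 because Month holds digit strings, which sort lexicographically. If numeric order matters, one option is to convert the column to integers and sort. The sketch below uses a tiny made-up frame with the same column names as df_format1; the values are illustrative only, not taken from the output above.

```python
import pandas as pd

# Hypothetical miniature of df_format1: Month holds digit strings,
# which is what str.extract returns. Values are made up.
df_format1 = pd.DataFrame({
    'Cust_no': [1, 1, 1, 1],
    'Measure': ['Measure1'] * 4,
    'Month': ['1', '10', '2', '9'],
    'Value': [11, 12, 13, 14],
})

# Convert to integers so that sorting is numeric, not lexicographic.
df_format1['Month'] = df_format1['Month'].astype(int)
df_sorted = (df_format1.sort_values(['Cust_no', 'Measure', 'Month'])
                       .reset_index(drop=True))

print(df_sorted['Month'].tolist())  # [1, 2, 9, 10]
```

Doing the conversion before building Format 2 should carry the numeric order over to the unstacked result as well, since the Month level is then sorted as integers.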

