繁体   English   中英

我如何使用 python pandas 数据框并使用列名和行名作为新列创建一个新表

[英]how do i take a python pandas dataframe and create a new table using the column and row names as the new column

我希望有人能指出我正确的方向。 我有一个数据框,我想取第一列,将它与其余列的名称连接起来,并将值分配给这个新列。

2020-03-20DF.csv

Store,Total Started,2 Week,4 Week,5 Week,6 Week
Boston,9,0,5,1,3
New York,3,0,0,0,3
San Diego,6,0,6,0,0
Tampa Bay,1,0,1,0,0
Houston,14,0,7,0,7
Chicago,2,0,0,0,2

到目前为止我所拥有的

import pandas as pd
df1 = pd.read_csv('2020-03-20DF.csv')
df1.set_index('Store', inplace=True)
print(df1)

           Total Started  2 Week  4 Week  5 Week  6 Week
Store                                                   
Boston                 9       0       5       1       3
New York               3       0       0       0       3
San Diego              6       0       6       0       0
Tampa Bay              1       0       1       0       0
Houston               14       0       7       0       7
Chicago                2       0       0       0       2

我想看到的是

Boston-2 Week  Boston-4 Week Boston-5 Week Boston-6 Week
   0                5             1            3 

等等。

对于特定情况:

>>> df[df['Store'] == 'Boston'].filter(like='Week').add_prefix('Boston-')
   Boston-2 Week  Boston-4 Week  Boston-5 Week  Boston-6 Week
0              0              5              1              3

# generally:
>>> for store in df['Store']:
...     print(df[df['Store'] == store].filter(like='Week').add_prefix(f'{store}-'))

   Boston-2 Week  Boston-4 Week  Boston-5 Week  Boston-6 Week
0              0              5              1              3
   New York-2 Week  New York-4 Week  New York-5 Week  New York-6 Week
1                0                0                0                3
   San Diego-2 Week  San Diego-4 Week  San Diego-5 Week  San Diego-6 Week
2                 0                 6                 0                 0
   Tampa Bay-2 Week  Tampa Bay-4 Week  Tampa Bay-5 Week  Tampa Bay-6 Week
3                 0                 1                 0                 0
   Houston-2 Week  Houston-4 Week  Houston-5 Week  Houston-6 Week
4               0               7               0               7
   Chicago-2 Week  Chicago-4 Week  Chicago-5 Week  Chicago-6 Week
5               0               0               0               2

如前所述,使用了另一篇文章中的代码示例

import pandas as pd
df1 = pd.read_csv('2020-03-20DF.csv')
df1.set_index('Store', inplace=True)
s = df1.stack()
df2 = pd.DataFrame([s.values], columns=[f'{i}-{j}' for i, j in s.index])
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(df2)

数据帧堆栈

这会是一个合适的选择吗?

df2 = df1.drop('Total Started', axis=1).stack()
print(df2.head())

Store           
Boston    2 Week    0
          4 Week    5
          5 Week    1
          6 Week    3
New York  2 Week    0
dtype: int64

它使用多索引。

然后,使用元组索引您想要的值。

例如

df2[('Boston', '4 Week')]

5

要获得您实际要求的内容(带有连接字符串的单级索引),您可以执行以下操作:

df2.index = pd.Series(df2.index.values).apply('-'.join)
print(df2.head())

Boston-2 Week      0
Boston-4 Week      5
Boston-5 Week      1
Boston-6 Week      3
New York-2 Week    0
dtype: int64

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM