Parse all column names and create new columns

Question

date     red,heavy,new  blue,light,old
1-2-20   320             120
2-3-20   220             125

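I want to iterate over all rows and columns so that I can parse the column names and use them as the values of new columns. I want the data in the format below, with the date repeated for each original column. The 'value' column comes from the original table.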

date     color   weight   condition   value
1-2-20   red     heavy    new         320
1-2-20   blue    light    old         120
2-3-20   red     heavy    new         220

I tried the following, and it works when there is only one value column:

colName = df_retransform.columns[1]

lst = colName.split(",")
color = lst[0]
weight = lst[1]
condition = lst[2]


df_retransform.rename(columns={colName: 'value'}, inplace=True)
df_retransform['color'] = color
df_retransform['weight'] = weight
df_retransform['condition'] = condition

But I could not work out how to adapt it so that it handles all of the columns.

Answer 1

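Use DataFrame.melt and Series.str.split, with DataFrame.pop to read and remove the variable column in one step. If necessary, reorder the columns at the end.

First, you can check that every column other than date contains exactly two commas. The list below shows the columns that do not, so only date should appear: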

print ([col for col in df.columns if col.count(',') != 2])
['date'] 


df = df.melt('date')
df[['color', 'weight', 'condition']] = df.pop('variable').str.split(',', expand=True)

df = df[['date', 'color', 'weight', 'condition', 'value']]
print (df)
     date color weight condition  value
0  1-2-20   red  heavy       new    320
1  2-3-20   red  heavy       new    220
2  1-2-20  blue  light       old    120
3  2-3-20  blue  light       old    125
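
The melt output above is grouped by original column, while the question asks for rows grouped by date. One possible way to get that order is sketched below. It assumes pandas 1.1 or later (for the ignore_index argument of melt) and rebuilds the sample frame from the question; the name long_df is only illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({
    "date": ["1-2-20", "2-3-20"],
    "red,heavy,new": [320, 220],
    "blue,light,old": [120, 125],
})

# ignore_index=False keeps each source row's index label on its melted rows.
long_df = df.melt("date", ignore_index=False)

# A stable sort on that label groups rows by source row while keeping
# the left-to-right column order inside each group.
long_df = long_df.sort_index(kind="mergesort").reset_index(drop=True)

# Split the old column names into the three new columns.
long_df[["color", "weight", "condition"]] = (
    long_df.pop("variable").str.split(",", expand=True)
)
long_df = long_df[["date", "color", "weight", "condition", "value"]]
print(long_df)
```

Sorting on the preserved index rather than on the date strings avoids depending on how the dates happen to sort as text.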

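Alternatively, use DataFrame.stack to get a Series with a MultiIndex, then split the column-name level and rebuild all of the index levels, which become the new columns: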

print (df)
    date  red,heavy,new  blue,light,old
0  1-2-20            320             NaN
1     NaN            220           125.0

s = df.set_index('date').stack(dropna=False)
s.index = pd.MultiIndex.from_tuples([(i, *j.split(',')) for i, j in s.index], 
                                    names=['date', 'color', 'weight', 'condition'])
df = s.reset_index(name='value')
print (df)

     date color weight condition  value
0  1-2-20   red  heavy       new  320.0
1  1-2-20  blue  light       old    NaN
2     NaN   red  heavy       new  220.0
3     NaN  blue  light       old  125.0
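
A variant of the same idea, sketched here under assumptions, splits the column names into a named MultiIndex first and then lets melt unpack the levels, which avoids relying on the dropna argument of stack. The sample data and the names wide and long_df are illustrative; missing values are kept, and the rows come out grouped by original column.

```python
import pandas as pd

# Illustrative data with one missing value.
df = pd.DataFrame({
    "date": ["1-2-20", "2-3-20"],
    "red,heavy,new": [320, 220],
    "blue,light,old": [float("nan"), 125],
})

wide = df.set_index("date")

# Turn each "a,b,c" column name into one entry of a three-level MultiIndex.
wide.columns = pd.MultiIndex.from_tuples(
    [tuple(col.split(",")) for col in wide.columns],
    names=["color", "weight", "condition"],
)

# With uniquely named column levels, melt emits one column per level.
# ignore_index=False keeps the date index so it can be restored afterwards.
long_df = wide.melt(ignore_index=False).reset_index()
print(long_df)
```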

Answer 2

You can also use the pivot_longer function from pyjanitor. At the time of writing, you have to install the latest development version from GitHub:

# install latest dev version
# pip install git+https://github.com/ericmjl/pyjanitor.git
import janitor

df.pivot_longer(index="date", 
                names_to=("color", "weight", "condition"), 
                names_sep=",")

    date    color   weight  condition   value
0   1-2-20  red     heavy   new     320
1   2-3-20  red     heavy   new     220
2   1-2-20  blue    light   old     120
3   2-3-20  blue    light   old     125

You pass the names of the new columns to names_to and specify the separator (,) in names_sep.

If you want the rows returned in order of appearance, pass True to the sort_by_appearance parameter:

df.pivot_longer(
    index="date",
    names_to=("color", "weight", "condition"),
    names_sep=",",
    sort_by_appearance=True,
)


    date    color   weight  condition   value
0   1-2-20  red     heavy   new     320
1   1-2-20  blue    light   old     120
2   2-3-20  red     heavy   new     220
3   2-3-20  blue    light   old     125
