Reformat Dataframe in pandas
I have a DataFrame in a very strange format:
id Code Week1 Week2 week3
sunday nan nan nan nan
id Code Week1 Week2 week3
1 100 y y n
2 200 n y n
3 300 n n y
Monday nan nan nan nan
id Code Week1 Week2 week3
1 500 n y y
2 600 y y y
Tuesday nan nan nan nan
id Code Week1 Week2 week3
1 800 n y y
2 900 y n y
I want to reshape it into this format:
Code Day Week
100 Sunday 1
600 Monday 1
900 Tuesday 1
100 Sunday 2
200 Sunday 2
500 Monday 2
600 Monday 2
800 Tuesday 2
300 Sunday 3
500 Monday 3
600 Monday 3
800 Tuesday 3
900 Tuesday 3
That is, if a code has the value y in a given week, that code is visited in that week.
Is there a way to do this in pandas?
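To try the answers below, the sample table can be loaded from text. This is a minimal sketch, assuming the data is whitespace-separated exactly as shown above and that the literal string nan marks a missing cell. The variable name raw is illustrative:

```python
import io

import pandas as pd

# The question's table as whitespace-separated text (assumed layout, copied from the post)
raw = """id Code Week1 Week2 week3
sunday nan nan nan nan
id Code Week1 Week2 week3
1 100 y y n
2 200 n y n
3 300 n n y
Monday nan nan nan nan
id Code Week1 Week2 week3
1 500 n y y
2 600 y y y
Tuesday nan nan nan nan
id Code Week1 Week2 week3
1 800 n y y
2 900 y n y
"""

# The first line becomes the header; the repeated "id Code ..." lines stay in as ordinary rows.
# 'nan' is in read_csv's default NA list, so the day-marker rows get real NaN cells.
df = pd.read_csv(io.StringIO(raw), sep=r'\s+', dtype=str)
print(df.head())
```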
Not my best work... but I didn't want to keep trying... it was hurting my soul.
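import re
import numpy as np
# Drop the repeated header rows, turn the numeric ids into NaN and forward-fill, so every row carries its day name
d = df.query('id != "id"').replace(dict(id={r'\d+': np.nan}), regex=True).ffill()
# The first row for each day is the day-marker row itself; keep only the data rows, then stack with 'n' as NaN
s = d[d.duplicated('id')].set_index(['id', 'Code']).replace({'y': 1, 'n': np.nan}).stack().dropna()
s.rename_axis(['Day', 'Code', 'Week']).reset_index('Week').Week.str.replace(
    'week', '', flags=re.IGNORECASE, regex=True
).reset_index()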
Day Code Week
0 sunday 100 1
1 sunday 100 2
2 sunday 200 2
3 sunday 300 3
4 Monday 500 2
5 Monday 500 3
6 Monday 600 1
7 Monday 600 2
8 Monday 600 3
9 Tuesday 800 2
10 Tuesday 800 3
11 Tuesday 900 1
12 Tuesday 900 3
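The answer above attaches each day name to its data rows by turning the numeric ids into NaN and forward-filling. After that, the first row carrying each day name is the day-marker row itself, which is why d.duplicated('id') removes exactly those rows. A toy sketch of just that step, where the ids Series is made-up sample data:

```python
import numpy as np
import pandas as pd

# A toy id column, as it looks after the repeated "id" header rows are removed
ids = pd.Series(['sunday', '1', '2', 'Monday', '1'])

# Purely numeric ids become NaN, so ffill() copies each day name onto its data rows
days = ids.replace(r'^\d+$', np.nan, regex=True).ffill()
print(days.tolist())  # ['sunday', 'sunday', 'sunday', 'Monday', 'Monday']

# duplicated() is False only at the first occurrence of each day, i.e. the day-marker row
print(days.duplicated().tolist())  # [False, True, True, False, True]
```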
You can use:
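import numpy as np
# Day-marker rows are the ones without a Code: use their id (the day name) as the index and forward-fill it
df.index = df['id'].where(df['Code'].isnull()).ffill()
# Drop the repeated header rows and the day-marker rows themselves
df = df[(df['Code'] != 'Code') & (df['id'] != df.index)]
df = df.rename_axis('Day').rename_axis('Week', axis=1)
# 'n' becomes NaN and is discarded after stacking, so only the visits ('y') remain
df = (df.set_index(['id', 'Code'], append=True)
        .replace({'n': np.nan})
        .stack()
        .dropna()
        .reset_index(name='val'))
df['Week'] = df['Week'].str.extract(r'(\d+)', expand=False).astype(int)
cols = ['Code', 'Day', 'Week']
df = df.drop(['val', 'id'], axis=1)[cols].sort_values(['Week', 'Code']).reset_index(drop=True)
print(df)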
Code Day Week
0 100 sunday 1
1 600 Monday 1
2 900 Tuesday 1
3 100 sunday 2
4 200 sunday 2
5 500 Monday 2
6 600 Monday 2
7 800 Tuesday 2
8 300 sunday 3
9 500 Monday 3
10 600 Monday 3
11 800 Tuesday 3
12 900 Tuesday 3
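The core trick shared by these answers is to replace 'n' with NaN and stack the week columns, so that only the 'y' cells survive. A small sketch on a made-up two-row frame follows. The explicit dropna() is an addition: the older stack() dropped missing values by default, but the newer stack implementation in recent pandas releases keeps them, so filtering explicitly should work either way.

```python
import numpy as np
import pandas as pd

# Toy wide frame (made-up data): one row per (Day, Code), one column per week
wide = pd.DataFrame(
    {'Week1': ['y', 'n'], 'Week2': ['y', 'y']},
    index=pd.MultiIndex.from_tuples([('sunday', '100'), ('sunday', '200')],
                                    names=['Day', 'Code']),
)
wide.columns.name = 'Week'

# 'n' becomes NaN; stack() moves the week columns into the index; dropna() discards the 'n' cells
long_s = wide.replace({'n': np.nan}).stack().dropna()
print(long_s.reset_index()[['Day', 'Code', 'Week']])
```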
For a general output that keeps every y and n value along with the id column, remove the replace step:
# Same steps without replace(), so every y/n cell is kept in the val column
df.index = df['id'].where(df['Code'].isnull()).ffill()
df = df[(df['Code'] != 'Code') & (df['id'] != df.index)]
df = df.rename_axis('Day').rename_axis('Week', axis=1)
df = df.set_index(['id', 'Code'], append=True).stack().reset_index(name='val')
df['Week'] = df['Week'].str.extract(r'(\d+)', expand=False).astype(int)
print(df)
Day id Code Week val
0 sunday 1 100 1 y
1 sunday 1 100 2 y
2 sunday 1 100 3 n
3 sunday 2 200 1 n
4 sunday 2 200 2 y
5 sunday 2 200 3 n
6 sunday 3 300 1 n
7 sunday 3 300 2 n
8 sunday 3 300 3 y
9 Monday 1 500 1 n
10 Monday 1 500 2 y
11 Monday 1 500 3 y
12 Monday 2 600 1 y
13 Monday 2 600 2 y
14 Monday 2 600 3 y
15 Tuesday 1 800 1 n
16 Tuesday 1 800 2 y
17 Tuesday 1 800 3 y
18 Tuesday 2 900 1 y
19 Tuesday 2 900 2 n
20 Tuesday 2 900 3 y
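From this general output, the visits-only result is just the rows where val is 'y'. A sketch using a few rows copied from the table above. The names long_df and visits are illustrative:

```python
import pandas as pd

# A few rows of the long general output above (Day, id, Code, Week, val)
long_df = pd.DataFrame({
    'Day': ['sunday', 'sunday', 'sunday', 'Monday', 'Monday'],
    'id': ['1', '1', '1', '1', '1'],
    'Code': ['100', '100', '100', '500', '500'],
    'Week': [1, 2, 3, 1, 2],
    'val': ['y', 'y', 'n', 'n', 'y'],
})

# Keeping only the 'y' rows yields one row per visit, in the Code/Day/Week layout asked for
visits = long_df.loc[long_df['val'] == 'y', ['Code', 'Day', 'Week']].reset_index(drop=True)
print(visits)
```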
Building on @piRsquared's answer, for those who want a pseudo one-liner:
import numpy as np
(df.query('id != "id"').replace(dict(id={r'\d+': np.nan}), regex=True)
   .assign(id=lambda x: x['id'].ffill())  # carry each day name down to its data rows
   .dropna()                              # the day-marker rows have NaN Code/week cells
   .set_index(['id', 'Code'])
   .replace({'y': 1, 'n': np.nan})
   .rename(columns=lambda x: x.lower().replace('week', ''))
   .stack()
   .dropna()                              # keep only the 'y' cells
   .reset_index()
   .rename(columns={'id': 'Day', 'level_2': 'Week'})
   .drop(columns=0))
Out[2689]:
Day Code Week
0 sunday 100 1
1 sunday 100 2
2 sunday 200 2
3 sunday 300 3
4 Monday 500 2
5 Monday 500 3
6 Monday 600 1
7 Monday 600 2
8 Monday 600 3
9 Tuesday 800 2
10 Tuesday 800 3
11 Tuesday 900 1
12 Tuesday 900 3
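For comparison, the same reshaping can also be written with melt() instead of stack(). This is a swapped-in alternative rather than one of the answers above. It assumes the data is loaded from whitespace-separated text as shown in the question, repeated header rows included:

```python
import io

import pandas as pd

raw = """id Code Week1 Week2 week3
sunday nan nan nan nan
id Code Week1 Week2 week3
1 100 y y n
2 200 n y n
3 300 n n y
Monday nan nan nan nan
id Code Week1 Week2 week3
1 500 n y y
2 600 y y y
Tuesday nan nan nan nan
id Code Week1 Week2 week3
1 800 n y y
2 900 y n y
"""
df = pd.read_csv(io.StringIO(raw), sep=r'\s+', dtype=str)

# Day-marker rows are the only rows with a missing Code; carry their id (the day name) downwards
df['Day'] = df['id'].where(df['Code'].isna()).ffill()
# Keep the real data rows: not a repeated header row and not a day-marker row
data = df[(df['Code'] != 'Code') & df['Code'].notna()]

# melt() gives one row per (Code, Day, week column); the 'y' filter is explicit
week_cols = [c for c in data.columns if c.lower().startswith('week')]
long_df = data.melt(id_vars=['Code', 'Day'], value_vars=week_cols,
                    var_name='Week', value_name='visited')
long_df = long_df[long_df['visited'] == 'y'].copy()
long_df['Week'] = long_df['Week'].str.extract(r'(\d+)', expand=False).astype(int)

result = (long_df[['Code', 'Day', 'Week']]
          .sort_values(['Week', 'Code'])
          .reset_index(drop=True))
print(result)
```

Because melt() keeps every (Code, Day, week) combination, the result does not depend on whether stack() drops missing values in the installed pandas version.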