I have a DataFrame in a very odd format:
id Code Week1 Week2 week3
sunday nan nan nan nan
id Code Week1 Week2 week3
1 100 y y n
2 200 n y n
3 300 n n y
Monday nan nan nan nan
id Code Week1 Week2 week3
1 500 n y y
2 600 y y y
Tuesday nan nan nan nan
id Code Week1 Week2 week3
1 800 n y y
2 900 y n y
I want to reshape it into this format:
Code Day Week
100 Sunday 1
600 Monday 1
900 Tuesday 1
100 Sunday 2
200 Sunday 2
500 Monday 2
600 Monday 2
800 Tuesday 2
300 Sunday 3
500 Monday 3
600 Monday 3
800 Tuesday 3
900 Tuesday 3
i.e. if the value for a Code is y in a given week, that Code will be visited in that week.
Is there any way to do this in pandas?
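For anyone who wants to experiment, here is a minimal sketch that rebuilds the frame shown above. It assumes every cell is a string (as it would be if the sheet were read with dtype=str) and that the nan cells are real missing values; the original post does not say how the data was loaded.

```python
import numpy as np
import pandas as pd

# Rows copied from the question. Storing everything as strings is an
# assumption; the real data may hold integers in id and Code.
rows = [
    ("sunday", np.nan, np.nan, np.nan, np.nan),
    ("id", "Code", "Week1", "Week2", "week3"),
    ("1", "100", "y", "y", "n"),
    ("2", "200", "n", "y", "n"),
    ("3", "300", "n", "n", "y"),
    ("Monday", np.nan, np.nan, np.nan, np.nan),
    ("id", "Code", "Week1", "Week2", "week3"),
    ("1", "500", "n", "y", "y"),
    ("2", "600", "y", "y", "y"),
    ("Tuesday", np.nan, np.nan, np.nan, np.nan),
    ("id", "Code", "Week1", "Week2", "week3"),
    ("1", "800", "n", "y", "y"),
    ("2", "900", "y", "n", "y"),
]
df = pd.DataFrame(rows, columns=["id", "Code", "Week1", "Week2", "week3"])
print(df)
```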
Not my finest work... but I don't want to try anymore... it hurts my soul.
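import re
import numpy as np

# Drop the repeated header rows, blank out the numeric ids, then forward-fill so each row carries its day name.
d = df.query('id != "id"').replace(dict(id={r'\d+': None}), regex=True).ffill()
# The first row of each day is the day marker itself; duplicated() keeps only the data rows.
s = d[d.duplicated('id')].set_index(['id', 'Code']).replace({'y': 1, 'n': np.nan}).stack()
s.rename_axis(['Day', 'Code', 'Week']).reset_index('Week').Week.str.replace(
    'week', '', flags=re.IGNORECASE, regex=True
).reset_index()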
Day Code Week
0 sunday 100 1
1 sunday 100 2
2 sunday 200 2
3 sunday 300 3
4 Monday 500 2
5 Monday 500 3
6 Monday 600 1
7 Monday 600 2
8 Monday 600 3
9 Tuesday 800 2
10 Tuesday 800 3
11 Tuesday 900 1
12 Tuesday 900 3
You can use:
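import numpy as np

# Label every row with its day: keep id only on the day rows (Code is NaN), then forward-fill.
df.index = df['id'].where(df['Code'].isnull()).ffill()
# Drop the repeated header rows and the day-marker rows themselves.
df = df[(df['Code'] != 'Code') & (df['id'] != df.index)]
df = df.rename_axis('Day').rename_axis('Week', axis=1)
# Turn n into NaN: stack() drops NaN by default (pandas 1.x/2.x), leaving only the y cells.
df = (df.set_index(['id','Code'], append=True)
        .replace({'n':np.nan})
        .stack().reset_index(name='val'))
df['Week'] = df['Week'].str.extract(r'(\d+)', expand=False).astype(int)
cols = ['Code','Day','Week']
df = df.drop(['val','id'], axis=1)[cols].sort_values(['Week','Code']).reset_index(drop=True)
print (df)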
Code Day Week
0 100 sunday 1
1 600 Monday 1
2 900 Tuesday 1
3 100 sunday 2
4 200 sunday 2
5 500 Monday 2
6 600 Monday 2
7 800 Tuesday 2
8 300 sunday 3
9 500 Monday 3
10 600 Monday 3
11 800 Tuesday 3
12 900 Tuesday 3
For a more general output that keeps the id column and all y and n values, remove the replace step:
df.index = df['id'].where(df['Code'].isnull()).ffill()
df = df[(df['Code'] != 'Code') & (df['id'] != df.index)]
df = df.rename_axis('Day').rename_axis('Week', axis=1)
# No replace here, so stack() keeps both the y and the n cells.
df = df.set_index(['id','Code'], append=True).stack().reset_index(name='val')
df['Week'] = df['Week'].str.extract(r'(\d+)', expand=False).astype(int)
print (df)
Day id Code Week val
0 sunday 1 100 1 y
1 sunday 1 100 2 y
2 sunday 1 100 3 n
3 sunday 2 200 1 n
4 sunday 2 200 2 y
5 sunday 2 200 3 n
6 sunday 3 300 1 n
7 sunday 3 300 2 n
8 sunday 3 300 3 y
9 Monday 1 500 1 n
10 Monday 1 500 2 y
11 Monday 1 500 3 y
12 Monday 2 600 1 y
13 Monday 2 600 2 y
14 Monday 2 600 3 y
15 Tuesday 1 800 1 n
16 Tuesday 1 800 2 y
17 Tuesday 1 800 3 y
18 Tuesday 2 900 1 y
19 Tuesday 2 900 2 n
20 Tuesday 2 900 3 y
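The where(...).ffill() step that both snippets above start with is the part that does the real work, so here is a small sketch of it in isolation. The cut-down frame and the names day and data are hypothetical and only there for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical cut-down frame: a day-marker row, a repeated header row
# and one data row for each of two days.
df = pd.DataFrame({
    "id":   ["sunday", "id", "1", "Monday", "id", "1"],
    "Code": [np.nan, "Code", "100", np.nan, "Code", "500"],
})

# Keep id only where Code is missing (the day rows), turn everything else
# into NaN, then forward-fill so each row carries the latest day name.
day = df["id"].where(df["Code"].isnull()).ffill()
print(day.tolist())
# ['sunday', 'sunday', 'sunday', 'Monday', 'Monday', 'Monday']

# Drop the repeated headers and the day-marker rows, keeping real data only.
data = df[(df["Code"] != "Code") & (df["id"] != day)].assign(Day=day)
print(data)
```

The answer stores the labels in the index rather than in a Day column; using a column here is only to make the intermediate result easier to read.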
Based on @piRsquared's answer, for those who want a pseudo one-liner:
In [2689]: (df.query('id != "id"').replace(dict(id={r'\d+': np.nan}), regex=True)
              .assign(id=lambda x: x['id'].ffill()).dropna()
              .set_index(['id', 'Code'])
              .replace({'y': 1, 'n': np.nan})
              .rename(columns=lambda x: x.lower().replace('week', ''))
              .stack()
              .reset_index()
              .rename(columns={'id': 'Day', 'level_2': 'Week'})
              .drop(columns=0))
Out[2689]:
Day Code Week
0 sunday 100 1
1 sunday 100 2
2 sunday 200 2
3 sunday 300 3
4 Monday 500 2
5 Monday 500 3
6 Monday 600 1
7 Monday 600 2
8 Monday 600 3
9 Tuesday 800 2
10 Tuesday 800 3
11 Tuesday 900 1
12 Tuesday 900 3
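To check the result end to end, here is a rough self-contained sketch. It swaps in melt plus a plain boolean filter for the stack-based approach the answers above use, so it does not rely on stack() dropping NaN. The helper name visits and the string-only sample data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

raw = """\
sunday nan nan nan nan
id Code Week1 Week2 week3
1 100 y y n
2 200 n y n
3 300 n n y
Monday nan nan nan nan
id Code Week1 Week2 week3
1 500 n y y
2 600 y y y
Tuesday nan nan nan nan
id Code Week1 Week2 week3
1 800 n y y
2 900 y n y
"""
rows = [line.split() for line in raw.splitlines()]
df = pd.DataFrame(rows, columns=["id", "Code", "Week1", "Week2", "week3"])
df = df.mask(df == "nan")  # turn the literal "nan" strings into missing values


def visits(frame):
    """Return one row per (Code, Day, Week) whose cell is 'y' (hypothetical helper)."""
    week_cols = [c for c in frame.columns if c.lower().startswith("week")]
    # Day rows are the ones with no Code; forward-fill their id over the block.
    day = frame["id"].where(frame["Code"].isna()).ffill()
    # Keep real data rows only: not a day marker, not a repeated header.
    data = frame[frame["Code"].notna() & (frame["Code"] != "Code")].assign(Day=day)
    long = data.melt(id_vars=["Code", "Day"], value_vars=week_cols,
                     var_name="Week", value_name="val")
    long = long[long["val"] == "y"]
    # Pull the week number out of names such as Week1 or week3.
    long = long.assign(Week=long["Week"].str.extract(r"(\d+)", expand=False).astype(int))
    return long[["Code", "Day", "Week"]].sort_values(["Week", "Code"]).reset_index(drop=True)


out = visits(df)
print(out)
```

With the sample data this gives 13 rows in the same Week, Code order as the output requested in the question.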