I have a DataFrame like this:
import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
          np.array(['2016-01', '2016-02', '2016-01', '2016-02', '2016-01', '2016-02', '2016-01', '2016-02'])]
df = pd.DataFrame(np.ceil(np.random.randn(8, 4)), index=arrays)
df.rename(columns={0: 'm1', 1: 'm2', 2: 'm3', 3: 'm4'}, inplace=True)
                  m1   m2   m3   m4
bar one 2016-01 -0.0  1.0  3.0  2.0
    two 2016-02  1.0  1.0  1.0  2.0
baz one 2016-01 -1.0 -1.0  2.0  1.0
    two 2016-02  1.0  2.0  1.0  2.0
foo one 2016-01  1.0 -0.0 -0.0 -0.0
    two 2016-02 -2.0 -0.0 -0.0 -0.0
qux one 2016-01 -0.0 -0.0 -1.0  1.0
    two 2016-02 -0.0 -0.0  1.0 -0.0
Let's say I want to replace 2016 with 2017 in the date level of the index, but only for columns m2 and m4. The 2016 rows should then keep their values for m1 and m3 but not for m2 and m4, and the new 2017 rows should have values for m2 and m4 but not for m1 and m3. Something similar to this DataFrame:
                  m1   m2   m3   m4
bar one 2016-01 -0.0  0.0  3.0  0.0
    two 2016-02  1.0  0.0  1.0  0.0
    one 2017-01  0.0  1.0  0.0  2.0
    two 2017-02  0.0  1.0  0.0  2.0
baz one 2016-01 -1.0  0.0  2.0  0.0
    two 2016-02  1.0  0.0  1.0  0.0
    one 2017-01  0.0 -1.0  0.0  1.0
    two 2017-02  0.0  2.0  0.0  2.0
I've tried to unstack() the DataFrame and rename each column, but that doesn't seem to work and I'm not sure why.
df = df.unstack()
df.unstack()['m2'] = df.unstack()['m2'].rename(columns = lambda t: t.replace('2016','2017'))
import numpy as np
import pandas as pd
np.random.seed(2017)
arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
np.array(['2016-01', '2016-02', '2016-01', '2016-02', '2016-01', '2016-02', '2016-01', '2016-02'])]
df = pd.DataFrame(np.ceil(np.random.randn(8, 4)), index=arrays)
df.rename(columns={0:'m1',1:'m2',2:'m3',3:'m4'},inplace=True)
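# Copy the m2/m4 columns and relabel the last index level from 2016 to 2017.
df2 = df[['m2', 'm4']].copy()
df2.index = pd.MultiIndex.from_arrays(
    [df.index.get_level_values(i) for i in [0, 1]]
    + [df.index.get_level_values(-1).str.replace('2016', '2017')])
# Stack both pieces; cells missing from either piece become NaN, then 0.
result = pd.concat([df[['m1', 'm3']], df2], axis=0).fillna(0)
# Newer pandas versions do not sort the columns in concat, so restore m1..m4.
result = result.reindex(columns=df.columns)
result = result.sort_index(level=[0, 2, 1])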
print(result)
converts
                  m1   m2   m3   m4
bar one 2016-01 -1.0 -0.0  1.0  1.0
    two 2016-02 -0.0 -0.0 -0.0 -0.0
baz one 2016-01  1.0 -0.0 -1.0 -0.0
    two 2016-02 -1.0  1.0  1.0 -0.0
foo one 2016-01 -0.0 -0.0 -1.0 -1.0
    two 2016-02  2.0 -0.0 -0.0 -0.0
qux one 2016-01  1.0  2.0 -0.0  2.0
    two 2016-02  1.0  1.0 -0.0 -0.0
into
                  m1   m2   m3   m4
bar one 2016-01 -1.0  0.0  1.0  0.0
    two 2016-02 -0.0  0.0 -0.0  0.0
    one 2017-01  0.0 -0.0  0.0  1.0
    two 2017-02  0.0 -0.0  0.0 -0.0
baz one 2016-01  1.0  0.0 -1.0  0.0
    two 2016-02 -1.0  0.0  1.0  0.0
    one 2017-01  0.0 -0.0  0.0 -0.0
    two 2017-02  0.0  1.0  0.0 -0.0
foo one 2016-01 -0.0  0.0 -1.0  0.0
    two 2016-02  2.0  0.0 -0.0  0.0
    one 2017-01  0.0 -0.0  0.0 -1.0
    two 2017-02  0.0 -0.0  0.0 -0.0
qux one 2016-01  1.0  0.0 -0.0  0.0
    two 2016-02  1.0  0.0 -0.0  0.0
    one 2017-01  0.0  2.0  0.0  2.0
    two 2017-02  0.0  1.0  0.0 -0.0
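A possible shorter variant of the approach above, offered as a sketch rather than a definitive implementation. It assumes the installed pandas supports the `level` argument of `DataFrame.rename`; the names `shifted` and `alt` are illustrative only. The first check relates to the unstack attempt in the question: `unstack()` returns a new DataFrame, so assigning into `df.unstack()[...]` changes a temporary object rather than `df`, which is probably one reason that attempt appeared to do nothing.

```python
import numpy as np
import pandas as pd

np.random.seed(2017)
arrays = [
    np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
    np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
    np.array(['2016-01', '2016-02'] * 4),
]
df = pd.DataFrame(np.ceil(np.random.randn(8, 4)), index=arrays,
                  columns=['m1', 'm2', 'm3', 'm4'])

# unstack() builds a new object each time it is called, so an assignment
# into its result never reaches df itself.
print(df.unstack() is df)  # False

# Relabel only the last index level of the m2/m4 slice (hypothetical name).
shifted = df[['m2', 'm4']].rename(
    index=lambda label: label.replace('2016', '2017'), level=2)

# Stack the two slices; cells missing from a slice become NaN, then 0.
alt = pd.concat([df[['m1', 'm3']], shifted]).fillna(0)
alt = alt.reindex(columns=df.columns)    # restore the m1..m4 column order
alt = alt.sort_index(level=[0, 2, 1])    # group 2016 rows before 2017 rows
print(alt)
```

The `rename` call avoids rebuilding the MultiIndex by hand, at the cost of relying on the `level` keyword; the `from_arrays` version above does not need it.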
I am not sure I fully understand your question, but here is what I did and the output it produced.
arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
np.array(['2016-01', '2016-02', '2016-01', '2016-02', '2016-01', '2016-02', '2016-01', '2016-02'])]
df = pd.DataFrame(np.ceil(np.random.randn(8, 4)), index=arrays)
df.rename(columns={0:'m1',1:'m2',2:'m3',3:'m4'},inplace=True)
df = df.reset_index()
df['level_2'] = df['level_2'].str.replace("2016","2017")
Which gives me the output:
  level_0 level_1  level_2   m1   m2   m3   m4
0     bar     one  2017-01 -0.0 -1.0 -0.0 -0.0
1     bar     two  2017-02 -0.0 -1.0  2.0  2.0
2     baz     one  2017-01 -2.0  1.0 -0.0  1.0
3     baz     two  2017-02 -0.0  1.0 -1.0  2.0
4     foo     one  2017-01  1.0 -0.0 -1.0 -0.0
5     foo     two  2017-02 -1.0 -2.0  1.0 -0.0
6     qux     one  2017-01  1.0  1.0 -0.0  1.0
7     qux     two  2017-02  1.0 -1.0  2.0 -1.0
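For reference, a minimal sketch of how this flat result could be turned back into a three-level index. The names `flat` and `restored` are illustrative, and the `level_0`, `level_1`, `level_2` column names are what `reset_index()` generates when the index levels are unnamed. Note that this relabels every row, so it does not by itself produce the separate 2016 and 2017 rows shown in the question.

```python
import numpy as np
import pandas as pd

arrays = [
    np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
    np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
    np.array(['2016-01', '2016-02'] * 4),
]
df = pd.DataFrame(np.ceil(np.random.randn(8, 4)), index=arrays,
                  columns=['m1', 'm2', 'm3', 'm4'])

# Move the unnamed index levels into ordinary columns and edit the dates.
flat = df.reset_index()
flat['level_2'] = flat['level_2'].str.replace('2016', '2017')

# Rebuild the MultiIndex from those columns and drop the generated names.
restored = flat.set_index(['level_0', 'level_1', 'level_2'])
restored.index.names = [None, None, None]
print(restored)
```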
If you could let me know what you are expecting based on this, I will modify my answer.