I have a DataFrame whose columns are all numeric. I want to compute cumprod
for each column and place each result side by side with its original column, so that I can compare them easily. How can I do this?
For example:
My input df:
col1 col2 col3
0 1.000000 1.000000 1.000000
1 0.998766 0.999490 0.998892
2 0.997779 0.999081 0.998005
3 0.996299 0.998469 0.996676
4 0.994573 0.997754 0.995126
5 0.993095 0.997140 0.993797
6 0.991125 0.996322 0.992027
7 0.989648 0.995708 0.990699
8 0.988171 0.995094 0.989372
9 0.986695 0.994480 0.988045
10 0.984729 0.993660 0.986276
11 0.983010 0.992943 0.984730
Cumulative product of df:
col1 col2 col3
0 1.000000 1.000000 1.000000
1 0.998766 0.999490 0.998892
2 0.996547 0.998572 0.996899
3 0.992859 0.997043 0.993585
4 0.987471 0.994803 0.988742
5 0.980653 0.991958 0.982609
6 0.971949 0.988310 0.974775
7 0.961887 0.984069 0.965708
8 0.950509 0.979241 0.955444
9 0.937863 0.973836 0.944022
10 0.923541 0.967662 0.931066
11 0.907850 0.960833 0.916849
Expected output:
col1 col1 col2 col2 col3 col3
0 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
1 0.998766 0.998766 0.999490 0.999490 0.998892 0.998892
2 0.997779 0.996547 0.999081 0.998572 0.998005 0.996899
3 0.996299 0.992859 0.998469 0.997043 0.996676 0.993585
4 0.994573 0.987471 0.997754 0.994803 0.995126 0.988742
5 0.993095 0.980653 0.997140 0.991958 0.993797 0.982609
6 0.991125 0.971949 0.996322 0.988310 0.992027 0.974775
7 0.989648 0.961887 0.995708 0.984069 0.990699 0.965708
8 0.988171 0.950509 0.995094 0.979241 0.989372 0.955444
9 0.986695 0.937863 0.994480 0.973836 0.988045 0.944022
10 0.984729 0.923541 0.993660 0.967662 0.986276 0.931066
11 0.983010 0.907850 0.992943 0.960833 0.984730 0.916849
Note: for the cumulative columns, names of the form cum_of_coln rather than coln would be preferred.
The code I used to get the cumulative product:
print(df)
print(df.cumprod())
Compute the cumprod, then use interleave from toolz (or cytoolz, its Cython implementation with the same API) to interleave the column headers:
import pandas as pd
from toolz import interleave
df2 = df.cumprod().add_prefix('cum_of_')
df3 = pd.concat([df, df2], axis=1)[list(interleave([df, df2]))]
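If toolz is not available, the same interleaving can be sketched with the standard library alone. This is only an illustration: the two-column frame and its values are made up for the example, and itertools.chain plus zip stand in for interleave.

```python
from itertools import chain

import pandas as pd

# Small illustrative frame (values are made up for this example).
df = pd.DataFrame({"col1": [1.0, 0.5, 0.5], "col2": [1.0, 2.0, 3.0]})
df2 = df.cumprod().add_prefix("cum_of_")

# zip pairs each original column name with its cumulative counterpart;
# chain.from_iterable flattens those pairs into one interleaved list.
order = list(chain.from_iterable(zip(df.columns, df2.columns)))
df3 = pd.concat([df, df2], axis=1)[order]
print(df3)
```

Because zip stops at the shorter input, this assumes both frames have the same number of columns, which holds here since df2 is derived from df.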
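Or you can use sorted, keyed on the part of each name after the last underscore. This relies on the original column names containing no underscores of their own: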
df2 = df.cumprod().add_prefix('cum_of_')
df3 = pd.concat([df, df2], axis=1)
df3 = df3[sorted(df3, key=lambda x: x.split('_')[-1])]
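A third option is to sort first and rename the column headers afterwards, which avoids building an explicit column order. A stable sort (kind='mergesort') keeps each original column ahead of its cumulative twin, since both share a name until the rename.
df3 = pd.concat([df, df.cumprod()], axis=1).sort_index(axis=1, kind='mergesort')
c = df3.columns.values.copy()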
c[1::2] = 'cum_of_' + c[1::2]
df3.columns = c
df3.head()
col1 cum_of_col1 col2 cum_of_col2 col3 cum_of_col3
0 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
1 0.998766 0.998766 0.999490 0.999490 0.998892 0.998892
2 0.997779 0.996548 0.999081 0.998571 0.998005 0.996899
3 0.996299 0.992860 0.998469 0.997043 0.996676 0.993586
4 0.994573 0.987471 0.997754 0.994803 0.995126 0.988743
Use concat and reorder the columns with a list built by a list comprehension:
cols = [item for x in df.columns for item in (x, 'cum_of_' + x)]
df = pd.concat([df, df.cumprod().add_prefix('cum_of_')], axis=1)[cols]
print(df)
col1 cum_of_col1 col2 cum_of_col2 col3 cum_of_col3
0 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
1 0.998766 0.998766 0.999490 0.999490 0.998892 0.998892
2 0.997779 0.996548 0.999081 0.998571 0.998005 0.996899
3 0.996299 0.992860 0.998469 0.997043 0.996676 0.993586
4 0.994573 0.987471 0.997754 0.994803 0.995126 0.988743
5 0.993095 0.980653 0.997140 0.991958 0.993797 0.982610
6 0.991125 0.971949 0.996322 0.988310 0.992027 0.974775
7 0.989648 0.961888 0.995708 0.984068 0.990699 0.965709
8 0.988171 0.950510 0.995094 0.979240 0.989372 0.955445
9 0.986695 0.937863 0.994480 0.973835 0.988045 0.944023
10 0.984729 0.923541 0.993660 0.967661 0.986276 0.931067
11 0.983010 0.907850 0.992943 0.960832 0.984730 0.916850
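The same idea can be wrapped in a small reusable helper. The sketch below is one possible way to package it; the function name with_cumprod and its prefix parameter are hypothetical, and the sample frame holds only the first three rows of col1 and col2 from the question.

```python
import pandas as pd


def with_cumprod(df, prefix="cum_of_"):
    """Return a new frame with each column followed by its cumulative product."""
    cum = df.cumprod().add_prefix(prefix)
    # Build the interleaved order: col, prefix+col, next col, ...
    cols = [name for col in df.columns for name in (col, f"{prefix}{col}")]
    return pd.concat([df, cum], axis=1)[cols]


# First three rows of col1 and col2 from the question.
sample = pd.DataFrame({
    "col1": [1.000000, 0.998766, 0.997779],
    "col2": [1.000000, 0.999490, 0.999081],
})
result = with_cumprod(sample)
print(result)
```

The helper leaves the input frame untouched and returns a new one, so the original df stays available for further comparisons.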
Append the columns directly with DataFrame.assign:
df.assign(**df.cumprod().add_prefix('cumprod_'))
col1 col2 col3 cumprod_col1 cumprod_col2 cumprod_col3
0 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
1 0.998766 0.999490 0.998892 0.998766 0.999490 0.998892
2 0.997779 0.999081 0.998005 0.996548 0.998571 0.996899
3 0.996299 0.998469 0.996676 0.992860 0.997043 0.993586
4 0.994573 0.997754 0.995126 0.987471 0.994803 0.988743
5 0.993095 0.997140 0.993797 0.980653 0.991958 0.982610
6 0.991125 0.996322 0.992027 0.971949 0.988310 0.974775
7 0.989648 0.995708 0.990699 0.961888 0.984068 0.965709
8 0.988171 0.995094 0.989372 0.950510 0.979240 0.955445
9 0.986695 0.994480 0.988045 0.937863 0.973835 0.944023
10 0.984729 0.993660 0.986276 0.923541 0.967661 0.931067
11 0.983010 0.992943 0.984730 0.907850 0.960832 0.916850
If you want the columns ordered as col1, col1_cumprod, ... you can add a suffix with add_suffix instead and then sort the columns alphabetically with reindex (reindex_axis, used in older pandas versions, has since been removed):
df = df.assign(**df.cumprod().add_suffix('_cumprod'))
df = df.reindex(sorted(df.columns), axis=1)
col1 col1_cumprod col2 col2_cumprod col3 col3_cumprod
0 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
1 0.998766 0.998766 0.999490 0.999490 0.998892 0.998892
2 0.997779 0.996548 0.999081 0.998571 0.998005 0.996899
3 0.996299 0.992860 0.998469 0.997043 0.996676 0.993586
4 0.994573 0.987471 0.997754 0.994803 0.995126 0.988743
5 0.993095 0.980653 0.997140 0.991958 0.993797 0.982610
6 0.991125 0.971949 0.996322 0.988310 0.992027 0.974775
7 0.989648 0.961888 0.995708 0.984068 0.990699 0.965709
8 0.988171 0.950510 0.995094 0.979240 0.989372 0.955445
9 0.986695 0.937863 0.994480 0.973835 0.988045 0.944023
10 0.984729 0.923541 0.993660 0.967661 0.986276 0.931067
11 0.983010 0.907850 0.992943 0.960832 0.984730 0.916850
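One caveat with sorting alphabetically: with ten or more numbered columns, a name such as col10 sorts ahead of col1_cumprod, so the pairs are no longer adjacent. The sketch below illustrates this with twelve hypothetical columns (col1 to col12, filled with placeholder values) and builds the order explicitly from the original column sequence instead.

```python
import pandas as pd

# Twelve hypothetical columns col1 ... col12 with placeholder values.
names = [f"col{i}" for i in range(1, 13)]
df = pd.DataFrame([[1.0] * 12, [0.5] * 12], columns=names)
out = df.assign(**df.cumprod().add_suffix("_cumprod"))

# Alphabetical order puts 'col10' between 'col1' and 'col1_cumprod'.
alphabetical = sorted(out.columns)
print(alphabetical[:3])

# Explicit order derived from the original column sequence keeps pairs together.
paired = [name for col in names for name in (col, col + "_cumprod")]
out = out.reindex(columns=paired)
print(list(out.columns[:4]))
```

For frames with only a handful of columns, as in the question, the alphabetical sort gives the same result as the explicit order.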