I am working in Python with pandas and have the following problem. I have a dataframe with a large number of rows holding cryptocurrency data per date. After the last date of one cryptocurrency, the time series of the next cryptocurrency starts, all in the same columns. I am looking for a way to reshape the dataframe so that, for every token_date, the data of all cryptocurrencies is shown in a single row. The total number of rows would then equal the number of distinct token_date values.
Currently the df looks as follows:
token_id token_caption token_date token_price_usd token_marketcap_usd
64 WAN Wanchain 2019-06-24 0.3817 40414601.0
64 WAN Wanchain 2019-07-01 0.3644 38683920.0
64 WAN Wanchain 2019-07-08 0.3557 37759781.0
64 WAN Wanchain 2019-07-15 0.2625 27824362.0
64 WAN Wanchain 2019-07-22 0.2545 27036722.0
...
57 MAID 2017-07-24 0.3775 170824959.0
57 MAID 2017-07-31 0.2917 132012254.0
57 MAID 2017-08-07 0.3589 162410652.0
57 MAID 2017-08-14 0.3763 170283706.0
57 MAID 2017-08-21 0.4615 208873303.0
...
I am looking for code to achieve something like this (the column split will be performed roughly 100 times, ending up with ca. 201 columns):
token_date WAN Wanchain - Price WAN Wanchain - Marketcap ... MAID - Price MAID - Marketcap...
2019-06-24 0.3817 40414601.0 xxx xxx
2019-07-01 0.3644 38683920.0 xxx xxx
2019-07-08 0.3557 37759781.0 xxx xxx
...
I would be thankful for any help. I am a total beginner in Python and have no idea how to achieve this.
Thank you!
If you set the index to ['token_date', 'token_caption'] and unstack the caption to make it a column level instead, you get a pretty clean MultiIndex on the columns with what you're looking for:
In [144]: df
Out[144]:
token_id token_caption token_date token_price_usd token_marketcap_usd
0 64 WAN Wanchain 2019-06-24 0.3817 40414601.0
1 64 WAN Wanchain 2019-07-01 0.3644 38683920.0
2 64 WAN Wanchain 2019-07-08 0.3557 37759781.0
3 64 WAN Wanchain 2019-07-15 0.2625 27824362.0
4 64 WAN Wanchain 2019-07-22 0.2545 27036722.0
5 57 MAID 2019-06-24 0.3775 170824959.0
6 57 MAID 2019-07-01 0.2917 132012254.0
7 57 MAID 2019-07-08 0.3589 162410652.0
8 57 MAID 2019-07-15 0.3763 170283706.0
9 57 MAID 2019-07-22 0.4615 208873303.0
In [145]: df.set_index(["token_date", "token_caption"])[["token_price_usd", "token_marketcap_usd"]].unstack().swaplevel(axis=1)
Out[145]:
token_caption MAID WAN Wanchain MAID WAN Wanchain
token_price_usd token_price_usd token_marketcap_usd token_marketcap_usd
token_date
2019-06-24 0.3775 0.3817 170824959.0 40414601.0
2019-07-01 0.2917 0.3644 132012254.0 38683920.0
2019-07-08 0.3589 0.3557 162410652.0 37759781.0
2019-07-15 0.3763 0.2625 170283706.0 27824362.0
2019-07-22 0.4615 0.2545 208873303.0 27036722.0
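With around 100 tokens it may be easier to read the result if each token's price and market cap sit next to each other. A minimal sketch of that, assuming a small stand-in dataframe with the same column names as in the question (the name `wide` is only illustrative): sorting the column MultiIndex after the swap groups the columns by token.

```python
import pandas as pd

# Small stand-in for the question's data (two tokens, two shared dates).
df = pd.DataFrame({
    "token_id": [64, 64, 57, 57],
    "token_caption": ["WAN Wanchain", "WAN Wanchain", "MAID", "MAID"],
    "token_date": ["2019-06-24", "2019-07-01", "2019-06-24", "2019-07-01"],
    "token_price_usd": [0.3817, 0.3644, 0.3775, 0.2917],
    "token_marketcap_usd": [40414601.0, 38683920.0, 170824959.0, 132012254.0],
})

wide = (
    df.set_index(["token_date", "token_caption"])[["token_price_usd", "token_marketcap_usd"]]
    .unstack()              # token_caption becomes the inner column level
    .swaplevel(axis=1)      # put token_caption on the outer column level
    .sort_index(axis=1)     # group the columns by token, then by value name
)

# All columns of one token can now be selected via the outer level.
maid = wide["MAID"]
```

Here `wide["MAID"]` returns a two-column frame (market cap and price) indexed by token_date.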
Why not use pivot?
Given the data
token_id token_caption token_date token_price_usd token_marketcap_usd
64 WAN_Wanchain 2019-06-24 0.3817 40414601.0
64 WAN_Wanchain 2019-07-01 0.3644 38683920.0
64 WAN_Wanchain 2019-07-08 0.3557 37759781.0
64 WAN_Wanchain 2019-07-15 0.2625 27824362.0
64 WAN_Wanchain 2019-07-22 0.2545 27036722.0
57 MAID 2019-06-24 0.3775 170824959.0
57 MAID 2019-07-01 0.2917 132012254.0
57 MAID 2019-07-08 0.3589 162410652.0
57 MAID 2019-07-15 0.3763 170283706.0
57 MAID 2019-07-22 0.4615 208873303.0
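Note that I changed MAID's dates to match WAN_Wanchain's, so that the two series have rows to line up on. Recent pandas versions require keyword arguments for pivot:
df.pivot(index="token_date", columns="token_caption", values=["token_price_usd", "token_marketcap_usd"])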
gives
token_price_usd token_marketcap_usd
token_caption MAID WAN_Wanchain MAID WAN_Wanchain
token_date
2019-06-24 0.3775 0.3817 170824959.0 40414601.0
2019-07-01 0.2917 0.3644 132012254.0 38683920.0
2019-07-08 0.3589 0.3557 162410652.0 37759781.0
2019-07-15 0.3763 0.2625 170283706.0 27824362.0
2019-07-22 0.4615 0.2545 208873303.0 27036722.0
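One thing to watch for with pivot: it raises a ValueError if the same (token_date, token_caption) pair occurs more than once. The sketch below uses made-up duplicate rows to show this, together with one possible way around it (keeping the last row per pair). Whether dropping duplicates is appropriate for the real data is an assumption; pivot_table, which aggregates duplicates, is another option.

```python
import pandas as pd

# Hypothetical data: MAID appears twice on the same date.
df = pd.DataFrame({
    "token_caption": ["MAID", "MAID", "WAN_Wanchain"],
    "token_date": ["2019-06-24", "2019-06-24", "2019-06-24"],
    "token_price_usd": [0.3775, 0.3785, 0.3817],
})

try:
    df.pivot(index="token_date", columns="token_caption", values="token_price_usd")
    pivot_failed = False
except ValueError:
    # pivot cannot decide which of the duplicate values to keep
    pivot_failed = True

# Assumption: the last row per (date, token) pair is the one to keep.
deduped = df.drop_duplicates(subset=["token_date", "token_caption"], keep="last")
wide = deduped.pivot(index="token_date", columns="token_caption", values="token_price_usd")
```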
I use pivot_table and construct the new column names:
df=df.pivot_table(index="token_date",columns="token_caption",values=["token_price_usd","token_marketcap_usd"])
token_marketcap_usd token_price_usd
token_caption MAID WAN Wanchain MAID WAN Wanchain
token_date
2017-07-24 170824959.0 NaN 0.3775 NaN
2017-07-31 132012254.0 NaN 0.2917 NaN
2017-08-07 162410652.0 NaN 0.3589 NaN
2017-08-14 170283706.0 NaN 0.3763 NaN
2017-08-21 208873303.0 NaN 0.4615 NaN
2019-06-24 NaN 40414601.0 NaN 0.3817
2019-07-01 NaN 38683920.0 NaN 0.3644
2019-07-08 NaN 37759781.0 NaN 0.3557
2019-07-15 NaN 27824362.0 NaN 0.2625
2019-07-22 NaN 27036722.0 NaN 0.2545
df.columns = [lev2 + " - " + lev1.split("_")[1].title() for lev1, lev2 in df.columns]
df = df.reindex(sorted(df.columns.values, reverse=True), axis=1)
WAN Wanchain - Price WAN Wanchain - Marketcap MAID - Price MAID - Marketcap
token_date
2017-07-24 NaN NaN 0.3775 170824959.0
2017-07-31 NaN NaN 0.2917 132012254.0
2017-08-07 NaN NaN 0.3589 162410652.0
2017-08-14 NaN NaN 0.3763 170283706.0
2017-08-21 NaN NaN 0.4615 208873303.0
2019-06-24 0.3817 40414601.0 NaN NaN
2019-07-01 0.3644 38683920.0 NaN NaN
2019-07-08 0.3557 37759781.0 NaN NaN
2019-07-15 0.2625 27824362.0 NaN NaN
2019-07-22 0.2545 27036722.0 NaN NaN
Finally, you can call reset_index() to turn token_date back into a regular column.
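Put together, the steps of this answer could look roughly like the following. This is a sketch on a tiny stand-in dataframe with non-overlapping dates, as in the question; the name `wide` is only illustrative.

```python
import pandas as pd

# Stand-in data: the two tokens cover different dates.
df = pd.DataFrame({
    "token_id": [64, 64, 57, 57],
    "token_caption": ["WAN Wanchain", "WAN Wanchain", "MAID", "MAID"],
    "token_date": ["2019-06-24", "2019-07-01", "2017-07-24", "2017-07-31"],
    "token_price_usd": [0.3817, 0.3644, 0.3775, 0.2917],
    "token_marketcap_usd": [40414601.0, 38683920.0, 170824959.0, 132012254.0],
})

# One row per date, one column per (value, token) pair.
wide = df.pivot_table(
    index="token_date",
    columns="token_caption",
    values=["token_price_usd", "token_marketcap_usd"],
)

# Flatten the two column levels into names like "MAID - Price".
wide.columns = [
    f"{caption} - {value.split('_')[1].title()}" for value, caption in wide.columns
]

# Turn token_date from the index back into an ordinary column.
wide = wide.reset_index()
```

Dates on which a token has no data show up as NaN in that token's columns.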