简体   繁体   中英

Create new columns containing the rows of an existing column based on values in those rows

I am working in Python with Pandas and have the following problem. I have a dataframe with a large number of rows depicting cryptocurrency data per date. After reaching the last date, a new time series is started for another cryptocurrency, all in the same columns. I am looking for a way to manipulate the dataframe so that for every token_date, all cryptocurrency data is shown in a single row, so that the total number of rows would equal the total number of token_date 's

Currently the df looks as follows:

token_id    token_caption   token_date  token_price_usd token_marketcap_usd
64          WAN Wanchain    2019-06-24  0.3817          40414601.0
64          WAN Wanchain    2019-07-01  0.3644          38683920.0
64          WAN Wanchain    2019-07-08  0.3557          37759781.0
64          WAN Wanchain    2019-07-15  0.2625          27824362.0
64          WAN Wanchain    2019-07-22  0.2545          27036722.0
...
57          MAID            2017-07-24  0.3775          170824959.0
57          MAID            2017-07-31  0.2917          132012254.0
57          MAID            2017-08-07  0.3589          162410652.0
57          MAID            2017-08-14  0.3763          170283706.0
57          MAID            2017-08-21  0.4615          208873303.0
...

I am looking for code to achieve something like this.: (The column split will be performed roughly 100 times, ending up with ca. 201 columns)

token_date   WAN Wanchain - Price   WAN Wanchain - Marketcap  ...  MAID - Price   MAID - Marketcap...
2019-06-24   0.3817                 40414601.0                     xxx            xxx
2019-07-01   0.3644                 38683920.0                     xxx            xxx
2019-07-08   0.3557                 37759781.0                     xxx            xxx
...

I would be thankful for any help. I am a total beginner regarding Python and have no concept on how to achieve this.

Thank you!

If you set the index to ['token_date', 'token_caption'] and unstack the caption to make it a column instead, you get a pretty clean MultiIndex column with what you're looking for:

In [144]: df
Out[144]:
   token_id token_caption  token_date  token_price_usd  token_marketcap_usd
0        64  WAN Wanchain  2019-06-24           0.3817           40414601.0
1        64  WAN Wanchain  2019-07-01           0.3644           38683920.0
2        64  WAN Wanchain  2019-07-08           0.3557           37759781.0
3        64  WAN Wanchain  2019-07-15           0.2625           27824362.0
4        64  WAN Wanchain  2019-07-22           0.2545           27036722.0
5        57          MAID  2019-06-24           0.3775          170824959.0
6        57          MAID  2019-07-01           0.2917          132012254.0
7        57          MAID  2019-07-08           0.3589          162410652.0
8        57          MAID  2019-07-15           0.3763          170283706.0
9        57          MAID  2019-07-22           0.4615          208873303.0

In [145]: df.set_index(["token_date", "token_caption"])[["token_price_usd", "token_marketcap_usd"]].unstack().swaplevel(axis=1)
Out[145]:
token_caption            MAID    WAN Wanchain                MAID        WAN Wanchain
              token_price_usd token_price_usd token_marketcap_usd token_marketcap_usd
token_date
2019-06-24             0.3775          0.3817         170824959.0          40414601.0
2019-07-01             0.2917          0.3644         132012254.0          38683920.0
2019-07-08             0.3589          0.3557         162410652.0          37759781.0
2019-07-15             0.3763          0.2625         170283706.0          27824362.0
2019-07-22             0.4615          0.2545         208873303.0          27036722.0

Why not use pivot :

Given data

token_id    token_caption   token_date  token_price_usd token_marketcap_usd
64          WAN_Wanchain    2019-06-24  0.3817          40414601.0
64          WAN_Wanchain    2019-07-01  0.3644          38683920.0
64          WAN_Wanchain    2019-07-08  0.3557          37759781.0
64          WAN_Wanchain    2019-07-15  0.2625          27824362.0
64          WAN_Wanchain    2019-07-22  0.2545          27036722.0
57          MAID            2019-06-24  0.3775          170824959.0
57          MAID            2019-07-01  0.2917          132012254.0
57          MAID            2019-07-08  0.3589          162410652.0
57          MAID            2019-07-15  0.3763          170283706.0
57          MAID            2019-07-22  0.4615          208873303.0

note I repeated the dates so there was something to match on

df.pivot("token_date", "token_caption", ["token_price_usd", "token_marketcap_usd"])

gives

              token_price_usd              token_marketcap_usd             
token_caption            MAID WAN_Wanchain                MAID WAN_Wanchain
token_date                                                                 
2019-06-24             0.3775       0.3817         170824959.0   40414601.0
2019-07-01             0.2917       0.3644         132012254.0   38683920.0
2019-07-08             0.3589       0.3557         162410652.0   37759781.0
2019-07-15             0.3763       0.2625         170283706.0   27824362.0
2019-07-22             0.4615       0.2545         208873303.0   27036722.0

I use pivot_table and construct the new column names:

df=df.pivot_table(index="token_date",columns="token_caption",values=["token_price_usd","token_marketcap_usd"])

token_marketcap_usd              token_price_usd             
token_caption                MAID WAN Wanchain            MAID WAN Wanchain
token_date                                                                 
2017-07-24            170824959.0          NaN          0.3775          NaN
2017-07-31            132012254.0          NaN          0.2917          NaN
2017-08-07            162410652.0          NaN          0.3589          NaN
2017-08-14            170283706.0          NaN          0.3763          NaN
2017-08-21            208873303.0          NaN          0.4615          NaN
2019-06-24                    NaN   40414601.0             NaN       0.3817
2019-07-01                    NaN   38683920.0             NaN       0.3644
2019-07-08                    NaN   37759781.0             NaN       0.3557
2019-07-15                    NaN   27824362.0             NaN       0.2625
2019-07-22                    NaN   27036722.0             NaN       0.2545

df.columns=[ lev2+" - "+lev1.split("_")[1].title() for lev1,lev2 in df.columns]
df.reindex(sorted(df.columns.values,reverse=True) ,axis=1)

            WAN Wanchain - Price  WAN Wanchain - Marketcap  MAID - Price  MAID - Marketcap
token_date                                                                                
2017-07-24                   NaN                       NaN        0.3775       170824959.0
2017-07-31                   NaN                       NaN        0.2917       132012254.0
2017-08-07                   NaN                       NaN        0.3589       162410652.0
2017-08-14                   NaN                       NaN        0.3763       170283706.0
2017-08-21                   NaN                       NaN        0.4615       208873303.0
2019-06-24                0.3817                40414601.0           NaN               NaN
2019-07-01                0.3644                38683920.0           NaN               NaN
2019-07-08                0.3557                37759781.0           NaN               NaN
2019-07-15                0.2625                27824362.0           NaN               NaN
2019-07-22                0.2545                27036722.0           NaN               NaN

Finally you can apply 'reset_index'.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM