简体   繁体   中英

Python Pandas, transform dataframe

I have the following dataframe (sim_2005):

Date         ELEM1 ELEM2 ... ELEM1133
2005-01-01   0.021 2.455 ... 345.2
2005-01-02   0.321 2.331 ... 355.1
...          ...   ...   ... ...
2005-12-31   0.789 3.456 ... 459.9
[365 rows x 1133 columns]

I need to import this into a (postgresql) Database with the following format:

ID     Date        ELEM     Value
1      2005-01-01  ELEM1    0.021
2      2005-01-02  ELEM1    0.321
...
366    2005-01-01  ELEM2    2.455
367    2005-01-02  ELEM2    2.331
...
402983 2005-01-01  ELEM1133 345.2
402984 2005-01-02  ELEM1133 355.1
...
403348 2005-12-31  ELEM1133 459.9

I'm trying my best with iterating over the columns and concatenating it step by step,

for column in sim_2005:
    sim_concat = pd.concat([pd.DataFrame(sim_2005.columns.values), sim_2005.ix[:, column]], ignore_index=True)

which produces by far not what I'm looking for.

I also tried pandas pivot function, groupby, etc... but I guess my understanding of pandas and python in general is not sufficient for this transformation. Any help?

Edit:

Thanks, pd.melt was the answer!

 sim_2005_melted = pd.melt(sim_2005, id_vars=sim_2005.index.name, value_vars=list(sim_2005.columns.values), var_name='ELEM', value_name='Q_sim').sort(columns='Date')

Results in:

ID     Date   ELEM     Q_sim
1      NaN    ELEM1    0.021
2      NaN    ELEM1    0.321
...
366    NaN    ELEM2    2.455
367    NaN    ELEM2    2.331
...
402983 NaN    ELEM1133 345.2
402984 NaN    ELEM1133 355.1

For some reason the datetime index is not transported over, but the overall structure is exactly how I wanted it!

It looks like you're doing a "wide-to-narrow" transformation, in pandas terminology. One way to do this is with melt .

In this case, you could do something like:

pd.melt(df, id_vars=['Date'], value_vars=['ELEM1', 'ELEM2']).sort(columns='Date')

for a dataframe with two "wide" columns ['ELEM1', 'ELEM2'] . For your ~1K columns in the example, you'd obviously not write them by hand, and rather create value_vars using some sort of list comprehension.


Example

df = pd.DataFrame({
    'Date': range(4),
    'ELEM1': range(1, 5),
    'ELEM2': range(11, 15),
    'ELEM2': range(-5, -1)
})

>> df
    Date    ELEM1   ELEM2
0   0   1   -5
1   1   2   -4
2   2   3   -3
3   3   4   -2

>> pd.melt(df, id_vars=['Date'], value_vars=['ELEM1', 'ELEM2']).sort(columns='Date')
    Date    variable    value
0   0   ELEM1   1
4   0   ELEM2   -5
1   1   ELEM1   2
5   1   ELEM2   -4
2   2   ELEM1   3
6   2   ELEM2   -3
3   3   ELEM1   4
7   3   ELEM2   -2

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM