Convert (partly) Vertical DataFrame to Horizontal with Pandas

I have a script that creates a df that looks like this (with many other attribute columns).

For each ITEM_IDX, I have a time identifier (QM_ID) and a value (VALUE):

ITEM_IDX;XVAL;YVAL;ZVAL;PT_ID;...;QM_ID;VALUE
1;635000;5020000;15.1;1000000;...;6000;0.00
2;635010;5020000;15.0;1000001;...;6000;0.56
3;635020;5020000;15.2;1000002;...;6000;0.45
1;635000;5020000;15.1;1000000;...;6001;0.10
2;635010;5020000;15.0;1000001;...;6001;0.55
3;635020;5020000;15.2;1000002;...;6001;0.48
1;635000;5020000;15.1;1000000;...;6002;0.13
2;635010;5020000;15.0;1000001;...;6002;0.50
3;635020;5020000;15.2;1000002;...;6002;0.41

I need to create an output formatted like this.
For each ITEM_IDX, I want one column per QM_ID, filled with the corresponding VALUE.

ITEM_IDX;XVAL;YVAL;ZVAL;PT_ID;...;QM_ID_6000;QM_ID_6001;QM_ID_6002
1;635000;5020000;15.1;1000000;...;0.00;0.10;0.13
2;635010;5020000;15.0;1000001;...;0.56;0.55;0.50
3;635020;5020000;15.2;1000002;...;0.45;0.48;0.41

It's a df of up to 1M lines, with up to 4k different QM_ID so the output will have a lot of columns. (Yeah, I know...)

I tried creating a new df with the main columns, then grouping my df by QM_ID and adding the columns one by one, but it's slow and not really "pythonic". I'm looking for a faster, more efficient way to do it, as I will have to do it quite often.

Thanks a lot:)

PS: I'm using python 3.7.9 and pandas 1.1.3

Edit, my current "solution":

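import pandas as pd

my_df = pd.read_csv(datafile, sep=';')
# One row per ITEM_IDX, keeping only the attribute columns
my_df_result = my_df[['ITEM_IDX','XVAL','YVAL','ZVAL','PT_ID']].drop_duplicates(subset=['ITEM_IDX'], keep='first')
# One new column per QM_ID (relies on every QM_ID block listing the items in the same order)
for q in my_df['QM_ID'].unique().tolist():
    my_df_result[f'QM_ID_{q}'] = my_df[my_df['QM_ID'] == q]['VALUE'].tolist()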

Try this:

df.pivot(index=['ITEM_IDX', 'XVAL', 'YVAL', 'ZVAL', 'PT_ID'], columns='QM_ID', values='VALUE')\
  .add_prefix('QM_ID_').reset_index()

Output:

QM_ID  ITEM_IDX    XVAL     YVAL  ZVAL    PT_ID  QM_ID_6000  QM_ID_6001  QM_ID_6002
0             1  635000  5020000  15.1  1000000        0.00        0.10        0.13
1             2  635010  5020000  15.0  1000001        0.56        0.55        0.50
2             3  635020  5020000  15.2  1000002        0.45        0.48        0.41

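Pivot your dataframe, specifying which columns identify the rows and which column supplies the new column labels, then use add_prefix to name the columns correctly, then reset_index to turn the identifying columns back into regular columns.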
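
For anyone who wants to run this end to end, here is a minimal self-contained sketch built from the sample rows in the question. It makes a few assumptions: the elided "..." attribute columns are left out, the data is read from an in-memory string instead of the real file, and the names `raw`, `id_cols` and `wide` are only illustrative. The extra `rename_axis(columns=None)` call is optional; it just drops the leftover "QM_ID" label from the column axis.

```python
import io

import pandas as pd

# Sample rows from the question; the elided "..." attribute columns are omitted.
raw = """ITEM_IDX;XVAL;YVAL;ZVAL;PT_ID;QM_ID;VALUE
1;635000;5020000;15.1;1000000;6000;0.00
2;635010;5020000;15.0;1000001;6000;0.56
3;635020;5020000;15.2;1000002;6000;0.45
1;635000;5020000;15.1;1000000;6001;0.10
2;635010;5020000;15.0;1000001;6001;0.55
3;635020;5020000;15.2;1000002;6001;0.48
1;635000;5020000;15.1;1000000;6002;0.13
2;635010;5020000;15.0;1000001;6002;0.50
3;635020;5020000;15.2;1000002;6002;0.41
"""
df = pd.read_csv(io.StringIO(raw), sep=';')

# Columns that identify one output row (illustrative name).
id_cols = ['ITEM_IDX', 'XVAL', 'YVAL', 'ZVAL', 'PT_ID']

wide = (
    df.pivot(index=id_cols, columns='QM_ID', values='VALUE')
      .add_prefix('QM_ID_')          # 6000 -> 'QM_ID_6000'
      .rename_axis(columns=None)     # drop the 'QM_ID' name on the column axis
      .reset_index()                 # turn the id columns back into regular columns
)
print(wide)
```

Note that pivot raises a ValueError if the same combination of identifying columns and QM_ID appears more than once; in that case pivot_table with an explicit aggfunc is the usual fallback.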
