I have a script that creates a DataFrame that looks like this (with many other attribute columns).
So, for each ITEM_IDX, I have a time identifier (QM_ID) and a value (VALUE):
ITEM_IDX;XVAL;YVAL;ZVAL;PT_ID;...;QM_ID;VALUE
1;635000;5020000;15.1;1000000;...;6000;0.00
2;635010;5020000;15.0;1000001;...;6000;0.56
3;635020;5020000;15.2;1000002;...;6000;0.45
1;635000;5020000;15.1;1000000;...;6001;0.10
2;635010;5020000;15.0;1000001;...;6001;0.55
3;635020;5020000;15.2;1000002;...;6001;0.48
1;635000;5020000;15.1;1000000;...;6002;0.13
2;635010;5020000;15.0;1000001;...;6002;0.50
3;635020;5020000;15.2;1000002;...;6002;0.41
I need to create an output formatted like this.
For each ITEM_IDX, I want one column per QM_ID, holding the corresponding VALUE:
ITEM_IDX;XVAL;YVAL;ZVAL;PT_ID;...;QM_ID_6000;QM_ID_6001;QM_ID_6002
1;635000;5020000;15.1;1000000;...;0.00;0.10;0.13
2;635010;5020000;15.0;1000001;...;0.56;0.55;0.50
3;635020;5020000;15.2;1000002;...;0.45;0.48;0.41
It's a DataFrame of up to 1M rows, with up to 4k different QM_ID values, so the output will have a lot of columns. (Yeah, I know...)
I tried creating a new DataFrame with the main columns, then filtering my DataFrame by each QM_ID and adding the columns one by one, but it's slow and not really "pythonic". I'm looking for a faster, more efficient way to do it, as I will have to do it quite often.
Thanks a lot:)
PS: I'm using Python 3.7.9 and pandas 1.1.3
Edit, my current "solution":
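import pandas as pd

my_df = pd.read_csv(datafile, sep=';')
# One row per ITEM_IDX with the attribute columns
my_df_result = my_df[['ITEM_IDX','XVAL','YVAL','ZVAL','PT_ID']].drop_duplicates(subset=['ITEM_IDX'], keep='first')
# Add one column per QM_ID (relies on every QM_ID listing the items in the same order)
for q in my_df['QM_ID'].unique().tolist():
    my_df_result[f'QM_ID_{q}'] = my_df[my_df['QM_ID'] == q]['VALUE'].tolist()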
Try this:
df.pivot(index=['ITEM_IDX', 'XVAL', 'YVAL', 'ZVAL', 'PT_ID'], columns='QM_ID', values='VALUE')\
  .add_prefix('QM_ID_').reset_index()
Output:
QM_ID  ITEM_IDX    XVAL     YVAL  ZVAL    PT_ID  QM_ID_6000  QM_ID_6001  QM_ID_6002
0             1  635000  5020000  15.1  1000000        0.00        0.10        0.13
1             2  635010  5020000  15.0  1000001        0.56        0.55        0.50
2             3  635020  5020000  15.2  1000002        0.45        0.48        0.41
pivot your DataFrame, defining the rows (index) and columns, then use add_prefix to name the new columns correctly, then reset_index to turn the index levels back into regular columns.
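For reference, here is a minimal, self-contained sketch of that approach run on the sample rows from the question. It assumes the elided "..." attribute columns are simply left out, and the names raw, id_cols and wide are only illustrative. The extra rename_axis call is optional; it just drops the leftover "QM_ID" label on the column axis that shows up in the output header.

```python
import io

import pandas as pd

# Sample rows from the question; the elided "..." columns are omitted here.
raw = """ITEM_IDX;XVAL;YVAL;ZVAL;PT_ID;QM_ID;VALUE
1;635000;5020000;15.1;1000000;6000;0.00
2;635010;5020000;15.0;1000001;6000;0.56
3;635020;5020000;15.2;1000002;6000;0.45
1;635000;5020000;15.1;1000000;6001;0.10
2;635010;5020000;15.0;1000001;6001;0.55
3;635020;5020000;15.2;1000002;6001;0.48
1;635000;5020000;15.1;1000000;6002;0.13
2;635010;5020000;15.0;1000001;6002;0.50
3;635020;5020000;15.2;1000002;6002;0.41
"""
df = pd.read_csv(io.StringIO(raw), sep=";")

# Columns that identify an item and should stay as regular columns.
id_cols = ["ITEM_IDX", "XVAL", "YVAL", "ZVAL", "PT_ID"]

wide = (
    df.pivot(index=id_cols, columns="QM_ID", values="VALUE")  # one column per QM_ID
      .add_prefix("QM_ID_")        # 6000 -> "QM_ID_6000"
      .rename_axis(columns=None)   # drop the "QM_ID" name on the column axis
      .reset_index()               # move the id columns out of the index
)
print(wide)
```

Note that a list for index needs pandas 1.1.0 or later, and that pivot raises a ValueError if the same combination of id columns and QM_ID appears more than once. In that case pivot_table with an explicit aggfunc may be a better fit.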