I'm using pandas with Python 3.6. My script loads an Excel file that contains multiple worksheets. In some sheets, rows hold either numeric values or string values that should end up in two columns. After running the script, the numeric values are split into two columns, but I cannot copy the string value of the first column into the second column.
For the numeric values, I am using:
df=df[['ID_Test']].join(df[pd_column].str.split(':',expand=True)).rename(columns={0: pd_column, 1: ''})
For the string values, the second column remains blank (None), and it must be filled with the same value as the first column.
If I use df[''] = df[pd_column], the whole second column [''] is overwritten with the values of the first one (including the numeric values), and I have not found a solution for this specific case.
Data Input:
ID_Test_1 Test_1
Indicator_1 AAAAAAA
Indicator_2 2.745 : 2.03
Indicator_3 BBBBBBBB
Indicator_4 -5.013 : -5.013
Indicator_5 CCCCCCCC
Actual Output : (Wrong)
ID_Test_1 Test_1
Indicator_1 AAAAAAA None
Indicator_2 2.745 2.03
Indicator_3 BBBBBBBB None
Indicator_4 -5.013 -5.013
Indicator_5 CCCCCCCC None
Desired Output :
ID_Test_1 Test_1
Indicator_1 AAAAAAA AAAAAAA
Indicator_2 2.745 2.03
Indicator_3 BBBBBBBB BBBBBBBB
Indicator_4 -5.013 -5.013
Indicator_5 CCCCCCCC CCCCCCCC
The second column must not have a label
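Based on your sample data and code, I just added a forward fill along the columns to the result of the split, before the join. Rows without a ':' separator get None in the second column, and the forward fill copies the first column's value into it. It is written here as ffill(axis=1), because recent pandas releases accept axis only as a keyword; the pattern is a raw string so that \s is not read as a string escape.
pd_column = 'Test_1'
(df[['ID_Test_1']].join(df[pd_column].str.split(r'\s+:\s+', expand=True).ffill(axis=1))
.rename(columns={0: pd_column, 1: ''}))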
Out[29]:
ID_Test_1 Test_1
0 Indicator_1 AAAAAAA AAAAAAA
1 Indicator_2 2.745 2.03
2 Indicator_3 BBBBBBBB BBBBBBBB
3 Indicator_4 -5.013 -5.013
4 Indicator_5 CCCCCCCC CCCCCCCC
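For anyone who wants to reproduce this without the Excel file, here is a minimal self-contained sketch. It rebuilds the sample data by hand (an assumption, since the real sheet is not shown) and, as an alternative to the forward fill, fills the second column explicitly from the first with fillna. The names parts and result are only illustrative.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (the real data comes from Excel).
df = pd.DataFrame({
    "ID_Test_1": ["Indicator_1", "Indicator_2", "Indicator_3",
                  "Indicator_4", "Indicator_5"],
    "Test_1": ["AAAAAAA", "2.745 : 2.03", "BBBBBBBB",
               "-5.013 : -5.013", "CCCCCCCC"],
})
pd_column = "Test_1"

# Split on ':' with optional surrounding whitespace; n=1 limits the result
# to at most two columns (labelled 0 and 1).
parts = df[pd_column].str.split(r"\s*:\s*", n=1, expand=True)

# Rows without a separator have a missing value in column 1:
# fill only those cells from column 0, leaving the numeric rows untouched.
parts[1] = parts[1].fillna(parts[0])

# Re-attach the ID column and give the second value column an empty label.
result = df[["ID_Test_1"]].join(parts).rename(columns={0: pd_column, 1: ""})
print(result)
```

Note that if no row in the sheet contains a separator, the split produces a single column and parts[1] does not exist, so that case would need a separate check.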