简体   繁体   中英

Splitting a number string in pandas / python

I am looking to split the following column to two columns in my pandas dataframe by splitting on the last 0 in every row.

000012345
000012345
000012345

What I would like it to look like

0000 12345
0000 12345

I've been looking into str.split from can't seem to figure how to approach this as there is no usual delimiter, and I can not figure out how to make it split on the 4th 0.

I have had success with a similar issue previously with the following command, but can not seem to figure it out, as im not looking to split labels, but values in the rows.

df.labels.str.split(':',1).tolist()

Assuming "col" the column, you can split with a lookbehind regex:

df['col'].str.split(r'(?<=^.{4})', expand=True)

regex:

(?<=^.{4})    # match the empty space preceded by the first 4 characters

Or use str.extract :

df['col'].str.extract('(^.{4})(.*)')
# df[['col2', 'col3']] = df['col'].str.extract('(^.{4})(.*)')

Full example:

df[['col2', 'col3']] = df['col'].str.split('(?<=^.{4})', expand=True)

output:

         col  col2   col3
0  000012345  0000  12345
1  000012345  0000  12345
2  000012345  0000  12345

Check below code using string replace

df = pd.DataFrame({'col1':['000012345','000012345','000012345']})

df['col2'] =  df['col1'].astype(int)

df['col3'] = df.apply(lambda row: row['col1'].replace(str(row['col2']),''), axis =1)

print(df)

Output:

        col1   col2  col3
0  000012345  12345  0000
1  000012345  12345  0000
2  000012345  12345  0000

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM