Splitting a number string in pandas / python

I am looking to split the following column into two columns in my pandas DataFrame by splitting after the last leading 0 in every row.

000012345
000012345
000012345

What I would like it to look like:

0000 12345
0000 12345

I've been looking into str.split but can't seem to figure out how to approach this, as there is no usual delimiter, and I cannot figure out how to make it split on the 4th 0.

I have had success with a similar issue previously using the following command, but I can't seem to figure this one out, as I'm not looking to split labels but the values in the rows.

df.labels.str.split(':', n=1).tolist()

Assuming "col" is the column, you can split with a lookbehind regex:

df['col'].str.split(r'(?<=^.{4})', expand=True)

regex:

(?<=^.{4})    # match the empty space preceded by the first 4 characters
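
To see what the lookbehind does outside of pandas, here is a minimal sketch using only the standard library's re module. The helper name split_after_four is hypothetical, and splitting on an empty match assumes Python 3.7 or later.

```python
import re

# Zero-width lookbehind: matches the position right after the first 4 characters.
PATTERN = re.compile(r'(?<=^.{4})')

def split_after_four(value):
    """Split a string into its first 4 characters and the remainder."""
    return PATTERN.split(value)

parts = split_after_four('000012345')
print(parts)
```

A string shorter than 4 characters has no position that satisfies the lookbehind, so it comes back unsplit as a single-element list.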

Or use str.extract:

df['col'].str.extract('(^.{4})(.*)')
# df[['col2', 'col3']] = df['col'].str.extract('(^.{4})(.*)')

Full example:

df[['col2', 'col3']] = df['col'].str.split('(?<=^.{4})', expand=True)

Output:

         col  col2   col3
0  000012345  0000  12345
1  000012345  0000  12345
2  000012345  0000  12345
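
Since the split point in this example is a fixed offset, plain string slicing through the .str accessor might be a simpler alternative to the regex. This is only a sketch of a different technique, assuming every value should be cut after its 4th character. The column names col2 and col3 follow the example above.

```python
import pandas as pd

df = pd.DataFrame({'col': ['000012345', '000012345', '000012345']})

# Assumption: the split point is always right after the 4th character.
df['col2'] = df['col'].str[:4]   # first 4 characters
df['col3'] = df['col'].str[4:]   # everything after them

print(df)
```

Slicing involves no pattern matching, but it would silently give the wrong split if the number of leading zeros ever differs from 4.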

Check the code below, which uses string replace:

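import pandas as pd

df = pd.DataFrame({'col1': ['000012345', '000012345', '000012345']})

# Converting to int drops the leading zeros
df['col2'] = df['col1'].astype(int)

# Remove the digits from the original string, leaving only the leading zeros
df['col3'] = df.apply(lambda row: row['col1'].replace(str(row['col2']), ''), axis=1)

print(df)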

Output:

        col1   col2  col3
0  000012345  12345  0000
1  000012345  12345  0000
2  000012345  12345  0000
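
If the number of leading zeros varies from row to row, the same idea can probably be expressed without the int conversion or the row-wise apply by capturing the run of leading zeros with str.extract. The sample values and the column names zeros and rest below are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with varying numbers of leading zeros.
df = pd.DataFrame({'col': ['000012345', '0012345', '12345']})

# Group 1: the run of leading zeros (possibly empty).
# Group 2: everything after that run.
df[['zeros', 'rest']] = df['col'].str.extract(r'^(0*)(.*)$')

print(df)
```

Both new columns stay as strings here, so the remainder is not converted to an integer as in the approach above.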
