
How to split messy string to letters and numbers using Regex in Python

I have these two strings:

x = 'plasma_glucose_concentration183.0000'
y = 'Participants20-30'

and I would like to split them as follows:

x: ['plasma_glucose_concentration', '183.0000']
y: ['Participants', '20-30']

I wrote this function, but it only splits the first string correctly:

import re

def split_string(x):
    res = re.findall(r"(\w+?)(\d*\.\d+|\d+)", x)
    return res

When I split the second string, I get:

  [('Participants', '20'), ('3', '0')]

Is there a regex solution for this? Thanks.

You can use

import re

x = ['plasma_glucose_concentration183.0000', 'Participants20-30','2_hour_serum_insulin543.0000']
for s in x:
    print(re.split(r'(?<=[^\W\d_])(?=\d)|(?<=\d)(?=[^\W\d_])', s))

# => ['plasma_glucose_concentration', '183.0000']
#    ['Participants', '20-30']
#    ['2_hour_serum_insulin', '543.0000']

See the regex demo and the Python demo online.

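The (?<=[^\W\d_])(?=\d)|(?<=\d)(?=[^\W\d_]) regex splits the string at every position between a letter and a digit, or between a digit and a letter. Here [^\W\d_] matches any letter: a word character that is neither a digit nor an underscore. Both alternatives consist only of lookarounds, so the split consumes no characters.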
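To make the two alternatives easier to read, here is a minimal sketch of the same pattern in verbose mode, wrapped in a helper. The names BOUNDARY and split_letters_digits and the optional maxsplit argument are illustrative assumptions, not part of the answer above. Splitting on an empty match requires Python 3.7 or later.

```python
import re

# Same pattern as in the answer, spelled out with re.VERBOSE.
# BOUNDARY and split_letters_digits are hypothetical names used for illustration.
BOUNDARY = re.compile(
    r"""
    (?<=[^\W\d_])(?=\d)      # preceded by a letter, followed by a digit
    |
    (?<=\d)(?=[^\W\d_])      # preceded by a digit, followed by a letter
    """,
    re.VERBOSE,
)

def split_letters_digits(s, maxsplit=0):
    """Split s at letter/digit boundaries; maxsplit=0 means no limit."""
    return BOUNDARY.split(s, maxsplit=maxsplit)

print(split_letters_digits('plasma_glucose_concentration183.0000'))
# ['plasma_glucose_concentration', '183.0000']
print(split_letters_digits('Participants20-30'))
# ['Participants', '20-30']
print(split_letters_digits('a1b2', maxsplit=1))
# ['a', '1b2']
```

Passing maxsplit=1 may be useful when only the first letter/digit boundary should split the string, since without it every such boundary produces a new piece.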

Pandas test:

>>> import pandas as pd
>>> df = pd.DataFrame({'text':['plasma_glucose_concentration183.0000','Participants20-30','2_hour_serum_insulin543.0000']})
>>> df['text'].str.split(r'(?<=[^\W\d_])(?=\d)|(?<=\d)(?=[^\W\d_])')
0    [plasma_glucose_concentration, 183.0000]
1                       [Participants, 20-30]
2            [2_hour_serum_insulin, 543.0000]
Name: text, dtype: object

