How to use a regular expression in Python to split messy strings into letters and numbers

Question

I have these two strings:

x = 'plasma_glucose_concentration183.0000'
y = 'Participants20-30'

and want to split them as follows:

x: ['plasma_glucose_concentration', '183.0000']
y: ['Participants', '20-30']

I wrote this function, but it only splits the first string correctly:

import re

def split_string(x):
    # Lazy word characters followed by a decimal number or an integer
    res = re.findall(r"(\w+?)(\d*\.\d+|\d+)", x)
    return res

When I split the second string, I get:

  [('Participants', '20'), ('3', '0')]

Is there a regex solution for this? Thanks.

Answer 1

You can use

import re

x = ['plasma_glucose_concentration183.0000', 'Participants20-30','2_hour_serum_insulin543.0000']
for s in x:
    print(re.split(r'(?<=[^\W\d_])(?=\d)|(?<=\d)(?=[^\W\d_])', s))

# => ['plasma_glucose_concentration', '183.0000']
#    ['Participants', '20-30']
#    ['2_hour_serum_insulin', '543.0000']


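The regex (?<=[^\W\d_])(?=\d)|(?<=\d)(?=[^\W\d_]) splits the string at every position between a letter and a digit, or between a digit and a letter. [^\W\d_] matches any letter (a word character that is neither a digit nor an underscore), so underscores, dots, and hyphens never trigger a split. Both alternatives are zero-width, so this relies on re.split accepting empty matches, which requires Python 3.7 or later.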
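To make the two alternatives easier to read, here is a minimal sketch that compiles the same pattern in verbose mode and wraps it in a helper. The names BOUNDARY and split_string are only illustrative, and the third input is an extra example that is not part of the question.

```python
import re

# Same pattern as above, written with re.VERBOSE so each half can be commented.
# Both alternatives are zero-width: nothing is consumed at the split point.
BOUNDARY = re.compile(
    r"""
    (?<=[^\W\d_])(?=\d)    # a letter on the left, a digit on the right
    |
    (?<=\d)(?=[^\W\d_])    # a digit on the left, a letter on the right
    """,
    re.VERBOSE,
)

def split_string(s):
    """Hypothetical helper: split s wherever a letter and a digit meet."""
    return BOUNDARY.split(s)

print(split_string("plasma_glucose_concentration183.0000"))
# ['plasma_glucose_concentration', '183.0000']
print(split_string("Participants20-30"))
# ['Participants', '20-30']
print(split_string("abc123def"))
# ['abc', '123', 'def']
```

Compiling once is mainly a readability choice here; it also avoids re-parsing the pattern when the helper is called many times.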

Pandas test:

>>> import pandas as pd
>>> df = pd.DataFrame({'text':['plasma_glucose_concentration183.0000','Participants20-30','2_hour_serum_insulin543.0000']})
>>> df['text'].str.split(r'(?<=[^\W\d_])(?=\d)|(?<=\d)(?=[^\W\d_])')
0    [plasma_glucose_concentration, 183.0000]
1                       [Participants, 20-30]
2            [2_hour_serum_insulin, 543.0000]
Name: text, dtype: object
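If the two parts should end up in separate columns rather than in a list, one possible follow-up is to pass expand=True. This is a sketch under assumptions: the column names name and value are made up for illustration, and n=1 is used so that every row produces exactly two parts.

```python
import pandas as pd

pattern = r'(?<=[^\W\d_])(?=\d)|(?<=\d)(?=[^\W\d_])'

df = pd.DataFrame({'text': ['plasma_glucose_concentration183.0000',
                            'Participants20-30',
                            '2_hour_serum_insulin543.0000']})

# expand=True returns a DataFrame with one column per part;
# n=1 stops after the first split so each row has two parts.
parts = df['text'].str.split(pattern, n=1, expand=True)

# 'name' and 'value' are hypothetical column names.
df['name'] = parts[0]
df['value'] = parts[1]

print(df[['name', 'value']])
```

The value column still holds strings such as '20-30', so any numeric conversion would have to be handled separately.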

Solution 1
1 Accepted 2021-01-28 14:02:57
