How to split a messy string into letters and numbers using regex in Python
I have these two strings:
x = 'plasma_glucose_concentration183.0000'
y = 'Participants20-30'
and I want to split them as follows:
x: ['plasma_glucose_concentration', '183.0000']
y: ['Participants', '20-30']
I wrote this function, but only the first string is split correctly:
import re

def split_string(x):
    res = re.findall(r"(\w+?)(\d*\.\d+|\d+)", x)
    return res
When I split the second string, I get:
[('Participants', '20'), ('3', '0')]
Is there a regex solution? Thanks.
You can use
import re
x = ['plasma_glucose_concentration183.0000', 'Participants20-30', '2_hour_serum_insulin543.0000']
for s in x:
    print(re.split(r'(?<=[^\W\d_])(?=\d)|(?<=\d)(?=[^\W\d_])', s))
# => ['plasma_glucose_concentration', '183.0000']
#    ['Participants', '20-30']
#    ['2_hour_serum_insulin', '543.0000']
(?<=[^\W\d_])(?=\d)|(?<=\d)(?=[^\W\d_])
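This regex splits the string at any position between a letter and a digit, or between a digit and a letter. The underscore is excluded from the letter class, which is why '2_hour_serum_insulin' stays in one piece.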
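To make the pattern easier to read, here is the same regex written with re.VERBOSE and wrapped in a small helper. This is only a sketch: the names BOUNDARY and split_letters_digits are made up for illustration and are not part of the original answer. Note that the pattern matches empty positions, and re.split only splits on empty matches in Python 3.7 or later.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE so each part can be commented.
BOUNDARY = re.compile(
    r"""
    (?<=[^\W\d_])   # preceded by a letter: a word character that is neither a digit nor "_"
    (?=\d)          # followed by a digit
    |               # or
    (?<=\d)         # preceded by a digit
    (?=[^\W\d_])    # followed by a letter
    """,
    re.VERBOSE,
)


def split_letters_digits(text):
    """Split text at every letter/digit boundary (hypothetical helper name)."""
    return BOUNDARY.split(text)


print(split_letters_digits("Participants20-30"))
# => ['Participants', '20-30']
```

Since the split happens at every such boundary, a string that alternates more than once produces more than two pieces; for example, 'a1b2' would give ['a', '1', 'b', '2'].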
Testing with pandas:
>>> import pandas as pd
>>> df = pd.DataFrame({'text':['plasma_glucose_concentration183.0000','Participants20-30','2_hour_serum_insulin543.0000']})
>>> df['text'].str.split(r'(?<=[^\W\d_])(?=\d)|(?<=\d)(?=[^\W\d_])')
0    [plasma_glucose_concentration, 183.0000]
1                       [Participants, 20-30]
2            [2_hour_serum_insulin, 543.0000]
Name: text, dtype: object