[英]Regex specific format with numbers separated with commas (in Python)
[英]How to get groups of numbers separated by commas in python?
我有以下文字:
Cluster 7: {4, 15, 21, 28, 33, 35, 43, 47, 53, 57, 59, 66,
69, 70, 74, 86, 87, 88, 90, 114, 136, 148, 201,
202, 212, 220, 227, 250, 252, 253, 259, 262, 267,
270, 282, 296, 318, 319, 323, 326, 341}
Cluster 8: {9, 10, 11, 20, 39, 55, 79, 101, 108, 143, 149,
221, 279, 284, 285, 286, 287, 327, 333, 334, 335,
336}
Cluster 9: {3, 64, 83, 93, 150, 153, 264, 269, 320, 321, 322}
Cluster 10: {94, 123, 147}
我想通過群集提取每組中的數字。
我嘗試使用正則表達式沒有太多運氣
我努力了:
regex="(Cluster \d+): \{((\d+)[,\}][\n ]+)+|(?:(\d+),[\n ])"
但這些團體並不匹配。
我希望輸出為:
["Cluster 7", '4', '15', '21', '28', '33', '35', '43', '47', '53', '57', '59', '66', '69', '70', '74', '86', '87', '88', '90', '114', '136', '148', '201', '202', '212', '220', '227', '250', '252', '253', '259', '262', '267', '270', '282', '296', '318', '319', '323', '326', '341', "Cluster 8", '9', '10', '11', '20', '39', '55', '79', '101', '108', '143', '149', '221', '279', '284', '285', '286', '287', '327', '333', '334', '335', '336', "Cluster 9", '3', '64', '83', '93', '150', '153', '264', '269', '320', '321', '322', "Cluster 10", "94", "123", "147"]
或者這可能不是最好的方法。
謝謝
我不會使用正則表達式。 您的文本在yaml
規范內,可以直接使用保留訂單的yaml加載程序(如oyaml)加載 。
import oyaml as yaml # pip install oyaml
data = yaml.load(text)
要將該dict解壓縮到所需的“平面”結構,它只是一個列表理解:
[x for (k, v) in data.items() for x in (k, *v)]
注意:我是oyaml的作者。
您可以創建更通用的正則表達式:
import re
s = '\nCluster 7: {4, 15, 21, 28, 33, 35, 43, 47, 53, 57, 59, 66,\n 69, 70, 74, 86, 87, 88, 90, 114, 136, 148, 201,\n 202, 212, 220, 227, 250, 252, 253, 259, 262, 267,\n 270, 282, 296, 318, 319, 323, 326, 341}\nCluster 8: {9, 10, 11, 20, 39, 55, 79, 101, 108, 143, 149,\n 221, 279, 284, 285, 286, 287, 327, 333, 334, 335,\n 336}\nCluster 9: {3, 64, 83, 93, 150, 153, 264, 269, 320, 321, 322}\nCluster 10: {94, 123, 147}\n'
data = re.findall('Cluster \d+|\d+', s)
輸出:
['Cluster 7', '4', '15', '21', '28', '33', '35', '43', '47', '53', '57', '59', '66', '69', '70', '74', '86', '87', '88', '90', '114', '136', '148', '201', '202', '212', '220', '227', '250', '252', '253', '259', '262', '267', '270', '282', '296', '318', '319', '323', '326', '341', 'Cluster 8', '9', '10', '11', '20', '39', '55', '79', '101', '108', '143', '149', '221', '279', '284', '285', '286', '287', '327', '333', '334', '335', '336', 'Cluster 9', '3', '64', '83', '93', '150', '153', '264', '269', '320', '321', '322', 'Cluster 10', '94', '123', '147']
\w+(?: +\w+)?
\\w+
匹配一個或多個單詞字符 (?: +\\w+)?
可選擇匹配以下內容
+
匹配一個或多個空格 \\w+
匹配一個或多個單詞字符 import re
s = "Cluster 7: {4, 15, 21, 28, 33, 35, 43, 47, 53, 57, 59, 66,\n 69, 70, 74, 86, 87, 88, 90, 114, 136, 148, 201,\n 202, 212, 220, 227, 250, 252, 253, 259, 262, 267,\n 270, 282, 296, 318, 319, 323, 326, 341}\nCluster 8: {9, 10, 11, 20, 39, 55, 79, 101, 108, 143, 149,\n 221, 279, 284, 285, 286, 287, 327, 333, 334, 335,\n 336}\nCluster 9: {3, 64, \n3, 93, 150, 153, 264, 269, 320, 321, 322}\nCluster 10: {94, 123, 147}"
print(re.findall(r"\w+(?: +\w+)?", s))
結果:
['Cluster 7', '4', '15', '21', '28', '33', '35', '43', '47', '53', '57', '59', '66', '69', '70', '74', '86', '87', '88', '90', '114', '136', '148', '201', '202', '212', '220', '227', '250', '252', '253', '259', '262', '267', '270', '282', '296', '318', '319', '323', '326', '341', 'Cluster 8', '9', '10', '11', '20', '39', '55', '79', '101', '108', '143', '149', '221', '279', '284', '285', '286', '287', '327', '333', '334', '335', '336', 'Cluster 9', '3', '64', '83', '93', '150', '153', '264', '269', '320', '321', '322', 'Cluster 10', '94', '123', '147']
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.