How to get groups of numbers separated by commas in Python?
I have the following text:
Cluster 7: {4, 15, 21, 28, 33, 35, 43, 47, 53, 57, 59, 66,
69, 70, 74, 86, 87, 88, 90, 114, 136, 148, 201,
202, 212, 220, 227, 250, 252, 253, 259, 262, 267,
270, 282, 296, 318, 319, 323, 326, 341}
Cluster 8: {9, 10, 11, 20, 39, 55, 79, 101, 108, 143, 149,
221, 279, 284, 285, 286, 287, 327, 333, 334, 335,
336}
Cluster 9: {3, 64, 83, 93, 150, 153, 264, 269, 320, 321, 322}
Cluster 10: {94, 123, 147}
I want to extract the numbers in each set, cluster by cluster.
I have tried using regex without much luck.
This is what I tried:
regex="(Cluster \d+): \{((\d+)[,\}][\n ]+)+|(?:(\d+),[\n ])"
But the groups don't match.
I would like output like this:
["Cluster 7", '4', '15', '21', '28', '33', '35', '43', '47', '53', '57', '59', '66', '69', '70', '74', '86', '87', '88', '90', '114', '136', '148', '201', '202', '212', '220', '227', '250', '252', '253', '259', '262', '267', '270', '282', '296', '318', '319', '323', '326', '341', "Cluster 8", '9', '10', '11', '20', '39', '55', '79', '101', '108', '143', '149', '221', '279', '284', '285', '286', '287', '327', '333', '334', '335', '336', "Cluster 9", '3', '64', '83', '93', '150', '153', '264', '269', '320', '321', '322', "Cluster 10", "94", "123", "147"]
Or maybe this is not the best approach.
Thanks
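Why the groups don't match: in Python's re module, a capturing group inside a repeated construct such as (...)+ keeps only what its last repetition matched. A single match therefore can never return every number of a set. The minimal sketch below applies the first alternative of the question's regex to the shortest cluster. The trailing newline in the sample string is added so that the final [\n ]+ can match after the closing brace.

```python
import re

# First alternative of the regex from the question, applied to one cluster.
pattern = r"(Cluster \d+): \{((\d+)[,\}][\n ]+)+"
m = re.search(pattern, "Cluster 10: {94, 123, 147}\n")

print(m.group(1))  # Cluster 10
# The repeated group matched "94, ", "123, " and "147}\n" in turn,
# but only the text of the last repetition is kept in groups 2 and 3.
print(m.group(3))  # 147
```

This is why the answers below either match every item separately with re.findall or parse the text with a real parser.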
I would not use regex for this. Your text is within the YAML spec and can be loaded directly with an order-preserving YAML loader such as oyaml.
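import oyaml as yaml  # pip install oyaml

# text holds the input from the question; plain yaml.load() requires a Loader argument in PyYAML 6+
data = yaml.safe_load(text)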
To unpack that dict into the desired "flat" structure, it's just a list comprehension:
[x for (k, v) in data.items() for x in (k, *v)]
Note: I'm the author of oyaml.
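The following is a minimal, standard-library-only sketch of what that comprehension does. It rests on one assumption: the dict literal below stands in for the loader's result on the last two clusters. A YAML flow mapping such as {3, 64, 83} lists keys without values, so each number becomes a key mapped to None, and YAML resolves the numbers to ints. Iterating a dict yields its keys in insertion order, which is guaranteed since Python 3.7.

```python
# Assumed stand-in for what the YAML loader returns for the last two clusters:
# every number is a key whose value is None.
data = {
    "Cluster 9": {3: None, 64: None, 83: None},
    "Cluster 10": {94: None, 123: None, 147: None},
}

# (k, *v) unpacks the cluster label followed by the keys of its inner dict.
flat = [x for (k, v) in data.items() for x in (k, *v)]
print(flat)  # ['Cluster 9', 3, 64, 83, 'Cluster 10', 94, 123, 147]

# The numbers come back as ints; convert them if strings are wanted.
flat_str = [str(x) for x in flat]
```

The question's expected output uses strings, which is why the last line converts every item with str().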
You can create a more generic regex:
import re
s = '\nCluster 7: {4, 15, 21, 28, 33, 35, 43, 47, 53, 57, 59, 66,\n 69, 70, 74, 86, 87, 88, 90, 114, 136, 148, 201,\n 202, 212, 220, 227, 250, 252, 253, 259, 262, 267,\n 270, 282, 296, 318, 319, 323, 326, 341}\nCluster 8: {9, 10, 11, 20, 39, 55, 79, 101, 108, 143, 149,\n 221, 279, 284, 285, 286, 287, 327, 333, 334, 335,\n 336}\nCluster 9: {3, 64, 83, 93, 150, 153, 264, 269, 320, 321, 322}\nCluster 10: {94, 123, 147}\n'
# Match either a "Cluster N" label or a bare number, in order of appearance
data = re.findall(r'Cluster \d+|\d+', s)
Output:
['Cluster 7', '4', '15', '21', '28', '33', '35', '43', '47', '53', '57', '59', '66', '69', '70', '74', '86', '87', '88', '90', '114', '136', '148', '201', '202', '212', '220', '227', '250', '252', '253', '259', '262', '267', '270', '282', '296', '318', '319', '323', '326', '341', 'Cluster 8', '9', '10', '11', '20', '39', '55', '79', '101', '108', '143', '149', '221', '279', '284', '285', '286', '287', '327', '333', '334', '335', '336', 'Cluster 9', '3', '64', '83', '93', '150', '153', '264', '269', '320', '321', '322', 'Cluster 10', '94', '123', '147']
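Sometimes the numbers are wanted per cluster rather than in one flat list. One variant is to match each whole "Cluster N: {...}" block first and then pull the numbers out of its body. This is a sketch of an alternative, not taken from the answers here. [^}]* stops at the closing brace and also crosses line breaks, so multi-line sets are handled. The names clusters and flat are only illustrative.

```python
import re

# Sample input: two clusters from the question, one of them spanning several lines.
s = (
    "Cluster 8: {9, 10, 11, 20, 39, 55, 79, 101, 108, 143, 149,\n"
    " 221, 279, 284, 285, 286, 287, 327, 333, 334, 335,\n"
    " 336}\n"
    "Cluster 10: {94, 123, 147}\n"
)

# One match per cluster: group 1 is the label, group 2 is everything between the braces.
# With two groups, re.findall returns a list of (label, body) tuples.
clusters = {
    name: re.findall(r"\d+", body)
    for name, body in re.findall(r"(Cluster \d+):\s*\{([^}]*)\}", s)
}
print(clusters["Cluster 10"])  # ['94', '123', '147']

# Flatten back into the shape requested in the question.
flat = [x for name, nums in clusters.items() for x in (name, *nums)]
```

Because the numbers are extracted only from inside the braces, the digits of the "Cluster N" label are never mixed in with the set contents.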
Another option is this regex:
\w+(?: +\w+)?
- \w+ matches one or more word characters
- (?: +\w+)? optionally matches the following:
  - " +" matches one or more spaces
  - \w+ matches one or more word characters
Applied in Python:
import re
s = "Cluster 7: {4, 15, 21, 28, 33, 35, 43, 47, 53, 57, 59, 66,\n 69, 70, 74, 86, 87, 88, 90, 114, 136, 148, 201,\n 202, 212, 220, 227, 250, 252, 253, 259, 262, 267,\n 270, 282, 296, 318, 319, 323, 326, 341}\nCluster 8: {9, 10, 11, 20, 39, 55, 79, 101, 108, 143, 149,\n 221, 279, 284, 285, 286, 287, 327, 333, 334, 335,\n 336}\nCluster 9: {3, 64, 83, 93, 150, 153, 264, 269, 320, 321, 322}\nCluster 10: {94, 123, 147}"
print(re.findall(r"\w+(?: +\w+)?", s))
Result:
['Cluster 7', '4', '15', '21', '28', '33', '35', '43', '47', '53', '57', '59', '66', '69', '70', '74', '86', '87', '88', '90', '114', '136', '148', '201', '202', '212', '220', '227', '250', '252', '253', '259', '262', '267', '270', '282', '296', '318', '319', '323', '326', '341', 'Cluster 8', '9', '10', '11', '20', '39', '55', '79', '101', '108', '143', '149', '221', '279', '284', '285', '286', '287', '327', '333', '334', '335', '336', 'Cluster 9', '3', '64', '83', '93', '150', '153', '264', '269', '320', '321', '322', 'Cluster 10', '94', '123', '147']
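The check below shows what the optional group contributes; the sample strings are made up for illustration. The group glues a word to the following word only when the two are separated by spaces alone, which is what turns "Cluster" and "7" into one item. In the question's input every number is followed by a comma or a closing brace, so the numbers stay separate. Space-separated numbers would be merged.

```python
import re

pattern = r"\w+(?: +\w+)?"

# "Cluster" is followed by a space and "7", so the optional group joins them;
# "4" and "15" are each followed by punctuation, so they stay single items.
labels = re.findall(pattern, "Cluster 7: {4, 15}")
print(labels)  # ['Cluster 7', '4', '15']

# With only spaces between numbers, the first pair is merged into one item.
merged = re.findall(pattern, "{4 15 21}")
print(merged)  # ['4 15', '21']
```

Because \w also matches letters and underscores, this pattern is looser than Cluster \d+|\d+, which accepts only the Cluster label and runs of digits.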