
How to get groups of numbers separated by commas in python?

I have the following text:

Cluster 7: {4, 15, 21, 28, 33, 35, 43, 47, 53, 57, 59, 66,
       69, 70, 74, 86, 87, 88, 90, 114, 136, 148, 201,
       202, 212, 220, 227, 250, 252, 253, 259, 262, 267,
       270, 282, 296, 318, 319, 323, 326, 341}
Cluster 8: {9, 10, 11, 20, 39, 55, 79, 101, 108, 143, 149,
       221, 279, 284, 285, 286, 287, 327, 333, 334, 335,
       336}
Cluster 9: {3, 64, 83, 93, 150, 153, 264, 269, 320, 321, 322}
Cluster 10: {94, 123, 147}

I want to extract the numbers in each set, grouped by cluster.

I have tried using a regex, without much luck:

regex = r"(Cluster \d+): \{((\d+)[,\}][\n ]+)+|(?:(\d+),[\n ])"

But the groups don't match what I expect.
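A likely reason the groups don't match: when a capturing group is repeated with `+`, Python's `re` module keeps only the text from its last repetition, so a single pattern cannot hand back every number in a set as separate groups. A minimal demonstration on a short, made-up excerpt of the data:

```python
import re

# One capturing group, repeated with "+": each repetition overwrites the previous capture.
m = re.match(r"(\d+,? ?)+", "4, 15, 21")
print(m.group(0))  # the whole match: '4, 15, 21'
print(m.group(1))  # only the last repetition survives: '21'

# re.findall with a group-free pattern returns every match instead.
print(re.findall(r"\d+", "4, 15, 21"))  # ['4', '15', '21']
```

This is why the answers below either drop the repeated group entirely or split the work into separate matches.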

I would like output like this:

["Cluster 7", '4', '15', '21', '28', '33', '35', '43', '47', '53', '57', '59', '66', '69', '70', '74', '86', '87', '88', '90', '114', '136', '148', '201', '202', '212', '220', '227', '250', '252', '253', '259', '262', '267', '270', '282', '296', '318', '319', '323', '326', '341', "Cluster 8", '9', '10', '11', '20', '39', '55', '79', '101', '108', '143', '149', '221', '279', '284', '285', '286', '287', '327', '333', '334', '335', '336', "Cluster 9", '3', '64', '83', '93', '150', '153', '264', '269', '320', '321', '322', "Cluster 10", "94", "123", "147"]

Or maybe regex is not the best approach for this.

Thanks

I would not use regex for this. Your text is within the YAML spec and can be loaded directly with an order-preserving YAML loader such as oyaml.

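import oyaml as yaml   # pip install oyaml
# `text` holds the cluster listing; safe_load is used because PyYAML >= 6
# requires an explicit Loader argument for yaml.load()
data = yaml.safe_load(text)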

To unpack that dict into the desired "flat" structure, a single list comprehension is enough:

[x for (k, v) in data.items() for x in (k, *v)]
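To see what that comprehension does without installing anything, here is a sketch on a hand-written dict. It assumes the loader behaves like PyYAML's default: a flow set such as `{3, 64}` becomes a mapping whose keys are the integers and whose values are `None`. The dict below is a hypothetical, abbreviated stand-in for `data`. Unpacking `*v` iterates over the inner mapping's keys, so the numbers come out as ints; wrapping them in `map(str, v)` gives strings, as in the question.

```python
# Hypothetical, abbreviated stand-in for the dict returned by the YAML loader.
data = {
    "Cluster 9": {3: None, 64: None, 83: None},
    "Cluster 10": {94: None, 123: None, 147: None},
}

# Unpacking a dict (*v) yields its keys, which are the cluster members (ints).
flat = [x for (k, v) in data.items() for x in (k, *v)]
print(flat)  # ['Cluster 9', 3, 64, 83, 'Cluster 10', 94, 123, 147]

# Same traversal, with the numbers converted to strings to match the question.
flat_str = [x for (k, v) in data.items() for x in (k, *map(str, v))]
print(flat_str)  # ['Cluster 9', '3', '64', '83', 'Cluster 10', '94', '123', '147']
```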

Note: I'm the author of oyaml.

You can use a simpler, more generic regex with re.findall:

import re
s = '\nCluster 7: {4, 15, 21, 28, 33, 35, 43, 47, 53, 57, 59, 66,\n       69, 70, 74, 86, 87, 88, 90, 114, 136, 148, 201,\n       202, 212, 220, 227, 250, 252, 253, 259, 262, 267,\n       270, 282, 296, 318, 319, 323, 326, 341}\nCluster 8: {9, 10, 11, 20, 39, 55, 79, 101, 108, 143, 149,\n       221, 279, 284, 285, 286, 287, 327, 333, 334, 335,\n       336}\nCluster 9: {3, 64, 83, 93, 150, 153, 264, 269, 320, 321, 322}\nCluster 10: {94, 123, 147}\n'
# Raw string so \d reaches the regex engine unchanged;
# matches either a "Cluster N" label or a bare number, in order of appearance
data = re.findall(r'Cluster \d+|\d+', s)
print(data)

Output:

['Cluster 7', '4', '15', '21', '28', '33', '35', '43', '47', '53', '57', '59', '66', '69', '70', '74', '86', '87', '88', '90', '114', '136', '148', '201', '202', '212', '220', '227', '250', '252', '253', '259', '262', '267', '270', '282', '296', '318', '319', '323', '326', '341', 'Cluster 8', '9', '10', '11', '20', '39', '55', '79', '101', '108', '143', '149', '221', '279', '284', '285', '286', '287', '327', '333', '334', '335', '336', 'Cluster 9', '3', '64', '83', '93', '150', '153', '264', '269', '320', '321', '322', 'Cluster 10', '94', '123', '147']
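If you want the numbers grouped per cluster rather than in one flat list, a two-step variant of the same idea is to match each `Cluster N: {...}` block first and then run `re.findall(r"\d+", ...)` on the text between the braces. A sketch, assuming every set is closed by `}` and contains no nested braces; the sample string is an abbreviated copy of the question's data:

```python
import re

s = """Cluster 8: {9, 10, 11, 20,
       221, 279}
Cluster 9: {3, 64, 83}
Cluster 10: {94, 123, 147}"""

# Group 1 captures the label, group 2 everything between the braces;
# the negated class [^}]* also spans the line breaks inside a set.
clusters = {
    m.group(1): re.findall(r"\d+", m.group(2))
    for m in re.finditer(r"(Cluster \d+): \{([^}]*)\}", s)
}
print(clusters)
# {'Cluster 8': ['9', '10', '11', '20', '221', '279'], 'Cluster 9': ['3', '64', '83'], 'Cluster 10': ['94', '123', '147']}
```

The flat format from the question can then be rebuilt with `[x for k, v in clusters.items() for x in (k, *v)]`.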


\w+(?: +\w+)?
  • \w+ Match one or more word characters
  • (?: +\w+)? Optionally match the following:
    • " +" (a space followed by +) Match one or more spaces
    • \w+ Match one or more word characters
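A few quick checks illustrate the breakdown above: the optional group lets a label such as `Cluster 10` match as one item because only spaces separate the two words, while the comma after each number ends the match, so the numbers stay separate. Because the group is optional rather than repeated, a label of three or more words would be split after the second word. The inputs below are made up for illustration:

```python
import re

pattern = r"\w+(?: +\w+)?"

# Word, spaces, word: matched as a single item.
print(re.findall(pattern, "Cluster 10: {94, 123}"))  # ['Cluster 10', '94', '123']

# The comma after each number prevents joining it with the next one.
print(re.findall(pattern, "4, 15,\n  21"))  # ['4', '15', '21']

# Only one extra word is allowed, so a three-word label is split.
print(re.findall(pattern, "Big Cluster 3"))  # ['Big Cluster', '3']
```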


import re

s = "Cluster 7: {4, 15, 21, 28, 33, 35, 43, 47, 53, 57, 59, 66,\n       69, 70, 74, 86, 87, 88, 90, 114, 136, 148, 201,\n       202, 212, 220, 227, 250, 252, 253, 259, 262, 267,\n       270, 282, 296, 318, 319, 323, 326, 341}\nCluster 8: {9, 10, 11, 20, 39, 55, 79, 101, 108, 143, 149,\n       221, 279, 284, 285, 286, 287, 327, 333, 334, 335,\n       336}\nCluster 9: {3, 64, 83, 93, 150, 153, 264, 269, 320, 321, 322}\nCluster 10: {94, 123, 147}"
print(re.findall(r"\w+(?: +\w+)?", s))

Result:

['Cluster 7', '4', '15', '21', '28', '33', '35', '43', '47', '53', '57', '59', '66', '69', '70', '74', '86', '87', '88', '90', '114', '136', '148', '201', '202', '212', '220', '227', '250', '252', '253', '259', '262', '267', '270', '282', '296', '318', '319', '323', '326', '341', 'Cluster 8', '9', '10', '11', '20', '39', '55', '79', '101', '108', '143', '149', '221', '279', '284', '285', '286', '287', '327', '333', '334', '335', '336', 'Cluster 9', '3', '64', '83', '93', '150', '153', '264', '269', '320', '321', '322', 'Cluster 10', '94', '123', '147']
