
How to get groups of numbers separated by commas in Python?

I have the following text:

Cluster 7: {4, 15, 21, 28, 33, 35, 43, 47, 53, 57, 59, 66,
       69, 70, 74, 86, 87, 88, 90, 114, 136, 148, 201,
       202, 212, 220, 227, 250, 252, 253, 259, 262, 267,
       270, 282, 296, 318, 319, 323, 326, 341}
Cluster 8: {9, 10, 11, 20, 39, 55, 79, 101, 108, 143, 149,
       221, 279, 284, 285, 286, 287, 327, 333, 334, 335,
       336}
Cluster 9: {3, 64, 83, 93, 150, 153, 264, 269, 320, 321, 322}
Cluster 10: {94, 123, 147}

I want to extract, for each cluster, the numbers in its set.

I tried using a regex, without much luck:

regex="(Cluster \d+): \{((\d+)[,\}][\n ]+)+|(?:(\d+),[\n ])"

But the groups don't match.

I would like output like this:

["Cluster 7", '4', '15', '21', '28', '33', '35', '43', '47', '53', '57', '59', '66', '69', '70', '74', '86', '87', '88', '90', '114', '136', '148', '201', '202', '212', '220', '227', '250', '252', '253', '259', '262', '267', '270', '282', '296', '318', '319', '323', '326', '341', "Cluster 8", '9', '10', '11', '20', '39', '55', '79', '101', '108', '143', '149', '221', '279', '284', '285', '286', '287', '327', '333', '334', '335', '336', "Cluster 9", '3', '64', '83', '93', '150', '153', '264', '269', '320', '321', '322', "Cluster 10", "94", "123", "147"]

Or maybe a regex is not the best approach for this.

Thanks

I would not use regex for this. Your text is within the YAML spec and can be loaded directly with an order-preserving YAML loader such as oyaml.

import oyaml as yaml   # pip install oyaml
data = yaml.safe_load(text)   # text holds the cluster listing; plain yaml.load(text) needs a Loader argument on PyYAML 6+

To unpack that dict into the desired "flat" structure, a list comprehension is enough:

[x for (k, v) in data.items() for x in (k, *v)]
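
To see what the comprehension above does without installing anything, here is a minimal sketch that uses a hand-built dict as a stand-in for the loaded data. The shape of the stand-in is an assumption: a YAML flow mapping such as {4, 15} is expected to load as keys with null values, so each cluster label maps to a dict of number -> None. Note that the numbers come out as integers rather than the strings shown in the question.

```python
# Stand-in for the loaded data (assumption: label -> mapping of number -> None).
data = {
    "Cluster 9": {3: None, 64: None, 83: None},
    "Cluster 10": {94: None, 123: None, 147: None},
}

# (k, *v) builds a tuple of the label followed by the keys of the inner dict.
flat = [x for (k, v) in data.items() for x in (k, *v)]
print(flat)
# ['Cluster 9', 3, 64, 83, 'Cluster 10', 94, 123, 147]
```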

Note: I'm the author of oyaml.

You can create a more generic regex:

import re
s = '\nCluster 7: {4, 15, 21, 28, 33, 35, 43, 47, 53, 57, 59, 66,\n       69, 70, 74, 86, 87, 88, 90, 114, 136, 148, 201,\n       202, 212, 220, 227, 250, 252, 253, 259, 262, 267,\n       270, 282, 296, 318, 319, 323, 326, 341}\nCluster 8: {9, 10, 11, 20, 39, 55, 79, 101, 108, 143, 149,\n       221, 279, 284, 285, 286, 287, 327, 333, 334, 335,\n       336}\nCluster 9: {3, 64, 83, 93, 150, 153, 264, 269, 320, 321, 322}\nCluster 10: {94, 123, 147}\n'
data = re.findall(r'Cluster \d+|\d+', s)   # raw string avoids invalid-escape warnings for \d

Output:

['Cluster 7', '4', '15', '21', '28', '33', '35', '43', '47', '53', '57', '59', '66', '69', '70', '74', '86', '87', '88', '90', '114', '136', '148', '201', '202', '212', '220', '227', '250', '252', '253', '259', '262', '267', '270', '282', '296', '318', '319', '323', '326', '341', 'Cluster 8', '9', '10', '11', '20', '39', '55', '79', '101', '108', '143', '149', '221', '279', '284', '285', '286', '287', '327', '333', '334', '335', '336', 'Cluster 9', '3', '64', '83', '93', '150', '153', '264', '269', '320', '321', '322', 'Cluster 10', '94', '123', '147']
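
If a per-cluster grouping is more convenient than one flat list, the same idea can be applied in two passes: first capture each label together with the text inside its braces, then pull the numbers out of that text. This is only a sketch; the function name parse_clusters and the shortened sample text are hypothetical, and it assumes every set is closed by a "}" as in the question.

```python
import re

# Hypothetical shortened sample in the same format as the question's text.
text = """Cluster 9: {3, 64, 83,
       93}
Cluster 10: {94, 123, 147}
"""

def parse_clusters(text):
    """Map each cluster label to the list of integers inside its braces."""
    clusters = {}
    # [^}]* also matches newlines, so sets that wrap across lines are handled.
    for label, body in re.findall(r"(Cluster \d+): \{([^}]*)\}", text):
        clusters[label] = [int(n) for n in re.findall(r"\d+", body)]
    return clusters

print(parse_clusters(text))
# {'Cluster 9': [3, 64, 83, 93], 'Cluster 10': [94, 123, 147]}
```

Flattening this dict back into the list format from the question is then a matter of iterating over its items.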

An alternative regex:

\w+(?: +\w+)?
  • \w+ Match one or more word characters
  • (?: +\w+)? Optionally match the following
    • " +" (a space followed by +) Match one or more spaces
    • \w+ Match one or more word characters

In code:

import re

s = "Cluster 7: {4, 15, 21, 28, 33, 35, 43, 47, 53, 57, 59, 66,\n       69, 70, 74, 86, 87, 88, 90, 114, 136, 148, 201,\n       202, 212, 220, 227, 250, 252, 253, 259, 262, 267,\n       270, 282, 296, 318, 319, 323, 326, 341}\nCluster 8: {9, 10, 11, 20, 39, 55, 79, 101, 108, 143, 149,\n       221, 279, 284, 285, 286, 287, 327, 333, 334, 335,\n       336}\nCluster 9: {3, 64, 83, 93, 150, 153, 264, 269, 320, 321, 322}\nCluster 10: {94, 123, 147}"
print(re.findall(r"\w+(?: +\w+)?", s))

Result:

['Cluster 7', '4', '15', '21', '28', '33', '35', '43', '47', '53', '57', '59', '66', '69', '70', '74', '86', '87', '88', '90', '114', '136', '148', '201', '202', '212', '220', '227', '250', '252', '253', '259', '262', '267', '270', '282', '296', '318', '319', '323', '326', '341', 'Cluster 8', '9', '10', '11', '20', '39', '55', '79', '101', '108', '143', '149', '221', '279', '284', '285', '286', '287', '327', '333', '334', '335', '336', 'Cluster 9', '3', '64', '83', '93', '150', '153', '264', '269', '320', '321', '322', 'Cluster 10', '94', '123', '147']
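
One difference between the two regexes is worth keeping in mind: \w+(?: +\w+)? accepts any word or space-separated pair of words, while Cluster \d+|\d+ only accepts cluster labels and bare numbers. On the text from the question both give the same list, but they diverge once other words appear. A small sketch, using a made-up input line that is not part of the original data:

```python
import re

# Hypothetical input with an extra header line that is not a cluster.
sample = "Total clusters: 2\nCluster 1: {5, 6}"

strict = re.findall(r"Cluster \d+|\d+", sample)
loose = re.findall(r"\w+(?: +\w+)?", sample)

print(strict)  # ['2', 'Cluster 1', '5', '6']
print(loose)   # ['Total clusters', '2', 'Cluster 1', '5', '6']
```

Neither pattern skips the stray "2" in the header, so if the input can contain anything besides cluster lines, anchoring on the "Cluster N: {...}" structure is likely the safer choice.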
