Tokenize on whitespace and punctuation, keeping the punctuation

Question

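I'm looking for a way to tokenize or split a string on whitespace or punctuation. Only the punctuation must be kept in the result; the whitespace should be dropped. It will be used to identify the language (Python, Java, HTML, C, ...).

The input string can be: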

class Foldermanagement():
def __init__(self):
    self.today = invoicemng.gettoday()
    ...

The output I expect is a list of tokens, like this:

['class', 'Foldermanagement', '(', ')', ':', 'def', '_', '_', 'init', ... ,'self', '.', 'today', '=', ...]

Any solution is welcome, thanks.

Answer 1

I think this is what you are looking for:

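import itertools
import re
import string

text = """
class Foldermanagement():
def __init__(self):
    self.today = invoicemng.gettoday()
       """

# Every punctuation or whitespace character is a single-character separator.
separators = string.punctuation + string.whitespace
separators_re = "|".join(re.escape(x) for x in separators)

# Interleave the pieces between separators with the separators themselves.
# zip_longest keeps the final piece, which plain zip would silently drop
# whenever the text does not end with a separator.
pieces = re.split(separators_re, text)
found = re.findall(separators_re, text)
tokens = itertools.zip_longest(pieces, found, fillvalue="")
flattened = itertools.chain.from_iterable(tokens)

# Discard empty strings and whitespace; words and punctuation remain.
cleaned = [x for x in flattened if x and not x.isspace()]
# ['class', 'Foldermanagement', '(', ')', ':', 'def', '_', '_',
#  'init', '_', '_', '(', 'self', ')', ':', 'self', '.', 'today', '=',
#  'invoicemng', '.', 'gettoday', '(', ')']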
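
The same token list can probably also be collected in one pass with re.findall, without interleaving two lists. The following is only a sketch under a few assumptions: the helper name tokenize and the constant TOKEN_RE are made up for illustration, "punctuation" means exactly the characters in string.punctuation, and whitespace is whatever \s matches.

```python
import re
import string

# Character class body containing every ASCII punctuation character.
_PUNCT = re.escape(string.punctuation)

# A token is either one punctuation character, or a run of characters
# that are neither punctuation nor whitespace.
TOKEN_RE = re.compile(rf"[{_PUNCT}]|[^{_PUNCT}\s]+")


def tokenize(text):
    """Return words and single punctuation characters, dropping whitespace."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(tokenize("def __init__(self):"))
```

Because whitespace matches neither alternative, it is skipped rather than filtered out afterwards, and a word at the very end of the input is still returned.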

Solution 1
3 Accepted 2018-01-25 00:24:47
