How to tokenize a string (which has data about mathematical calculations and floating point numbers)?

Question

I'm trying to tokenize a string that contains a mathematical expression and build a list of tokens.

For example:

a = "(3.43 + 2^2 / 4)"

function(a) => ['(', '3.43', '+', '2', '^', '2', '/', '4']

I don't want to use external imports (like nltk).

The problem I'm facing is keeping the floating point numbers intact.

I've been scratching my head for hours and have written two functions, but both break when they hit a floating point number.

Here is what I've done:

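a = "(3.43 + 2^2 / 4)"
tokens = []

def is_int(ch):
    # Helper not shown in the original post; assumed to test for a single digit
    return ch.isdigit()

for x in range(1, len(a)-1):
    no = []

    if a[x] == ".":
        y = x
        no.append(".")

        # Collect the digits to the left of the decimal point
        while is_int(a[y-1]):
            no.insert(0, a[y-1])
            y -= 1

        y = x

        # Collect the digits to the right of the decimal point
        while is_int(a[y+1]):
            no.extend(a[y+1])
            y += 1

        token = "".join(no)
        no = []
        tokens.append(token)

    else:
        tokens.append(a[x])

print(tokens)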

Output:

['3', '3.43', '4', '3', ' ', '+', ' ', '2', '^', '2', ' ', '/', ' ', '4']

Answer 1

You could use Python's own tokenizer, which is part of the standard library:

from tokenize import tokenize
from io import BytesIO

source = "(3.43 + 2^2 / 4)"
tokens = tokenize(BytesIO(source.encode('utf-8')).readline)
non_empty = [t for t in tokens if t.line != '']

for token in non_empty:
    print(token.string)

which will print:

(
3.43
+
2
^
2
/
4
)

More info: https://docs.python.org/3/library/tokenize.html
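A possible variation on the same idea, in case filtering on `t.line` turns out to be fragile: select tokens by their type instead. This is only a sketch; the function name `tokenize_with_stdlib` is made up for illustration, and it uses `tokenize.generate_tokens`, which reads `str` lines, so no `BytesIO` or encoding step is needed.

```python
import io
import tokenize

def tokenize_with_stdlib(source):
    """Return the number, operator and name tokens of `source` as strings."""
    # Token types worth keeping; this drops NEWLINE, NL and ENDMARKER.
    keep = (tokenize.NUMBER, tokenize.OP, tokenize.NAME)
    readline = io.StringIO(source).readline
    return [tok.string
            for tok in tokenize.generate_tokens(readline)
            if tok.type in keep]

print(tokenize_with_stdlib("(3.43 + 2^2 / 4)"))
```

Because this is Python's own tokenizer, multi-character operators such as `**` and `==` come back as single tokens, but the input has to look like Python source.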

Answer 2

Try this:

a = "(3.43 + 2^2 / 4)"
tokens = []
no = ""

for ch in a:
    # Concatenate digits or '.' to the current number
    if ch.isdigit() or ch == ".":
        no += ch
        continue
    # Any other character ends the current number, if there is one
    if no != "":
        tokens.append(no)
        no = ""
    # Spaces only separate tokens; anything else is a one-character token
    if ch != " ":
        tokens.append(ch)

# Flush a number that ends the string, e.g. "3 + 4"
if no != "":
    tokens.append(no)

print(tokens)

Output:

['(', '3.43', '+', '2', '^', '2', '/', '4', ')']

Note that this won't handle operators that are more than one character long, such as ==, ** or +=.
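If multi-character operators matter, one way to cover them without external packages is a regular expression from the standard `re` module. The following is a rough sketch rather than a complete lexer: the name `tokenize_regex` and the set of operators in the pattern are assumptions chosen for this example, and longer operators are listed before shorter ones so that `**` is not split into two `*` tokens.

```python
import re

# One alternative per token kind; order matters inside the operator group.
TOKEN_RE = re.compile(r"""
    (?P<space>\s+)                      # whitespace, skipped below
  | (?P<number>\d+\.\d*|\.\d+|\d+)      # 3.43, .5, 42
  | (?P<op>\*\*|==|\+=|[-+*/^()=])      # multi-character operators first
""", re.VERBOSE)

def tokenize_regex(text):
    """Split `text` into number and operator strings."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "space":
            tokens.append(match.group())
        pos = match.end()
    return tokens

print(tokenize_regex("(3.43 + 2^2 / 4)"))
print(tokenize_regex("2**3 == 8"))
```

Characters the pattern does not know about raise a `ValueError` instead of being silently dropped, which makes typos in the input easier to spot.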

Answer 3

Note also that your code won't work when the expression contains numbers with more than one digit. But you can try this:

a = "(3.43 + 22^222 / 4)"
list_a = a[1:-1].split()  # drop the first and last parentheses, then split on whitespace
tokens = []
for val in list_a:
    if '.' in val and val.replace('.', '', 1).isdigit():  # plain decimal number: convert it to float
        tokens.append(float(val))
    elif val.isdigit():  # whole number: add it to tokens as a string
        tokens.append(val)
    elif len(val) == 1:  # single-character operator: add it to tokens
        tokens.append(val)
    else:  # an expression without spaces, like "2^2": go char by char
        no = []
        for k in val:
            if k.isdigit() or k == '.':
                no.append(k)
            else:
                if no:  # don't append an empty string, e.g. for "-2"
                    tokens.append(''.join(no))
                tokens.append(k)
                no = []
        if no:
            tokens.append(''.join(no))

print(tokens)
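As a variation that keeps every token a string and leaves the parentheses in, consecutive characters of the same kind can be grouped with `itertools.groupby` from the standard library. This is a sketch under a few assumptions: the names `tokenize_groupby` and `kind` are made up for illustration, every operator is a single character, and any run of digits and dots is taken as one number without validation.

```python
from itertools import groupby

def tokenize_groupby(expr):
    """Group runs of digit/dot characters into numbers; split everything else."""
    def kind(ch):
        if ch.isdigit() or ch == ".":
            return "number"
        if ch.isspace():
            return "space"
        return "op"

    tokens = []
    for group_kind, group in groupby(expr, key=kind):
        chars = list(group)
        if group_kind == "number":
            tokens.append("".join(chars))  # e.g. ['2', '2', '2'] becomes '222'
        elif group_kind == "op":
            tokens.extend(chars)           # each operator character is its own token
        # runs of whitespace are dropped
    return tokens

print(tokenize_groupby("(3.43 + 22^222 / 4)"))
```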

3 answers

Answer 1
2 2019-05-14 11:58:33

Answer 2
1 (accepted) 2019-05-14 11:36:28

Answer 3
1 2019-05-14 11:42:40
