
Python Tokenizer: Word Limit

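I wrote a tokenizer in Python for my language, but when I try to tokenize a file, it only tokenizes up to a certain point. It should tokenize the whole file, yet it produces only about 90 tokens (words and symbols) and then stops. Here is the code: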

import re

file = input("filename>")

with open(file, 'r') as myfile:
    data=myfile.read().replace('\n', '')


scanner = re.Scanner([
 (r"[0-9]+",                   lambda scanner,token:("NUMBER", token)),
 (r"[a-z_A-Z_λ]+",             lambda scanner,token:("KEYWORD", token)),
 (r"[,.!#%^*()']+",            lambda scanner,token:("OPERATOR", token)),
 (r'["]+',                     lambda scanner,token:("OPERATOR", token)),
 (r"[+-]+",                    lambda scanner,token:("OPERATOR", token)),
 (r'[=]+',                     lambda scanner,token:("OPERATOR", token)),
 (r"[{}]+",                    lambda scanner,token:("OPERATOR", token)),
 (r'[[]]+',                    lambda scanner,token:("OPERATOR", token)),
 (r"\s+", None), # None == skip token.
])

results, remainder = scanner.scan(data)

print(results)

The sample script is (for those who have time to read it):

constant Flow = "Flow"
constant script = this

local names = {'Gabriel', 'Kauan', 'Laura', 'Tarsila'}
constant flowCountry = 'Brasil'

local void function getinpairs(name) extends findArg()
    for _, v(name) in pairs(names) do
       private local table = names
       print("Flow being the best programming language for you, has implemented     some new arguments!")
       local flowFounder = names[1]
       local namesMetatable = getmetatable(t1)
  end
end

function findArg(name)
    return getinpairs(name)
end

findArg('Gabriel')

The result is (for those who have time to read it):

[('KEYWORD', 'constant'), ('KEYWORD', 'Flow'), ('OPERATOR', '='), ('OPERATOR', '"'), ('KEYWORD', 'Flow'), ('OPERATOR', '"'), ('KEYWORD', 'constant'), ('KEYWORD', 'script'), ('OPERATOR', '='), ('KEYWORD', 'thislocal'), ('KEYWORD', 'names'), ('OPERATOR', '='), ('OPERATOR', '{'), ('OPERATOR', "'"), ('KEYWORD', 'Gabriel'), ('OPERATOR', "',"), ('OPERATOR', "'"), ('KEYWORD', 'Kauan'), ('OPERATOR', "',"), ('OPERATOR', "'"), ('KEYWORD', 'Laura'), ('OPERATOR', "',"), ('OPERATOR', "'"), ('KEYWORD', 'Tarsila'), ('OPERATOR', "'"), ('OPERATOR', '}'), ('KEYWORD', 'constant'), ('KEYWORD', 'flowCountry'), ('OPERATOR', '='), ('OPERATOR', "'"), ('KEYWORD', 'Brasil'), ('OPERATOR', "'"), ('KEYWORD', 'local'), ('KEYWORD', 'void'), ('KEYWORD', 'function'), ('KEYWORD', 'getinpairs'), ('OPERATOR', '('), ('KEYWORD', 'name'), ('OPERATOR', ')'), ('KEYWORD', 'extends'), ('KEYWORD', 'findArg'), ('OPERATOR', '()'), ('KEYWORD', 'for'), ('KEYWORD', '_'), ('OPERATOR', ','), ('KEYWORD', 'v'), ('OPERATOR', '('), ('KEYWORD', 'name'), ('OPERATOR', ')'), ('KEYWORD', 'in'), ('KEYWORD', 'pairs'), ('OPERATOR', '('), ('KEYWORD', 'names'), ('OPERATOR', ')'), ('KEYWORD', 'do'), ('KEYWORD', 'private'), ('KEYWORD', 'local'), ('KEYWORD', 'table'), ('OPERATOR', '='), ('KEYWORD', 'names'), ('KEYWORD', 'print'), ('OPERATOR', '('), ('OPERATOR', '"'), ('KEYWORD', 'Flow'), ('KEYWORD', 'being'), ('KEYWORD', 'the'), ('KEYWORD', 'best'), ('KEYWORD', 'programming'), ('KEYWORD', 'language'), ('KEYWORD', 'for'), ('KEYWORD', 'you'), ('OPERATOR', ','), ('KEYWORD', 'has'), ('KEYWORD', 'implemented'), ('KEYWORD', 'some'), ('KEYWORD', 'new'), ('KEYWORD', 'arguments'), ('OPERATOR', '!'), ('OPERATOR', '"'), ('OPERATOR', ')'), ('KEYWORD', 'local'), ('KEYWORD', 'flowFounder'), ('OPERATOR', '='), ('KEYWORD', 'names')]

It stops at line 11 of the sample script, at the word "names".

Can anyone point out the mistake in my script?

You need to escape the square brackets inside the character class:

r'[[]]+'

should be

r'[\[\]]+'
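
To see why this matters: in r'[[]]+' the class ends at the first ], so the pattern means "a [ followed by one or more ]". It never matches a lone [, and the scanner stops at names[1] and returns everything after it as the remainder. Below is a minimal sketch of that behaviour, assuming a cut-down rule set and a one-line sample (make_scanner and sample are names made up for this illustration, not part of the original script). Note that re.Scanner is undocumented, and recent Python versions may warn about the unescaped pattern, so the warning is silenced here.

```python
import re
import warnings


def make_scanner(bracket_pattern):
    """Build a small re.Scanner; only the bracket rule varies (hypothetical helper)."""
    return re.Scanner([
        (r"[0-9]+",      lambda scanner, token: ("NUMBER", token)),
        (r"[a-zA-Z_]+",  lambda scanner, token: ("KEYWORD", token)),
        (r"=+",          lambda scanner, token: ("OPERATOR", token)),
        (bracket_pattern, lambda scanner, token: ("OPERATOR", token)),
        (r"\s+", None),  # None == skip token.
    ])


# Shortened stand-in for line 11 of the sample script.
sample = "local flowFounder = names[1] end"

# Unescaped version: "[[]" is a class holding only "[", then "]+" is literal.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # may emit a "possible nested set" warning
    broken_results, broken_rest = make_scanner(r"[[]]+").scan(sample)

# Escaped version: a class holding both "[" and "]".
fixed_results, fixed_rest = make_scanner(r"[\[\]]+").scan(sample)

print(repr(broken_rest))  # '[1] end'  (scanning stopped at the "[")
print(repr(fixed_rest))   # ''         (whole input consumed)
```

Checking the second value returned by scan() is a cheap way to catch this kind of problem: if the remainder is not empty, some input matched none of the rules.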
