Python Tokenizer: Word Limit
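I wrote a tokenizer for my language in Python, but when I try to tokenize a file, it stops after a certain point. It only produces about 90 tokens (words and symbols), when it should tokenize the whole file. Here is the code: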
import re
file = input("filename>")
with open(file, 'r') as myfile:
    data=myfile.read().replace('\n', '')
scanner = re.Scanner([
    (r"[0-9]+", lambda scanner,token:("NUMBER", token)),
    (r"[a-z_A-Z_λ]+", lambda scanner,token:("KEYWORD", token)),
    (r"[,.!#%^*()']+", lambda scanner,token:("OPERATOR", token)),
    (r'["]+', lambda scanner,token:("OPERATOR", token)),
    (r"[+-]+", lambda scanner,token:("OPERATOR", token)),
    (r'[=]+', lambda scanner,token:("OPERATOR", token)),
    (r"[{}]+", lambda scanner,token:("OPERATOR", token)),
    (r'[[]]+', lambda scanner,token:("OPERATOR", token)),
    (r"\s+", None), # None == skip token.
])
results, remainder = scanner.scan(data)
print(results)
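A detail worth knowing when debugging this: re.Scanner (an undocumented but long-standing class in the re module) returns two values from scan(), and the second one, remainder, is the unscanned tail of the input. The scanner stops silently at the first position where none of the patterns match, so printing remainder shows exactly where tokenizing gave up. The sketch below is a minimal reproduction, assuming only a trimmed-down subset of the rules above and one line of the example script as input:

```python
import re
import warnings

with warnings.catch_warnings():
    # Python 3.7+ emits "FutureWarning: Possible nested set" for r'[[]]+';
    # silenced here only to keep the demo output clean.
    warnings.simplefilter("ignore", FutureWarning)
    scanner = re.Scanner([
        (r"[0-9]+", lambda scanner, token: ("NUMBER", token)),
        (r"[a-z_A-Z_λ]+", lambda scanner, token: ("KEYWORD", token)),
        (r"[=]+", lambda scanner, token: ("OPERATOR", token)),
        (r'[[]]+', lambda scanner, token: ("OPERATOR", token)),  # rule from the question
        (r"\s+", None),  # None == skip token
    ])

results, remainder = scanner.scan("local flowFounder = names[1]")
print(results)
print(repr(remainder))  # '[1]': no rule matches a lone "[", so scanning stopped
```

Because a non-empty remainder is the only signal that scanning ended early, checking it after every scan() call turns a silently truncated token list into a visible error.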
The example script (for those who have time to read it):
constant Flow = "Flow"
constant script = this
local names = {'Gabriel', 'Kauan', 'Laura', 'Tarsila'}
constant flowCountry = 'Brasil'
local void function getinpairs(name) extends findArg()
    for _, v(name) in pairs(names) do
        private local table = names
        print("Flow being the best programming language for you, has implemented some new arguments!")
        local flowFounder = names[1]
        local namesMetatable = getmetatable(t1)
    end
end
function findArg(name)
    return getinpairs(name)
end
findArg('Gabriel')
The result (for those who have time to read it):
[('KEYWORD', 'constant'), ('KEYWORD', 'Flow'), ('OPERATOR', '='), ('OPERATOR', '"'), ('KEYWORD', 'Flow'), ('OPERATOR', '"'), ('KEYWORD', 'constant'), ('KEYWORD', 'script'), ('OPERATOR', '='), ('KEYWORD', 'thislocal'), ('KEYWORD', 'names'), ('OPERATOR', '='), ('OPERATOR', '{'), ('OPERATOR', "'"), ('KEYWORD', 'Gabriel'), ('OPERATOR', "',"), ('OPERATOR', "'"), ('KEYWORD', 'Kauan'), ('OPERATOR', "',"), ('OPERATOR', "'"), ('KEYWORD', 'Laura'), ('OPERATOR', "',"), ('OPERATOR', "'"), ('KEYWORD', 'Tarsila'), ('OPERATOR', "'"), ('OPERATOR', '}'), ('KEYWORD', 'constant'), ('KEYWORD', 'flowCountry'), ('OPERATOR', '='), ('OPERATOR', "'"), ('KEYWORD', 'Brasil'), ('OPERATOR', "'"), ('KEYWORD', 'local'), ('KEYWORD', 'void'), ('KEYWORD', 'function'), ('KEYWORD', 'getinpairs'), ('OPERATOR', '('), ('KEYWORD', 'name'), ('OPERATOR', ')'), ('KEYWORD', 'extends'), ('KEYWORD', 'findArg'), ('OPERATOR', '()'), ('KEYWORD', 'for'), ('KEYWORD', '_'), ('OPERATOR', ','), ('KEYWORD', 'v'), ('OPERATOR', '('), ('KEYWORD', 'name'), ('OPERATOR', ')'), ('KEYWORD', 'in'), ('KEYWORD', 'pairs'), ('OPERATOR', '('), ('KEYWORD', 'names'), ('OPERATOR', ')'), ('KEYWORD', 'do'), ('KEYWORD', 'private'), ('KEYWORD', 'local'), ('KEYWORD', 'table'), ('OPERATOR', '='), ('KEYWORD', 'names'), ('KEYWORD', 'print'), ('OPERATOR', '('), ('OPERATOR', '"'), ('KEYWORD', 'Flow'), ('KEYWORD', 'being'), ('KEYWORD', 'the'), ('KEYWORD', 'best'), ('KEYWORD', 'programming'), ('KEYWORD', 'language'), ('KEYWORD', 'for'), ('KEYWORD', 'you'), ('OPERATOR', ','), ('KEYWORD', 'has'), ('KEYWORD', 'implemented'), ('KEYWORD', 'some'), ('KEYWORD', 'new'), ('KEYWORD', 'arguments'), ('OPERATOR', '!'), ('OPERATOR', '"'), ('OPERATOR', ')'), ('KEYWORD', 'local'), ('KEYWORD', 'flowFounder'), ('OPERATOR', '='), ('KEYWORD', 'names')]
It stops at line 11, at the word "names".
Can anyone point out the error in the script?
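You need to escape the square brackets. Inside a character class, "[[]" is a set containing only "[", so r'[[]]+' actually means "a [ followed by one or more ]". It never matches a lone "[", so at names[1] none of the rules match, and scan() stops there without raising an error, leaving the rest of the file in remainder. The pattern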
r'[[]]+'
should be
r'[\[\]]+'
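Putting the fix together, here is a sketch of the tokenizer with the escaped bracket rule. The function names tokenize_source and tokenize_file are hypothetical, and the check on remainder (raising ValueError) is an addition of mine, not part of the answer above. The sketch also keeps the newlines in the input: the \s+ rule already skips whitespace, and removing '\n' beforehand is why the question's output contains ('KEYWORD', 'thislocal') instead of "this" followed by "local".

```python
import re

# Same rules as the question, except the square-bracket rule is escaped.
# re.Scanner tries the patterns in order at each position; the first match wins.
SCANNER = re.Scanner([
    (r"[0-9]+", lambda scanner, token: ("NUMBER", token)),
    (r"[a-z_A-Z_λ]+", lambda scanner, token: ("KEYWORD", token)),
    (r"[,.!#%^*()']+", lambda scanner, token: ("OPERATOR", token)),
    (r'["]+', lambda scanner, token: ("OPERATOR", token)),
    (r"[+-]+", lambda scanner, token: ("OPERATOR", token)),
    (r"[=]+", lambda scanner, token: ("OPERATOR", token)),
    (r"[{}]+", lambda scanner, token: ("OPERATOR", token)),
    (r"[\[\]]+", lambda scanner, token: ("OPERATOR", token)),  # escaped [ and ]
    (r"\s+", None),  # None == skip token; this also skips newlines
])


def tokenize_source(source):
    """Hypothetical helper: tokenize a string, failing loudly on leftovers."""
    results, remainder = SCANNER.scan(source)
    if remainder:
        # scan() stops at the first character no rule matches; report the
        # position instead of returning a silently truncated token list.
        offset = len(source) - len(remainder)
        raise ValueError(f"cannot tokenize at offset {offset}: {remainder[:20]!r}")
    return results


def tokenize_file(path):
    """Hypothetical helper: read a source file and tokenize it."""
    # Assumption: source files are UTF-8 (the rules include "λ").
    # Newlines are kept; stripping them would glue "this" and "local".
    with open(path, "r", encoding="utf-8") as myfile:
        return tokenize_source(myfile.read())
```

To keep the interactive prompt from the question, call it as print(tokenize_file(input("filename>"))).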