Python 3: How to check if a string can be a valid variable?
I have a string and want to check if it can be used as a valid variable name without getting a syntax error. For example:
def variableName(string):
    # if string is a valid variable name:
    #     return True
    # else:
    #     return False
input >>> variableName("validVariable")
output >>> True
input >>> variableName("992variable")
output >>> False
I would not like to use .isidentifier(). I want to write a function of my own.
The following answer is true only for "old-style" Python 2.7 identifiers.
"validVariable".isidentifier()
#True
"992variable".isidentifier()
#False
Since you changed your question after I posted the answer, consider writing a regular expression:
import re
re.match(r"[_a-z]\w*$", yourstring, flags=re.I)
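As a rough sketch, the pattern above can be wrapped into the kind of function the question asks for. The names `PATTERN` and `variable_name` are illustrative choices, not part of the original answer. `re.fullmatch` is swapped in for `re.match` with `$` here, because `$` also matches just before a trailing newline.

```python
import re

PATTERN = r"[_a-z]\w*"

# `$` also matches just before a trailing newline, so this one slips through.
print(bool(re.match(PATTERN + "$", "name\n", flags=re.I)))  # True

def variable_name(string):
    """Return True if the whole string matches the identifier pattern."""
    return re.fullmatch(PATTERN, string, flags=re.I) is not None

print(variable_name("validVariable"))  # True
print(variable_name("992variable"))    # False
print(variable_name("name\n"))         # False
```

This check does not reject keywords such as `def`, and it does not follow the full Unicode identifier rules.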
In Python 3 a valid identifier can contain characters outside the ASCII range. Since you don't want to use str.isidentifier, you can write your own version of it in Python.
Its specification can be found here: https://www.python.org/dev/peps/pep-3131/#specification-of-language-changes
import keyword
import re
import unicodedata
def is_other_id_start(char):
    """
    Item belongs to Other_ID_Start in
    http://unicode.org/Public/UNIDATA/PropList.txt
    """
    return bool(re.match(r'[\u1885-\u1886\u2118\u212E\u309B-\u309C]', char))
def is_other_id_continue(char):
    """
    Item belongs to Other_ID_Continue in
    http://unicode.org/Public/UNIDATA/PropList.txt
    """
    return bool(re.match(r'[\u00B7\u0387\u1369-\u1371\u19DA]', char))
def is_xid_start(char):
    # ID_Start is defined as all characters having one of
    # the general categories uppercase letters (Lu), lowercase
    # letters (Ll), titlecase letters (Lt), modifier letters (Lm),
    # other letters (Lo), letter numbers (Nl), the underscore, and
    # characters carrying the Other_ID_Start property. XID_Start
    # then closes this set under normalization, by removing all
    # characters whose NFKC normalization is not of the form
    # ID_Start ID_Continue * anymore.
    category = unicodedata.category(char)
    return (
        category in {'Lu', 'Ll', 'Lt', 'Lm', 'Lo', 'Nl'} or
        is_other_id_start(char)
    )
def is_xid_continue(char):
    # ID_Continue is defined as all characters in ID_Start, plus
    # nonspacing marks (Mn), spacing combining marks (Mc), decimal
    # number (Nd), connector punctuations (Pc), and characters
    # carrying the Other_ID_Continue property. Again, XID_Continue
    # closes this set under NFKC-normalization; it also adds U+00B7
    # to support Catalan.
    category = unicodedata.category(char)
    return (
        is_xid_start(char) or
        category in {'Mn', 'Mc', 'Nd', 'Pc'} or
        is_other_id_continue(char)
    )
def is_valid_identifier(name):
    # An empty string is never an identifier (and name[0] would fail).
    if not name:
        return False
    # All identifiers are converted into the normal form NFKC
    # while parsing; comparison of identifiers is based on NFKC.
    name = unicodedata.normalize(
        'NFKC', name
    )
    # check if it's a keyword
    if keyword.iskeyword(name):
        return False
    # The identifier syntax is <XID_Start> <XID_Continue>*.
    if not (is_xid_start(name[0]) or name[0] == '_'):
        return False
    return all(is_xid_continue(char) for char in name[1:])
if __name__ == '__main__':
    # From goo.gl/pvpYg6
    assert is_valid_identifier("a") is True
    assert is_valid_identifier("Z") is True
    assert is_valid_identifier("_") is True
    assert is_valid_identifier("b0") is True
    assert is_valid_identifier("bc") is True
    assert is_valid_identifier("b_") is True
    assert is_valid_identifier("µ") is True
    assert is_valid_identifier("𝔘𝔫𝔦𝔠𝔬𝔡𝔢") is True
    assert is_valid_identifier(" ") is False
    assert is_valid_identifier("[") is False
    assert is_valid_identifier("©") is False
    assert is_valid_identifier("0") is False
You can check the CPython and PyPy implementations here and here, respectively.
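To see what the category checks in this answer are working with, the following small sketch prints the Unicode general category of a few characters. It uses only `unicodedata`; the helper name `describe` and the sample characters are arbitrary choices for illustration.

```python
import unicodedata

# General categories that the code above accepts at the start of a name.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}

def describe(char):
    """Return (category, allowed_as_start_by_category) for one character."""
    category = unicodedata.category(char)
    return category, category in START_CATEGORIES

print(describe("a"))  # ('Ll', True)
print(describe("0"))  # ('Nd', False)
print(describe("©"))  # ('So', False)
print(describe("_"))  # ('Pc', False): the underscore is special-cased

# NFKC maps the micro sign U+00B5 to Greek small letter mu U+03BC.
print(unicodedata.normalize("NFKC", "\u00b5") == "\u03bc")  # True
```

The underscore falls in the connector punctuation category, which is why the identifier check has to allow it explicitly in the first position.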
You could use a regular expression.
For example:
import re
isValidIdentifier = re.fullmatch(r"[A-Za-z_][0-9A-Za-z_]*", identifier) is not None
Note that this only checks for ASCII letters, digits, and the underscore. The actual standard supports other characters. See here: https://www.python.org/dev/peps/pep-3131/
You may also need to exclude reserved words such as def, True, False, ... see here: https://www.programiz.com/python-programming/keywords-identifier
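A minimal sketch of combining the two checks, assuming an ASCII-only pattern is acceptable. The standard-library `keyword` module reports reserved words, so no hand-maintained list is needed. The function name `is_ascii_identifier` is hypothetical.

```python
import keyword
import re

def is_ascii_identifier(name):
    """ASCII-only check: identifier-shaped and not a reserved word."""
    if keyword.iskeyword(name):
        return False
    return re.fullmatch(r"[A-Za-z_][0-9A-Za-z_]*", name) is not None

print(is_ascii_identifier("validVariable"))  # True
print(is_ascii_identifier("992variable"))    # False
print(is_ascii_identifier("def"))            # False
print(is_ascii_identifier("True"))           # False
```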