str.startswith() not working as I intended

I'm trying to test for a \t or a space character and I can't understand why this bit of code won't work. What I am doing is reading in a file, counting the LOC for the file, and then recording the name of each function in the file along with its individual lines of code. The bit of code below is where I attempt to count the LOC for the functions.

import re

...
    else:
            loc += 1
            for line in infile:
                line_t = line.lstrip()
                if len(line_t) > 0 \
                and not line_t.startswith('#') \
                and not line_t.startswith('"""'):
                    if not line.startswith('\s'):
                        print ('line = ' + repr(line))
                        loc += 1
                        return (loc, name)
                    else:
                        loc += 1
                elif line_t.startswith('"""'):
                    while True:
                        if line_t.rstrip().endswith('"""'):
                            break
                        line_t = infile.readline().rstrip()

            return(loc,name)

Output:

Enter the file name: test.txt
line = '\tloc = 0\n'

There were 19 lines of code in "test.txt"

Function names:

    count_loc -- 2 lines of code

As you can see, my test print for the line shows a \t, but the if statement explicitly says (or so I thought) that it should only execute when no whitespace character is present.

Here is the full test file I have been using:

def count_loc(infile):
    """ Receives a file and then returns the amount
        of actual lines of code by not counting commented
        or blank lines """

    loc = 0
    for line in infile:
        line = line.strip()
        if len(line) > 0 \
        and not line.startswith('//') \
        and not line.startswith('/*'):
            loc += 1
            func_loc, func_name = checkForFunction(line);
        elif line.startswith('/*'):
            while True:
                if line.endswith('*/'):
                    break
                line = infile.readline().rstrip()

    return loc

 if __name__ == "__main__":
    print ("Hi")
    Function LOC = 15
    File LOC = 19

\s only means whitespace to the re package, when doing pattern matching.

For startswith, an ordinary method of ordinary strings, \s is nothing special. It is not a pattern, just characters: a backslash followed by the letter s.
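To make the difference concrete, here is a minimal sketch using the line from the question's output. The variable names are only illustrative; the literal is written as '\\s', which is the same two-character string as '\s' but avoids the invalid-escape warning that newer Python versions emit.

```python
import re

line = '\tloc = 0\n'

# '\s' is not a recognized string escape, so Python keeps the backslash:
# the literal is two characters, a backslash followed by 's'.
literal = '\\s'
print(len(literal))                 # 2
print(line.startswith(literal))     # False: the line starts with a tab

# The re module, by contrast, interprets \s as "any whitespace character".
print(bool(re.match(r'\s', line)))  # True
```

So `not line.startswith('\s')` is true for practically every line, which is why the branch in the question runs even for an indented line.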

Your question has already been answered and this is slightly off-topic, but...

If you want to parse code, it is often easier and less error-prone to use a parser. If your code is Python code, Python comes with a couple of parsers (tokenize, ast, and parser, although the parser module was removed in Python 3.10). For other languages, you can find a lot of parsers on the internet. ANTLR is a well-known one with Python bindings.

As an example, the following couple of lines of code print all lines of a Python module that are not comments and not docstrings:

import tokenize

ignored_tokens = [tokenize.NEWLINE,tokenize.COMMENT,tokenize.N_TOKENS
                 ,tokenize.STRING,tokenize.ENDMARKER,tokenize.INDENT
                 ,tokenize.DEDENT,tokenize.NL]
with open('test.py', 'r') as f:
    g = tokenize.generate_tokens(f.readline)
    line_num = 0
    for a_token in g:
        if a_token[2][0] != line_num and a_token[0] not in ignored_tokens:
            line_num = a_token[2][0]
            print(a_token)

As a_token above is already parsed, you can easily check for function definitions, too. You can also keep track of where the function ends by looking at the current start column, a_token[2][1]. If you want to do more complex things, you should use ast.
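For the ast route, a rough sketch might look like the following. The helper name function_spans and the sample source are made up for illustration, and end_lineno is only available on Python 3.8 and later. Note that the span covers every physical line of the function, including docstrings, comments and blank lines, so it is not yet a LOC count.

```python
import ast

# Hypothetical sample module, used only for this illustration.
source = '''
def count_loc(infile):
    """Docstring."""
    loc = 0
    for line in infile:
        loc += 1
    return loc

def other():
    pass
'''

def function_spans(source):
    """Return (name, first_line, last_line) for every function definition."""
    tree = ast.parse(source)
    spans = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno requires Python 3.8 or later.
            spans.append((node.name, node.lineno, node.end_lineno))
    # ast.walk does not promise source order, so sort by starting line.
    return sorted(spans, key=lambda span: span[1])

for name, start, end in function_spans(source):
    print(f'{name}: lines {start}-{end}')
```

Because the parser has already worked out where each function starts and ends, no indentation checks on raw text are needed.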

Your string literals aren't what you think they are. You can specify a space or TAB like so:

space = ' '
tab = '\t'
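With those literals, one possible way to write the check from the question is to pass both prefixes to startswith as a tuple. The function name is_indented is hypothetical, not something from the original code.

```python
def is_indented(line):
    """Return True if the line begins with a space or a tab."""
    # str.startswith accepts a tuple of prefixes and tries each one.
    return line.startswith((' ', '\t'))

print(is_indented('\tloc = 0\n'))             # True
print(is_indented('def count_loc(infile):'))  # False
print(is_indented(''))                        # False
```

In the question's loop, `if not is_indented(line):` would then mark the first unindented line, which is where the function body ends.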
