
Python: Get the line and column number of string index?

Say I have a text file I'm operating on. Something like this (hopefully this isn't too unreadable):

import re

with open('my_data_file.dat') as f:
    data_raw = f.read()

# re.finditer yields match objects; re.findall would return plain strings,
# which have no .start() or .end().
for match in re.finditer(my_regex, data_raw, re.MULTILINE):
    try:
        parse(data_raw, from_=match.start(), to=match.end())
    except Exception:
        print("Error parsing data starting on line {}".format(what_do_i_put_here))
        raise

Notice in the exception handler there's a certain variable named what_do_i_put_here. My question is: how can I assign to that name so that my script will print the line number that contains the start of the 'bad region' I'm trying to work with? I don't mind re-reading the file, I just don't know what I'd do...

I wrote this. It's untested and inefficient, but it does help my exception message be a little clearer:

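def coords_of_str_index(string, index):
    """Get (line_number, col) of `index` in `string`."""
    # Keep line endings so each line's length matches its span in `string`.
    lines = string.splitlines(True)
    curr_pos = 0  # index in `string` where the current line starts
    for linenum, line in enumerate(lines):
        if curr_pos + len(line) > index:
            # Line number is 1-based; column is 0-based.
            return linenum + 1, index - curr_pos
        curr_pos += len(line)
    # Falls through (returning None) if `index` is past the end of `string`.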

I haven't even tested to see if the column number is vaguely accurate. I failed to abide by YAGNI.
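
As a quick check of the function above, here is a minimal sketch run against a made-up three-line sample string (the sample text is an assumption, not data from the question). It suggests that line numbers come out 1-based, columns 0-based, and that an index past the end of the string yields None.

```python
def coords_of_str_index(string, index):
    """Get (line_number, col) of `index` in `string`."""
    lines = string.splitlines(True)
    curr_pos = 0
    for linenum, line in enumerate(lines):
        if curr_pos + len(line) > index:
            return linenum + 1, index - curr_pos
        curr_pos += len(line)

sample = "ab\ncd\nef"  # hypothetical sample data, 8 characters long

print(coords_of_str_index(sample, 0))  # (1, 0): 'a' on the first line
print(coords_of_str_index(sample, 3))  # (2, 0): 'c' starts the second line
print(coords_of_str_index(sample, 7))  # (3, 1): 'f' on the third line
print(coords_of_str_index(sample, 8))  # None: index is past the end
```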

Here's something a bit cleaner, and in my opinion easier to understand than your own answer:

def index_to_coordinates(s, index):
    """Returns (line_number, col) of `index` in `s`."""
    if not len(s):
        return 1, 1
    sp = s[:index+1].splitlines(keepends=True)
    return len(sp), len(sp[-1])

It works essentially the same way as your own answer, but by slicing the string first, splitlines() calculates all the information you need, without any post-processing.

Using keepends=True is necessary to give correct column counts for end-of-line characters.
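
To illustrate why keepends matters, here is a small sketch. The helper `coords` and the sample string are made up for this example; it is the same slicing approach with keepends made switchable. Without keepends, an index that points at a newline gets the same coordinates as the character before it.

```python
def coords(s, index, keepends):
    """Hypothetical helper: the slicing approach, with keepends switchable."""
    sp = s[:index + 1].splitlines(keepends=keepends)
    return len(sp), len(sp[-1])

s = "ab\ncd"  # made-up sample; index 1 is 'b', index 2 is the newline

print(coords(s, 1, keepends=True))   # (1, 2)
print(coords(s, 2, keepends=True))   # (1, 3)
print(coords(s, 1, keepends=False))  # (1, 2)
print(coords(s, 2, keepends=False))  # (1, 2), collides with index 1
```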

The only extra problem is the edge case of an empty string, which can easily be handled by a guard clause.

I tested it in Python 3.8, but it probably works correctly from about version 3.4 onward. In some older versions (narrow builds before Python 3.3), len() counts code units instead of code points, and I assume it would break for any string containing characters outside the BMP.
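
To tie this back to the question's exception handler, here is a rough sketch of how the function might be plugged in. The `parse` function, the regex, and the sample data are all hypothetical stand-ins invented for this example, and the handler collects coordinates instead of re-raising so that the sketch runs to completion.

```python
import re

def index_to_coordinates(s, index):
    """Returns (line_number, col) of `index` in `s`."""
    if not len(s):
        return 1, 1
    sp = s[:index+1].splitlines(keepends=True)
    return len(sp), len(sp[-1])

def parse(data, from_, to):
    # Hypothetical stand-in for the question's parse(): expects "name=<int>".
    name, value = data[from_:to].split("=", 1)
    return name, int(value)

data_raw = "a=1\nb=2\nc=oops\nd=4\n"  # made-up sample data
my_regex = r"^\w+=.*$"                # made-up pattern, one match per line

errors = []
for match in re.finditer(my_regex, data_raw, re.MULTILINE):
    try:
        parse(data_raw, from_=match.start(), to=match.end())
    except ValueError:
        # Convert the match's start index into (line, column).
        line, col = index_to_coordinates(data_raw, match.start())
        errors.append((line, col))
        print("Error parsing data starting on line {}, column {}".format(line, col))
```

With this sample data the bad region starts at index 8, which the function reports as line 3, column 1.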

Column indexing starts at 0, so you need to subtract 1 from len(sp[-1]) at the very end of your code to get the correct column value. Also, I'd perhaps return None (instead of "1.1", which is also incorrect since it should be "1.0") if the length of the string is 0 or if the string is too short to contain the index. Otherwise, it's an excellent and elegant solution, Tim.

from typing import Optional

def index_to_coordinates(txt: str, index: int) -> Optional[str]:
    """Returns 'line.column' of index in 'txt', or None if index is out of range."""
    if not txt or len(txt) - 1 < index:
        return None
    sp = txt[:index+1].splitlines(keepends=True)
    return f"{len(sp)}.{len(sp[-1]) - 1}"
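
A short usage sketch for this variant follows, with a made-up sample string. It shows the 1-based line and 0-based column in the 'line.column' result, and None for an empty string or an out-of-range index.

```python
from typing import Optional

def index_to_coordinates(txt: str, index: int) -> Optional[str]:
    """Returns 'line.column' of index in 'txt', or None if index is out of range."""
    if not txt or len(txt) - 1 < index:
        return None
    sp = txt[:index+1].splitlines(keepends=True)
    return f"{len(sp)}.{len(sp[-1]) - 1}"

txt = "ab\ncd"  # hypothetical sample, 5 characters long

print(index_to_coordinates(txt, 0))  # 1.0
print(index_to_coordinates(txt, 2))  # 1.2 (the newline itself)
print(index_to_coordinates(txt, 3))  # 2.0
print(index_to_coordinates(txt, 5))  # None (index past the end)
print(index_to_coordinates("", 0))   # None (empty string)
```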
