
Best way to print word on .txt file on right position using line and column as parameter in Python

I currently have a Python program that reads a text file. For a couple of reasons the text loses its formatting while it is held in memory, but the line and column of each piece of text are kept. I would like to use this line and column information to reproduce the file as it was originally read. It is fine if the columns do not match the original in the number of spaces or tabs, as long as the result is consistent throughout the new file.

The first naive solution that occurred to me was to keep a pointer to line 1, column 1 and emit \n and spaces according to the line and column information, but I was wondering whether there is a better way to do that in Python (in fact, I don't know how to keep such a pointer to the first line and column either).

A method that takes four parameters (a string, the line, the column, and the file) might be a possible solution, although I am unsure what would happen in that case if (line, column) is already occupied (this would never occur in my situation, so it is not really a concern).

Edit: The information is stored in a complicated 'structure', but it suffices to say that I can extract it as a list of strings, where each string has an associated line and column. I would then use this 'method' to take each string with its line and column and add it to the file at the right position.

Edit 2: The only assumption is that the words are extracted in exactly the same order as they appear in the original file. That is to say, if the original file is "The cat jumped \n but did not die", then I expect to be given the strings 'The', 'cat', 'jumped', 'but', 'did', 'not', 'die' along with their associated lines and columns. In that case, 'but', 'did', 'not' and 'die' will have line 2 instead of 1, and every word has its associated column (columns may or may not coincide between words, since they are on different lines).

Thank you.

You would need to sort the text chunks in memory by row number (y). Then, for each i from 1 to N, where N is the number of rows per page in your original file, you would:

- if there are chunks with y = i:
    - sort all chunks with that y in that page by their x
    - start with j = 0, and for each text chunk:
       - write (x - j) spaces
       - write the chunk
       - set j equal to x plus the length of the chunk
- output a newline and continue
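
The steps above can be sketched in Python roughly as follows. This is only an illustration under some assumptions: the chunks are given as (line, column, text) tuples, lines and columns are 1-based, and a page is simply the whole file. The names `rebuild` and `chunks` are made up for the example.

```python
from collections import defaultdict


def rebuild(chunks):
    """Rebuild text from (line, column, text) tuples.

    Assumption: line and column are both 1-based.
    """
    # Group the chunks by line number (y).
    by_line = defaultdict(list)
    for line, col, text in chunks:
        by_line[line].append((col, text))

    out_lines = []
    last_line = max(by_line, default=0)
    for y in range(1, last_line + 1):
        j = 0  # number of characters already written on this line
        parts = []
        # Sort the chunks on this line by column (x).
        for col, text in sorted(by_line.get(y, [])):
            x = col - 1  # convert the 1-based column to a 0-based offset
            parts.append(" " * (x - j))  # pad up to the chunk's column
            parts.append(text)
            j = x + len(text)
        out_lines.append("".join(parts))
    return "\n".join(out_lines)


# Hypothetical data for the cat example from the question.
chunks = [
    (1, 1, "The"), (1, 5, "cat"), (1, 9, "jumped"),
    (2, 2, "but"), (2, 6, "did"), (2, 10, "not"), (2, 14, "die"),
]
print(rebuild(chunks))
```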

This would rebuild an acceptable version of the text. A slight modification using modulo 8 could even allow you to replace some of those (x - j) spaces with tabs.
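
One possible reading of the tab idea is a small padding helper like the one below. It assumes tab stops every 8 columns and 0-based offsets; the name `pad` and the `tabsize` parameter are hypothetical and only serve the illustration.

```python
def pad(cur, target, tabsize=8):
    """Return whitespace that moves from 0-based offset cur to target.

    Tabs are used while the next tab stop does not overshoot the target;
    the remainder is filled with spaces.
    """
    out = []
    while True:
        next_stop = (cur // tabsize + 1) * tabsize
        if next_stop > target:
            break
        out.append("\t")
        cur = next_stop
    out.append(" " * (target - cur))
    return "".join(out)


# With 8-column tab stops, expanding the tabs gives back the plain-space padding.
line = "abc" + pad(3, 18) + "x"
print(repr(line))
print(line.expandtabs(8) == "abc" + " " * 15 + "x")
```

In the loop described above, `" " * (x - j)` would be replaced by `pad(j, x)`.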

I'm not sure it's efficient, and I'm sure it needs some work. I've used the cat example to mock up supporting data, then turned that back into text. There's no error checking, but I think this is the bare bones of it:

import re

test = "The cat jumped \n but did not die"

# Mock up the supporting data: for each line, its length and a dict
# mapping each word's (start, end) span to the word itself.
line_ref = []
for line in test.splitlines():
    words = re.finditer(r'\S+', line)
    line_ref.append((len(line), {m.span(): m.group() for m in words}))

# Rebuild each line by placing every word at its recorded span.
output = []
for length, spans in line_ref:
    # Furthest end offset of any word (0 if the line has no words).
    last = max((end for _, end in spans), default=0)
    textlist = [' '] * max(last, length)
    for (start, end), word in spans.items():
        textlist[start:end] = word
    output.append(''.join(textlist))

print('\n'.join(output))
