Index/Offset performance is terrible - what am I doing wrong? Python

I need to assemble a long text string from XML fields.

XML_FIELD_ONE = "Iamacatthatisoddlyimmunetocatnip"

XML_FIELD_TWO = [7, 8, 24]

XML_FIELD_TWO contains the indexes at which to insert either \n or \r. If two indexes are 1 apart (like 7, 8), then I need to insert \r\n. If an index stands alone (like 24), I need to insert \n.

It takes about 2 minutes to process a 25K-line file with this code. What am I doing wrong?

XML_FIELD_ONE = list("Iamacatthatisoddlyimmunetocatnip")
XML_FIELD_TWO = [7,8,24]

idx = 0
while idx <= len(XML_FIELD_ONE):
    for position in XML_FIELD_ONE:
        for space in XML_FIELD_TWO:

            if idx == int(space) and idx+1 == int(space)+1:
                XML_FIELD_ONE[idx] = "\r"

                try:
                    XML_FIELD_ONE[idx+1] = "\n"
                except:
                    pass

            elif idx == int(space):
                XML_FIELD_ONE[idx] = "\n"

    idx += 1


new_text = "".join(XML_FIELD_ONE)
return new_text

The simple way of doing this is:

for offset in XML_FIELD_TWO:
    XML_FIELD_ONE[offset] = "\n"

But this violates the rule "if two offsets are adjacent, the first one is \r and the next one is \n".
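One way to keep that simple loop while still honoring the pair rule might look like the sketch below. It assumes, as the question's code does, that the characters at the given offsets are overwritten rather than shifted, and that adjacent offsets only ever come in pairs. The function name replace_at_offsets is made up for illustration.

```python
def replace_at_offsets(text, offsets):
    """Overwrite the characters at `offsets` with line-break characters.

    Assumption: an offset whose successor (offset + 1) is also listed is
    the first half of a pair and becomes "\r"; every other offset
    becomes "\n".
    """
    chars = list(text)
    offset_set = set(offsets)  # O(1) membership tests
    for offset in offsets:
        chars[offset] = "\r" if offset + 1 in offset_set else "\n"
    return "".join(chars)


result = replace_at_offsets("Iamacatthatisoddlyimmunetocatnip", [7, 8, 24])
print(repr(result))
```

This visits each offset once instead of scanning the whole string for every offset, so the work grows with the number of offsets plus one pass to copy and join the text.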

You wrote a triple loop when you need only one; this is horridly inefficient. You know exactly where to insert the new items: go directly there, instead of incrementing two counters to find the place.

I'm not sure exactly where you need the insertions, but this should be close. To keep the original indices correct, you need to insert from the right end and work to the left; that's why I reverse XML_FIELD_TWO.

I left in my debugging print statements.

XML_FIELD_ONE = list("Iamacatthatisoddlyimmunetocatnip")
XML_FIELD_TWO = [7,8,24]

print(XML_FIELD_ONE)
XML_FIELD_TWO = XML_FIELD_TWO[::-1]
print(XML_FIELD_TWO)
i = 0
while i < len(XML_FIELD_TWO):
    print(i, XML_FIELD_TWO[i])
    # Check that i+1 is in range so a trailing solo index does not raise IndexError
    if i + 1 < len(XML_FIELD_TWO) and XML_FIELD_TWO[i] - XML_FIELD_TWO[i+1] == 1:
        XML_FIELD_ONE.insert(XML_FIELD_TWO[i], '\r\n')
        i += 2
    else:
        XML_FIELD_ONE.insert(XML_FIELD_TWO[i], '\n')
        i += 1

    print("\n" + ''.join(XML_FIELD_ONE))

Output:

['I', 'a', 'm', 'a', 'c', 'a', 't', 't', 'h', 'a', 't', 'i', 's', 'o', 'd', 'd', 'l', 'y', 'i', 'm', 'm', 'u', 'n', 'e', 't', 'o', 'c', 'a', 't', 'n', 'i', 'p']
[24, 8, 7]
0 24

Iamacatthatisoddlyimmune
tocatnip
1 8

Iamacatt
hatisoddlyimmune
tocatnip
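To see why the insertions have to run from right to left, here is a minimal sketch on a shorter, made-up string. The helper names and the "|" marker are assumptions chosen for illustration; they are not part of the question's data.

```python
def insert_left_to_right(text, offsets):
    """Insert a marker at each offset, lowest offset first."""
    chars = list(text)
    for offset in sorted(offsets):
        chars.insert(offset, "|")  # shifts every later character right by one
    return "".join(chars)


def insert_right_to_left(text, offsets):
    """Insert a marker at each offset, highest offset first."""
    chars = list(text)
    for offset in sorted(offsets, reverse=True):
        chars.insert(offset, "|")  # characters before `offset` do not move
    return "".join(chars)


print(insert_left_to_right("abcdef", [2, 4]))   # ab|c|def
print(insert_right_to_left("abcdef", [2, 4]))   # ab|cd|ef
```

Going left to right, the first insertion moves "e" from index 4 to index 5, so the second marker lands one character too early. Going right to left, every pending offset still refers to the original string.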

You can use Python's enumerate() function to produce a sequence of index/value pairs in a single loop. You can then use the list.insert(pos, val) method to insert the character you want.

XML_FIELD_ONE = list("Iamacatthatisoddlyimmunetocatnip")
XML_FIELD_TWO = [7,8,24]

last_i = len(XML_FIELD_TWO) - 1

for i,p in enumerate(XML_FIELD_TWO):
    ch = '\r' if i < last_i and XML_FIELD_TWO[i+1] == p+1 else '\n'
    XML_FIELD_ONE.insert(p, ch)

print(XML_FIELD_ONE)

Here is a linear algorithm to achieve what you are trying to do. Actually, using try-except is perfectly appropriate here, but you should never have a bare except:

>>> XML_FIELD_ONE = list("Iamacatthatisoddlyimmunetocatnip")
>>> XML_FIELD_TWO = [7,8,24]
>>> insertions = 0
>>> for i, e in enumerate(XML_FIELD_TWO):
...     try:
...         cont = e + 1 == XML_FIELD_TWO[i+1]
...     except IndexError:
...         cont = False
...     if cont:
...         XML_FIELD_ONE.insert(e+1+insertions, '\r\n')
...     else:
...         XML_FIELD_ONE.insert(e + insertions, '\n')
...     insertions += 1
...
>>> print("".join(XML_FIELD_ONE))
Iamacatt

hatisoddlyimmune
tocatnip
>>>

I keep track of the number of insertions, which offsets the index used in .insert to keep the original indices correct.
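Because list.insert has to shift every element after the insertion point, repeated inserts can still add up on long strings with many offsets. A possible alternative, sketched below, avoids both the inserts and the offset bookkeeping by slicing the original string and joining the pieces once. It assumes that an adjacent pair of offsets becomes a single \r\n placed at the second offset of the pair (as in the first answer's output) and that adjacent offsets come only in pairs. The name insert_line_breaks is hypothetical.

```python
def insert_line_breaks(text, offsets):
    """Build a new string with line breaks inserted at `offsets`.

    Assumption: two offsets that are 1 apart yield one "\r\n" at the
    second offset; a solo offset yields "\n" at that offset.
    """
    ordered = sorted(offsets)
    pieces = []
    previous = 0  # start of the slice not yet copied
    i = 0
    while i < len(ordered):
        is_pair = i + 1 < len(ordered) and ordered[i + 1] == ordered[i] + 1
        cut = ordered[i + 1] if is_pair else ordered[i]
        pieces.append(text[previous:cut])
        pieces.append("\r\n" if is_pair else "\n")
        previous = cut
        i += 2 if is_pair else 1
    pieces.append(text[previous:])  # remainder after the last break
    return "".join(pieces)


print(repr(insert_line_breaks("Iamacatthatisoddlyimmunetocatnip", [7, 8, 24])))
```

Every slice is taken from the untouched original string, so the offsets never need adjusting, and each character is copied a fixed number of times.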
