Variable not assigning value in Python in while loop

The following code doesn't seem to work. I want the ini variable to increment, and logically the code seems to do so. But it doesn't work.

def refinexml(xml):
    links = []
    ini = 0
    while xml[ini:].find('<loc>') != -1:
        links.append(xml[xml[ini:].find('<loc>') + 5:xml[ini:].find('</loc>')])
        ini = xml[ini:].find('</loc>')
        print ini
    return links

When you slice xml with xml[ini:], you get only the tail of the string, so find() returns the position of the substring within that slice of xml, not within the whole string. For example, let xml be this:

<loc> blarg </loc> abcd <loc> text </loc>

On the first iteration, ini is 0 and find('<loc>') returns 0, so you capture " blarg ". find('</loc>') returns 12, and ini is set to 12. On the next iteration of the loop, you want find('<loc>') to locate the second "<loc>" so that you can capture " text ". This is where it goes wrong. You slice xml at ini to get "</loc> abcd <loc> text </loc>". You call find('<loc>') on that slice, which finds the second "<loc>" in xml, because that is the first occurrence of the substring in the slice. The problem is that the index of that occurrence in the slice is 12, not 24, which is what you want. You're missing the first ini characters of the string.
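The offset can be checked directly on the example string. This is only an illustrative sketch; the names tail, relative and absolute are introduced here and are not part of the original code.

```python
xml = "<loc> blarg </loc> abcd <loc> text </loc>"
ini = 12  # value of ini after the first iteration

tail = xml[ini:]                    # "</loc> abcd <loc> text </loc>"
relative = tail.find('<loc>')       # position counted from the start of the slice
absolute = xml.find('<loc>', ini)   # position counted from the start of xml

# The two differ by exactly ini characters.
assert relative + ini == absolute
```

Passing a start argument to find(), as in the last lookup, searches from ini onward but still reports the position in the full string.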

Fortunately, you know how many characters short you are. You need to add ini, which you can do like this:

ini = ini + xml[ini:].find('</loc>')

That, of course, can be shortened to this:

ini += xml[ini:].find('</loc>')

On the line that updates ini, the fix amounts to adding a single character.
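Note that the two find() results used inside links.append() are relative to xml[ini:] in the same way, and that ini has to end up past the closing tag; otherwise the next slice starts with "</loc>" and find('</loc>') returns 0. A minimal sketch of the loop with every position converted to an index into the full string might look like this (start, end_rel and end are names introduced here, and the early break on a missing closing tag is an added assumption):

```python
def refinexml(xml):
    links = []
    ini = 0
    while xml[ini:].find('<loc>') != -1:
        # find() on a slice is relative to the slice, so add ini back,
        # then skip the 5 characters of '<loc>' itself.
        start = ini + xml[ini:].find('<loc>') + 5
        end_rel = xml[start:].find('</loc>')
        if end_rel == -1:
            break  # unmatched '<loc>': stop instead of slicing wrongly
        end = start + end_rel
        links.append(xml[start:end])
        ini = end + 6  # continue after the 6 characters of '</loc>'
    return links
```

On the example string above, this returns the two captured texts with their surrounding spaces intact.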

As mentioned in the comments, though, you should really use an XML parser.

@KSFT explained this very well. I'll just point out that you can eliminate a lot of redundant calls to find() in your code using something like this:

def refinexml(xml):
    links = []

    start = xml.find('<loc>')
    while start != -1:
        start += 5
        end = xml.find('</loc>', start)
        links.append(xml[start:end].strip())
        start = xml.find('<loc>', end + 6)
    return links

But, really, you should just use an XML parser, as even this code makes some potentially dangerous assumptions.
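For reference, a parser-based version could be sketched with the standard library's xml.etree.ElementTree. This assumes the input is a well-formed document with a single root element, such as a sitemap; the bare example string used earlier has no root and would not parse as-is. The function name refinexml_parsed is hypothetical.

```python
import xml.etree.ElementTree as ET

def refinexml_parsed(xml):
    """Collect the text of every <loc> element, ignoring XML namespaces."""
    root = ET.fromstring(xml)  # raises ET.ParseError on malformed input
    links = []
    for elem in root.iter():
        if not isinstance(elem.tag, str):
            continue  # skip anything that is not a regular element
        # Namespaced tags look like '{uri}loc'; keep only the local name.
        local_name = elem.tag.rsplit('}', 1)[-1]
        if local_name == 'loc' and elem.text:
            links.append(elem.text.strip())
    return links
```

Unlike the string-searching versions, this fails loudly on malformed input instead of silently returning wrong slices.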
