Variable not assigning value in Python in while loop

The following code doesn't seem to work. I want the ini variable to increment, and logically the code seems to do so, but it doesn't.

def refinexml(xml):
    links = []
    ini = 0
    while xml[ini:].find('<loc>') != -1:
        links.append(xml[xml[ini:].find('<loc>') + 5:xml[ini:].find('</loc>')])
        ini = xml[ini:].find('</loc>')
        print(ini)
    return links

When you slice xml with xml[ini:], you're getting just the end of it, so find() returns the position of the substring in that slice of xml, not in the whole string. For example, let xml be this:

<loc> blarg </loc> abcd <loc> text </loc>

On the first pass, ini is 0, so find('<loc>') returns 0 and find('</loc>') returns 12. You capture " blarg " and ini is set to 12. The next pass is where it goes wrong. You slice xml at ini to get "</loc> abcd <loc> text </loc>" and call find('<loc>') on that slice. It finds the second "<loc>" in xml, which is the first occurrence of that substring in the slice, but the index it returns is relative to the slice: 12, not 24, which is what you want. You're missing the first ini characters of the string.
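The difference between the two indices can be checked directly. This is a small sketch using the example string above. It also uses the optional start argument of str.find(), which searches from a given position but still reports the index in the full string.

```python
xml = "<loc> blarg </loc> abcd <loc> text </loc>"
ini = 12  # index of the first "</loc>" in xml

# find() on a slice reports an index relative to the slice.
relative = xml[ini:].find('<loc>')

# Adding ini back converts it to an index into xml.
absolute = ini + relative

# Passing a start position to find() gives the absolute index directly.
direct = xml.find('<loc>', ini)

print(relative, absolute, direct)  # 12 24 24
```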

Fortunately, you know how many characters short you are. You need to add ini, which you can do like this:

ini = ini + xml[ini:].find('</loc>')

That, of course, can be shortened to this:

ini += xml[ini:].find('</loc>')

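Adding that single character makes ini an index into xml rather than into the slice. The two find() calls inside links.append() are slice-relative as well, so they need ini added in the same way, and ini has to advance past the closing tag (6 more characters), or the next pass finds the same "</loc>" again at offset 0.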
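For reference, a sketch of the whole loop with every slice-relative index converted back to an index into xml might look like this. It keeps the structure of the original function, adds ini to each find() result, and moves ini past the closing tag so the same "</loc>" is not found twice. It assumes every <loc> has a matching </loc> after it.

```python
def refinexml(xml):
    links = []
    ini = 0
    while xml[ini:].find('<loc>') != -1:
        # Convert both slice-relative results to indices into xml.
        start = ini + xml[ini:].find('<loc>') + 5  # skip the 5 chars of "<loc>"
        end = ini + xml[ini:].find('</loc>')
        links.append(xml[start:end])
        ini = end + 6  # skip the 6 chars of "</loc>"
    return links


print(refinexml("<loc> blarg </loc> abcd <loc> text </loc>"))
# [' blarg ', ' text ']
```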

As mentioned in the comments, though, you should really use an XML parser.

@KSFT explained this very well. I'll just point out you can eliminate a lot of redundant invocations of find() in your code using something like this:

def refinexml(xml):
    links = []

    start = xml.find('<loc>')
    while start != -1:
        start += 5
        end = xml.find('</loc>', start)
        links.append(xml[start:end].strip())
        start = xml.find('<loc>', end + 6)
    return links

But, really, you should just use an XML parser, as even this code makes some potentially dangerous assumptions.
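As a rough illustration of the parser route, here is a sketch using the standard library's xml.etree.ElementTree. It assumes the input is a complete, well-formed document such as a sitemap. The function name extract_locs and the sample data are made up for this example. A bare fragment with no single root element, like the snippet used in the explanation above, would not parse as-is.

```python
import xml.etree.ElementTree as ET


def extract_locs(xml_text):
    """Return the stripped text of every <loc> element in the document."""
    root = ET.fromstring(xml_text)
    links = []
    for element in root.iter():
        # In a namespaced document, tags look like "{namespace}loc",
        # so compare only the part after the closing brace.
        if element.tag.rsplit('}', 1)[-1] == 'loc' and element.text:
            links.append(element.text.strip())
    return links


# Hypothetical sitemap-style input, made up for this example.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc> http://example.com/a </loc></url>
  <url><loc>http://example.com/b</loc></url>
</urlset>"""

print(extract_locs(sample))
# ['http://example.com/a', 'http://example.com/b']
```

Unlike the string-searching versions, the parser raises an error on malformed input instead of silently returning wrong slices.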
