
Python Append Function and List

I'm trying to crawl and scrape URLs from a nested XML sitemap using Python and Beautiful Soup.

I believe I have the first part down. I've built a simple loop that accesses the main XML sitemap and pulls the XML files matching certain criteria, then stores them in a list.

The next part is where it gets fuzzy.

I'm trying to loop through each item from the above list and pull out each URL and append the output to a new list that will be written to a text file.

Here's my code for this section:

[image: enter image description here]

When I loop through and build the list, I get a weird output: [image: enter image description here]

My first thought is that Python is appending '\n' after each line break. But when I try to loop through the URLs I get this: [image: enter image description here]

Any help or guidance would be greatly appreciated!

Cheers

Somehow Python did not interpret \n as a newline character in this case (maybe caused by the marshalling of the XML contents). That's why the string is not a legitimate URL and you got that error from requests.

A workaround would be to call string.split("\\n") to get the URLs back into a list.
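
A minimal sketch of that workaround, assuming the scraped text really contains a literal backslash followed by the letter n between URLs rather than a real newline. The sample string and the helper name split_urls are made up for illustration, since the original code was only posted as an image:

```python
# Hypothetical sample: the URLs are joined by the two characters "\" and "n",
# not by a real newline character.
raw = "https://example.com/a\\nhttps://example.com/b\\nhttps://example.com/c"


def split_urls(text):
    """Split on the literal backslash-n sequence and drop empty pieces."""
    return [part.strip() for part in text.split("\\n") if part.strip()]


urls = split_urls(raw)
for url in urls:
    print(url)
```

If the separator turns out to be a real newline after all, text.splitlines() would be the usual choice instead.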
