
How to scrape websites that use Django

I want to write a bot to scrape the website at this address:

https://1xxpers100.mobi/en/line/

The problem is that when I tried to get data from this website, I noticed that it appears to use Django, because the HTML contains template expressions such as {{if group_name}}.

A loop written in this template syntax creates the table rows, and the information I want is in those rows.

When I download the HTML with Python, I can't find any content there, only placeholders like "{{code}}". But when I use Chrome developer tools (Inspect) or the console, I can see the content of the table I want.

How can I get the HTML that holds the table's content, as Chrome's tools show it, so that I can extract the information I want from this website?

This is how I download the page with Python:

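import urllib.request

# Download the raw bytes of the page; the with-block closes the connection
with urllib.request.urlopen("https://1xxpers100.mobi/en/line/") as fp:
    mybytes = fp.read()

# Decode the bytes into a string
mystr = mybytes.decode("utf8")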

This should work for what you want:

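import requests
from bs4 import BeautifulSoup

# Fetch the page and parse it into a tree of tags
r = requests.get('https://1xxpers100.mobi/en/line/')
soup = BeautifulSoup(r.content, 'lxml')

# Print the parsed document as UTF-8 encoded bytes
print(soup.encode("utf-8"))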

I use 'lxml' here because it worked for the site I tested it on. If you have trouble with it, try another parser, such as 'html.parser'.

Another problem is that the page contains a character that isn't recognized by the default encoding, so encode the contents of soup as UTF-8 when you read them.
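
As a rough illustration of the encoding point, here is a minimal sketch that assumes the problem is a non-ASCII character that a narrow default codec cannot represent. The sample string and the helper name safe_encode are hypothetical and are not taken from the site.

```python
# Hypothetical sample text with one non-ASCII character (e with acute accent).
text = "Atl\u00e9tico Madrid"


def safe_encode(s, encoding):
    """Return s encoded with the given codec, or None if it cannot be encoded."""
    try:
        return s.encode(encoding)
    except UnicodeEncodeError:
        return None


# A narrow codec such as ASCII cannot represent the accented character.
ascii_bytes = safe_encode(text, "ascii")

# UTF-8 can represent every Unicode character, so this succeeds.
utf8_bytes = safe_encode(text, "utf-8")

print(ascii_bytes)
print(utf8_bytes)
```

Decoding utf8_bytes with "utf-8" gives back the original string, which is why encoding explicitly sidesteps the unrecognized character.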

Extra Info

This has nothing to do with Django. HTML has a tree-like structure, in which each pair of tags is the parent of all the tags immediately inside it. You just weren't reading deep enough into the tree.
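
To make the tree idea concrete, here is a small sketch that walks nested tags and collects table rows. It uses only the standard library's html.parser so it runs without extra packages. The sample markup, the class name TableRowCollector, and the function extract_rows are all hypothetical, and the real page's structure may differ.

```python
from html.parser import HTMLParser

# Hypothetical sample markup: the table sits several levels deep in the tree.
SAMPLE_HTML = """
<html><body>
  <div id="content">
    <table>
      <tr><td>Team A</td><td>1.85</td></tr>
      <tr><td>Team B</td><td>2.10</td></tr>
    </table>
  </div>
</body></html>
"""


class TableRowCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by its <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._row = None        # cells of the row being read, or None
        self._in_cell = False   # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._in_cell = False

    def handle_data(self, data):
        # Only text inside a cell is kept; whitespace between tags is ignored.
        if self._in_cell:
            self._row[-1] += data.strip()


def extract_rows(html):
    """Parse html and return the table rows as lists of cell text."""
    collector = TableRowCollector()
    collector.feed(html)
    collector.close()
    return collector.rows


rows = extract_rows(SAMPLE_HTML)
print(rows)
```

With BeautifulSoup, the same traversal is usually written as a loop over soup.find_all('tr'), reading the td children of each row.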

