

Web scraping problem: I don't know how to get information from file.html into my Python program

Recently I started studying web scraping, and today I set myself a challenge: I tried to collect information about every world on tibia.com, such as the world's name, how many people are playing on it, what type of server it is, etc.

I created something like this:

from urllib.request import urlopen, Request

from bs4 import BeautifulSoup as soup

my_url = 'https://www.tibia.com/community/?subtopic=worlds'

# Send a browser-like User-Agent header; some sites reject urllib's default one
uClient = urlopen(Request(my_url, headers={'User-Agent': 'Mozilla'}))
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html, "html.parser")

# Every world is a table row with class "Even" or "Odd"
containers = page_soup.find_all("tr", {"class": ['Even', 'Odd']})

for container in containers:
    # Cells of the current row (container), not of the whole list (containers)
    informations = container.find_all("td")
    # The text of a tag is .text (there is no .txt attribute)
    world = informations[0].text
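Since the title mentions file.html, here is a minimal sketch of two ways to get the same soup object: downloading the page as above, or loading a copy saved to disk. It uses `with` blocks so the response and the file are closed even if reading fails. The helper names `soup_from_url` and `soup_from_file`, the 10-second timeout and the UTF-8 encoding of the saved file are assumptions for illustration.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def soup_from_url(url, timeout=10):
    """Download a page and parse it; the with-block closes the response."""
    # Same browser-like header as in the question's code
    request = Request(url, headers={"User-Agent": "Mozilla"})
    with urlopen(request, timeout=timeout) as response:
        return BeautifulSoup(response.read(), "html.parser")


def soup_from_file(path):
    """Parse an HTML file saved on disk (e.g. file.html), assumed UTF-8."""
    with open(path, encoding="utf-8") as f:
        # BeautifulSoup accepts an open file object as well as a string
        return BeautifulSoup(f, "html.parser")
```

Either helper returns a soup on which the same `find_all("tr", {"class": [...]})` call works, so the row-parsing code does not need to know where the HTML came from.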

But I don't know how to pull the information out of the td elements. My data file looks like this:

<tr class="Odd">
<td style="width: 150px;"><a href="https://www.tibia.com/community/?subtopic=worlds&amp;world=Cosera">Cosera</a>
</td>
<td style="text-align: right;">75</td>
<td>North America</td>
<td>Optional PvP</td>

It's one of 92 worlds, and what I'm looking for is how to extract the information about the world from this line:

<td style="width: 150px;"><a href="https://www.tibia.com/community/?subtopic=worlds&amp;world=Cosera">Cosera</a>
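A minimal sketch, assuming BeautifulSoup 4 as in the code above, of pulling the world name out of that cell. The snippet from the question is parsed from a string (with a closing `</tr>` added); the name can be read either as the link text or from the `world` query parameter of the link.

```python
from urllib.parse import parse_qs, urlparse

from bs4 import BeautifulSoup

# The row pasted in the question, with a closing </tr> added
ROW_HTML = """
<tr class="Odd">
<td style="width: 150px;"><a href="https://www.tibia.com/community/?subtopic=worlds&amp;world=Cosera">Cosera</a>
</td>
<td style="text-align: right;">75</td>
<td>North America</td>
<td>Optional PvP</td>
</tr>
"""

row = BeautifulSoup(ROW_HTML, "html.parser").find("tr")
first_cell = row.find("td")
link = first_cell.find("a")

# Visible link text, i.e. the world name, without the trailing newline
world_name = link.get_text(strip=True)

# The parser has already decoded &amp; in the attribute to &
href = link["href"]
# The same name is also the "world" query parameter of the link
world_param = parse_qs(urlparse(href).query)["world"][0]

print(world_name, world_param, href)
```

`get_text(strip=True)` matters here: the `</td>` sits on the next line, so the plain `.text` of the cell ends with a newline.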

If you can give me a hint on how to do this, I think I'll figure out everything else.

If someone has an idea, I would be grateful for a clue.

I'm not exactly sure what you mean, but I'll try to give a solution to your problem.
It looks like you're trying to get all the row information from the table on the page. The simplest way to do this is to first get all the <tr> elements (all the rows), which you had already done successfully.
Then we loop through these rows to extract the data from them.
I'm not sure if you only want the 'Cosera' world or the whole table. If you want the whole table, just remove the if statement in the code below.

from urllib.request import urlopen, Request

from bs4 import BeautifulSoup as soup

my_url = 'https://www.tibia.com/community/?subtopic=worlds'

world_to_find = 'Cosera'

uClient = urlopen(Request(my_url, headers={'User-Agent': 'Mozilla'}))
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html, "html.parser")

# Each world is one table row with class "Odd" or "Even"
all_rows = page_soup.find_all('tr', {"class": ["Odd", "Even"]})

for row in all_rows:
    # Only the direct <td> children of this row
    cells = row.find_all("td", recursive=False)
    # strip=True drops whitespace/newlines around the world name before comparing
    if cells[0].get_text(strip=True) == world_to_find:
        data = {}
        data['world'] = cells[0].get_text(strip=True)
        data['online'] = cells[1].get_text(strip=True)
        data['location'] = cells[2].get_text(strip=True)
        data['pvp_type'] = cells[3].get_text(strip=True)
        # cells[5] is the sixth column, which holds the additional information
        data['additional_info'] = cells[5].get_text(strip=True)

        print(data)

Output:

{'world': 'Cosera', 'online': '86', 'location': 'North America', 'pvp_type': 'Optional PvP', 'additional_info': 'blocked'}
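A sketch of the whole-table variant (the if filter removed), collecting every world into a list of dicts instead of printing one. It only reads the four columns shown in the question's snippet and converts the player count to int. The helper name `parse_worlds` is hypothetical, the second row in `SAMPLE_HTML` is made-up data, and the comma thousands separator is an assumption about how large counts might be shown. With the real page, pass `page_html` to `parse_worlds`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the question's row plus one made-up row.
SAMPLE_HTML = """
<table>
<tr class="Odd"><td><a href="https://www.tibia.com/community/?subtopic=worlds&amp;world=Cosera">Cosera</a>
</td><td style="text-align: right;">75</td><td>North America</td><td>Optional PvP</td></tr>
<tr class="Even"><td><a href="#">ExampleWorld</a></td><td style="text-align: right;">1,234</td><td>Europe</td><td>Open PvP</td></tr>
</table>
"""


def parse_worlds(page_html):
    """Return one dict per world row (rows with class "Odd" or "Even")."""
    page_soup = BeautifulSoup(page_html, "html.parser")
    worlds = []
    for row in page_soup.find_all("tr", {"class": ["Odd", "Even"]}):
        cells = [td.get_text(strip=True) for td in row.find_all("td", recursive=False)]
        if len(cells) < 4:
            continue  # skip rows that don't look like a world entry
        worlds.append({
            "world": cells[0],
            # Remove an (assumed) comma thousands separator before converting
            "online": int(cells[1].replace(",", "")),
            "location": cells[2],
            "pvp_type": cells[3],
        })
    return worlds


for world in parse_worlds(SAMPLE_HTML):
    print(world)
```

Returning a list keeps parsing separate from output, so the result can then be printed, filtered by name, or written to a file.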

If this wasn't what you meant, please explain in your post exactly what you want the output to be.


