Unable to get the output as a properly formatted dictionary

I've written a scraper in Python to parse some data from a webpage. My intention is to store the data in a dictionary. Instead of parsing the full table, I tried a single tr containing the information for a single player. The data come through, but the output is not formatted the way I want the dictionary to look. Any help getting it right would be highly appreciated.

This is my attempt:

import requests
from bs4 import BeautifulSoup

URL = "https://fantasy.premierleague.com/player-list/"

def get_data(link):
    res = requests.get(link,headers={"User-Agent":"Mozilla/5.0"})
    soup = BeautifulSoup(res.text,"lxml")
    data = []
    for content in soup.select("div.ism-container"):
        itmval = {}
        itmval['name'] = content.select_one("h2").text
        itmval['player_info'] = [[item.get_text(strip=True) for item in items.select("td")] for items in content.select(" table:nth-of-type(1) tr:nth-of-type(2)")]
        data.append(itmval)

    print(data)

if __name__ == '__main__':
    get_data(URL)

The output I'm getting:

[{'name': 'Goalkeepers', 'player_info': [['De Gea', 'Man Utd', '161', '£5.9']]}]

The output I expect:

[{'name': 'Goalkeepers', 'player_info': ['De Gea', 'Man Utd', '161', '£5.9']}]

Btw, I intend to parse the full table; I've shown only a minimal portion here so it is easy to examine.

If you want to use a nested list comprehension, try replacing

[[item.get_text(strip=True) for item in items.select("td")] for items in content.select(" table:nth-of-type(1) tr:nth-of-type(2)")]

with

[item.get_text(strip=True) for items in content.select(" table:nth-of-type(1) tr:nth-of-type(2)") for item in items.select("td")]
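To see the difference without hitting the live site, here is a minimal sketch that runs both comprehensions against a small inline HTML snippet. The markup is an assumption (it only imitates the structure implied by the selectors in the question), and "html.parser" is used instead of "lxml" so nothing beyond bs4 is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one section of the real page.
HTML = """
<div class="ism-container">
  <h2>Goalkeepers</h2>
  <table>
    <tr><th>Player</th><th>Team</th><th>Points</th><th>Cost</th></tr>
    <tr><td>De Gea</td><td>Man Utd</td><td>161</td><td>£5.9</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
content = soup.select_one("div.ism-container")
rows = content.select("table:nth-of-type(1) tr:nth-of-type(2)")

# Original form: the inner comprehension builds one list per row,
# and the outer comprehension wraps those lists in another list.
nested = [[td.get_text(strip=True) for td in row.select("td")] for row in rows]

# Suggested form: two "for" clauses in a single comprehension,
# so every cell lands directly in one flat list.
flat = [td.get_text(strip=True) for row in rows for td in row.select("td")]

print(nested)
print(flat)
```

Note the clause order in the flat version: the outer loop (over rows) is written first and the inner loop (over cells) second, the same order as the equivalent nested for statements.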

player_info is equal to the following expression (simplified a bit):

player_info = [[item for item in items] for items in content]

content seems to have only one item. What you want is probably something like:

 player_info = [item for item in content]

If content has more than one item, remove the second pair of [ ... ] in the first code block.
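Since the question mentions parsing the full table later, it may help to see what flattening does once there is more than one row. This is only a sketch with plain lists standing in for the parsed cells; the second row and the flatten helper are made up for illustration.

```python
# Each inner list plays the role of the cell texts of one <tr>.
one_row = [["De Gea", "Man Utd", "161", "£5.9"]]

# The second row is a made-up placeholder, not real data.
two_rows = one_row + [["Player B", "Team B", "0", "£0.0"]]


def flatten(rows):
    """Merge the cells of every row into one flat list."""
    return [cell for row in rows for cell in row]


print(flatten(one_row))   # one row: the flat shape the question asks for
print(flatten(two_rows))  # two rows: both players merged into one list
print(two_rows)           # nesting kept: still one list per player
```

With a single row the flat list is exactly the expected output. With several rows, flattening mixes all players into one list, so for the full table the original nested form (one list per player) is probably the one to keep.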
