Parsing specific content using BeautifulSoup

I am looking to extract fantasy football information from a website. I can write enough code to get the output shown further down, but all I really want are these two items:

"fullName":"Justin Forsett"
"pointsSEASON":75

Can anyone help explain how to isolate these items and write them to, for example, a CSV file?

Here is the output I currently get:

[<div class="mod-content" id="fantasy-content">{"averagePoints":9.4,"percentOwned":98.6,"pointsSEASON":75,"seasonOutlook":{"outlook":"Forsett finished 2014 as fantasy's No. 8 RB, so why aren't we higher on him? Well, it's difficult to reconcile what we know about his size (5-8, 197), age (30 in October) and career with the 1,529 scrimmage yards he racked up as Baltimore's surprise starter. Forsett had never even eclipsed 1,000 total yards in any of his six previous seasons. Yet his quickness and vision were consistently excellent last year, and new OC Marc Trestman loves throwing to RBs. Lorenzo Taliaferro and rookie Javorius Allen loom as heftier options, and some kind of rotation could develop. But Forsett will get the benefit of the doubt in Week 1.","seasonId":2015,"date":"Wed May 20"},"positionRank":18,"playerId":11467,"percentChange":-0.2,"averageDraftPosition":42.5,"fullName":"Justin Forsett","mostRecentNews":{"news":null,"spin":"The Jaguars have allowed the second-fewest yards per carry (3.4) in the league, but have ceded one rushing score per game in the process. Forsett will need a good deal of volume to overcome a quietly tough matchup, but we're trusting the workload will be enough.","date":"Tue Nov 10"},"totalPoints":75,"projectedPoints":13.957546548,"projectedDifference":4.582546548}</div>]

It looks like the text of the tag you are looking for is in JSON format. You have successfully retrieved the div tag, but now you have to extract the JSON from it and then pull out the information you want. Here is what you will need to add to your code:

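import json

# original_tag stands for the <div id="fantasy-content"> tag printed above.
# The output above is a one-element list, so use its first item, not the list.
raw_json = original_tag.get_text()
data = json.loads(raw_json)  # parses the JSON text into a Python dict
print(data['fullName'])
print(data['pointsSEASON'])

original_tag is a placeholder for the tag you printed above. Since you didn't show your code, I couldn't run it. Instead, I ran the following code on the JSON text (with the apostrophes removed so that it fits in a single-quoted string):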

string = '{"averagePoints":9.4,"percentOwned":98.6,"pointsSEASON":75,"seasonOutlook":{"outlook":"Forsett finished 2014 as fantasys No. 8 RB, so why arent we higher on him? Well, its difficult to reconcile what we know about his size (5-8, 197), age (30 in October) and career with the 1,529 scrimmage yards he racked up as Baltimores surprise starter. Forsett had never even eclipsed 1,000 total yards in any of his six previous seasons. Yet his quickness and vision were consistently excellent last year, and new OC Marc Trestman loves throwing to RBs. Lorenzo Taliaferro and rookie Javorius Allen loom as heftier options, and some kind of rotation could develop. But Forsett will get the benefit of the doubt in Week 1.","seasonId":2015,"date":"Wed May 20"},"positionRank":18,"playerId":11467,"percentChange":-0.2,"averageDraftPosition":42.5,"fullName":"Justin Forsett","mostRecentNews":{"news":null,"spin":"The Jaguars have allowed the second-fewest yards per carry (3.4) in the league, but have ceded one rushing score per game in the process. Forsett will need a good deal of volume to overcome a quietly tough matchup, but were trusting the workload will be enough.","date":"Tue Nov 10"},"totalPoints":75,"projectedPoints":13.957546548,"projectedDifference":4.582546548}'
s = json.loads(string)
print(s['fullName'])
print(s['pointsSEASON'])

And got this output:

Justin Forsett
75
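
Since the question's code was not shown, here is a rough end-to-end sketch of how the tag lookup and the JSON parsing might fit together. It assumes the page has been downloaded already and that the div keeps the id fantasy-content seen in the output above. The html string is a trimmed-down stand-in for the real page, not the actual markup.

```python
import json

from bs4 import BeautifulSoup

# Stand-in for the downloaded page (assumption: only the relevant div is kept,
# and the JSON is shortened to a few of the keys shown in the question).
html = (
    '<div class="mod-content" id="fantasy-content">'
    '{"pointsSEASON":75,"fullName":"Justin Forsett","mostRecentNews":{"news":null}}'
    '</div>'
)

soup = BeautifulSoup(html, "html.parser")

# find() returns a single Tag (or None), whereas find_all() returns a list,
# which is why the output in the question is wrapped in [ ].
tag = soup.find("div", id="fantasy-content")
if tag is None:
    raise ValueError("div#fantasy-content not found")

# The div contains only text, and that text is a JSON document.
data = json.loads(tag.get_text())

print(data["fullName"])
print(data["pointsSEASON"])
```

If your code already uses find_all(), indexing the result with [0] gives the same Tag object.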

Edited to add: Here is information on writing to a CSV file.
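
For the CSV part of the question, a minimal sketch using the standard csv module could look like the following. The raw_json string is a shortened version of the JSON above, and the write_players helper and the players.csv file name are made up for illustration.

```python
import csv
import io
import json

# Shortened version of the JSON from the question (assumption: same key names).
raw_json = '{"pointsSEASON": 75, "fullName": "Justin Forsett", "totalPoints": 75}'
player = json.loads(raw_json)

# Only these two columns are written; every other key is ignored.
fields = ["fullName", "pointsSEASON"]


def write_players(players, out_file):
    """Write a header row plus one row per player dict (hypothetical helper)."""
    writer = csv.DictWriter(out_file, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(players)


# An in-memory buffer keeps the example self-contained. For a real file, use
# open("players.csv", "w", newline="") instead (the file name is hypothetical).
buffer = io.StringIO()
write_players([player], buffer)
print(buffer.getvalue())
```

Passing extrasaction="ignore" lets you hand the whole parsed dict to the writer without first stripping the keys you do not need. With several players, append each parsed dict to a list and pass that list to the helper.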
