Fetch data using BeautifulSoup and store it in a dictionary in Python
I am trying to parse an HTML file using BeautifulSoup. Later I want to expose the data in JSON format through a REST API. The REST API is working fine, but I am having trouble structuring the data in the expected format. So I am attaching only the Python code that handles the fetched data.
HTML:
<!DOCTYPE html>
<html>
<head>
<style>
table, th, td {
border: 1px solid black;
}
</style>
</head>
<body>
<table>
<thead>
<tr>
<th>Date</th>
<th>Savings</th>
<th>Expenses</th>
</tr>
</thead>
<tbody>
<tr>
<td>January</td>
<td>$100</td>
<td>$200</td>
</tr>
<tr>
<td>February</td>
<td>$80</td>
<td>$300</td>
</tr>
</tbody>
</table>
</body>
</html>
My expected output should be:
{
    "data": {
        "Savings": {
            "January": "$100",
            "February": "$80"
        },
        "Expenses": {
            "January": "$200",
            "February": "$300"
        }
    }
}
The Python code that I have written:
bs_content = BeautifulSoup(ra.body, 'html.parser') #this parse the whole html
headers = []
result = defaultdict(dict)
table = bs_content.find_all('table')
if not headers:
    for th in table.find('thead').findAll('th'):
        text = th.text
        headers.append(text)
for tr in table.find('tbody').findAll('tr'):
    tds = tr.findAll('td')
    for header, td in zip(headers, tds):
        value = td.text.strip()
        result[header] = value
return result
So, result should be updated like:
result['Savings']['January'] = '$100'
result['Savings']['February'] = '$80'
result['Expenses']['January'] = '$200'
result['Expenses']['February'] = '$300'
This solution should work for a table that has more than just two months.
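The nested assignments described above can be sketched without any parsing at all. In the question's code, result[header] = value stores a plain string under each header, so every row overwrites the previous one, and zip(headers, tds) also pairs the Date header with the month cell. A minimal sketch, assuming the cell texts have already been extracted into a hypothetical rows list (both headers and rows below are hand-written stand-ins, not output from the original code):

```python
import json
from collections import defaultdict

# Hypothetical stand-ins for the text pulled out of the <th> and <td> cells.
headers = ["Date", "Savings", "Expenses"]
rows = [
    ["January", "$100", "$200"],
    ["February", "$80", "$300"],
]

result = defaultdict(dict)
for row in rows:
    month = row[0]
    # Skip the "Date" column and pair each remaining header with its cell,
    # keyed by month so earlier rows are not overwritten.
    for header, value in zip(headers[1:], row[1:]):
        result[header][month] = value

# defaultdict is a dict subclass, so json.dumps can serialize it directly.
print(json.dumps({"data": result}, indent=2))
```

Keying the inner dictionary by month is what lets the same loop handle any number of rows.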
soup = BeautifulSoup(ra.body, 'lxml')
table = soup.select_one('table')
# Drop the first header ("Date"); keep "Savings" and "Expenses"
headers = [header.text for header in table.select('th')][1:]
result = {headers[0]: {}, headers[1]: {}}
for row in table.select('tbody tr'):
    # data[0] is the month; data[1] and data[2] are the two amounts
    data = [value.text for value in row.select('td')]
    result[headers[0]][data[0]] = data[1]
    result[headers[1]][data[0]] = data[2]
return result
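The code above handles any number of month rows, but it hard-codes exactly two value columns. A possible generalization that loops over however many columns the table has might look like the sketch below. The function name table_to_dict and the inline sample_html are assumptions for illustration, and the built-in 'html.parser' is used here only to avoid the extra lxml dependency:

```python
from bs4 import BeautifulSoup


def table_to_dict(html):
    """Map each non-first column header to a {row label: cell text} dict."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.select_one("table")
    if table is None:
        return {}

    # Skip the first header ("Date"); the rest become the top-level keys.
    headers = [th.get_text(strip=True) for th in table.select("thead th")][1:]
    result = {header: {} for header in headers}

    for row in table.select("tbody tr"):
        cells = [td.get_text(strip=True) for td in row.select("td")]
        if not cells:
            continue
        label, values = cells[0], cells[1:]
        # zip stops at the shorter sequence, so short rows do not raise.
        for header, value in zip(headers, values):
            result[header][label] = value
    return result


# Hypothetical sample input mirroring the table from the question.
sample_html = """
<table>
  <thead><tr><th>Date</th><th>Savings</th><th>Expenses</th></tr></thead>
  <tbody>
    <tr><td>January</td><td>$100</td><td>$200</td></tr>
    <tr><td>February</td><td>$80</td><td>$300</td></tr>
  </tbody>
</table>
"""

print({"data": table_to_dict(sample_html)})
```

Wrapping the returned dictionary in {"data": ...} before serializing gives the shape shown in the expected output.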