Scraping and parsing a data table using Beautiful Soup and Python
Hi everyone, I am trying to scrape a table from the CIA website that shows road data for different countries, broken down into paved and unpaved roads. I wrote this script to extract it. Secondly, I am trying to parse the information in the second column into separate fields, but I don't know how to do that. After that, I want to save the data into a CSV file with a header for each column.
Here is my code:
import csv
import requests
from bs4 import BeautifulSoup
course_list = []
url = "https://www.cia.gov/library/publications/the-world-factbook/fields/print_2085.html"
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
for tr in soup.find_all('tr')[1:]:
    tds = tr.find_all('td')
    print(tds[1].text)
The second column has three pieces of information that I want to parse out. How do I do that?
Thanks!
Depending on how you want to achieve the extraction, you could do the following:
roadways = tds[1].text.strip().split('\n')
This strips whitespace from the beginning and end of the second column's content and splits it on the newline character. The result would be a list like this:
['total: 97,267 km', 'paved: 18,481 km', 'unpaved: 78,786 km (2002)']
From here you could remove the labels like total or paved from the contents:
roadways = [x[x.index(':')+1:].strip() for x in tds[1].text.strip().split('\n')]
Which would result in the following list:
['97,267 km', '18,481 km', '78,786 km (2002)']
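As a quick sanity check, the comprehension can be run against the sample column text from above (the sample string here stands in for tds[1].text):

```python
# Sanity check of the label-stripping comprehension on the sample
# column text shown above; "sample" stands in for tds[1].text.
sample = "total: 97,267 km\npaved: 18,481 km\nunpaved: 78,786 km (2002)"
roadways = [x[x.index(':') + 1:].strip() for x in sample.strip().split('\n')]
print(roadways)  # ['97,267 km', '18,481 km', '78,786 km (2002)']
```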
And this you can store in your CSV file:
export_file = open(..., 'w')
wr = csv.writer(export_file, quoting=csv.QUOTE_ALL)
wr.writerow(['total','paved','unpaved'])
This goes for each row you extract:
wr.writerow(roadways)
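The pieces above can be tied together into one script. This is a sketch, not a tested solution: parse_rows() is a helper name I introduced, the country name is a placeholder, and it assumes the Factbook layout of one tr per country with the roadway figures in the second td. The demo parses an inline snippet; for the live page you would build the soup from requests.get(url).content instead.

```python
# Sketch combining the steps above. parse_rows() is a hypothetical
# helper; it assumes one <tr> per country with the roadway figures
# in the second <td>.
import csv
from bs4 import BeautifulSoup

def parse_rows(soup):
    """Yield [country, total, paved, unpaved] for each data row."""
    for tr in soup.find_all('tr')[1:]:   # skip the header row
        tds = tr.find_all('td')
        if len(tds) < 2:                 # guard against stray rows
            continue
        country = tds[0].text.strip()
        # Keep only the value after each "label:" prefix
        figures = [x[x.index(':') + 1:].strip()
                   for x in tds[1].text.strip().split('\n') if ':' in x]
        yield [country] + figures

# Demo on an inline snippet ("Exampleland" is a placeholder);
# for the live page, build the soup from requests.get(url).content.
sample = """<table>
<tr><th>Country</th><th>Roadways</th></tr>
<tr><td>Exampleland</td><td>total: 97,267 km
paved: 18,481 km
unpaved: 78,786 km (2002)</td></tr>
</table>"""
soup = BeautifulSoup(sample, 'html.parser')

with open('roadways.csv', 'w', newline='') as export_file:
    wr = csv.writer(export_file, quoting=csv.QUOTE_ALL)
    wr.writerow(['country', 'total', 'paved', 'unpaved'])
    for row in parse_rows(soup):
        wr.writerow(row)
```

Note the newline='' argument to open(): without it, the csv module can write extra blank lines on Windows.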