
Extracting tables from a web page

I need to extract all tables (only the second column) from this page: https://zh.wikipedia.org/wiki/上海证券交易所上市公司列表

Well, I don't need the last three tables...

However, my code only extracts the second column from the first table.

 import pickle
 import requests
 import bs4 as bs  # the original code used `bs` without importing it

 def save_china_tickers():
     resp = requests.get('https://zh.wikipedia.org/wiki/上海证券交易所上市公司列表')
     soup = bs.BeautifulSoup(resp.text, 'lxml')
     table = soup.find('table', {'class': 'wikitable'})  # find() returns only the FIRST table
     tickers = []
     for row in table.findAll('tr')[1:]:      # skip the header row
         ticker = row.findAll('td')[1].text   # second column
         tickers.append(ticker)
     with open('chinatickers.pickle', 'wb') as f:
         pickle.dump(tickers, f)
     return tickers

 save_china_tickers()
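The code above stops at the first table because `soup.find` returns only the first match. A minimal sketch of looping over every `wikitable` with `find_all` instead is shown below; the inline HTML is an assumed stand-in for the real page, so the exact tags and class names are assumptions (on the live page you would also slice off the unwanted last three tables with `[:-3]`):

```python
import bs4 as bs

# Stand-in HTML with two small tables; the real page's markup may differ.
html = """
<table class="wikitable"><tr><th>Code</th><th>Name</th></tr>
<tr><td>600000</td><td>SPD Bank</td></tr></table>
<table class="wikitable"><tr><th>Code</th><th>Name</th></tr>
<tr><td>600004</td><td>Baiyun Airport</td></tr></table>
"""

soup = bs.BeautifulSoup(html, "html.parser")

tickers = []
# find_all returns EVERY matching table, not just the first.
for table in soup.find_all("table", {"class": "wikitable"}):
    for row in table.find_all("tr")[1:]:       # skip the header row
        cells = row.find_all("td")
        if len(cells) > 1:                     # guard rows without a 2nd cell
            tickers.append(cells[1].text)      # second column

print(tickers)  # ['SPD Bank', 'Baiyun Airport']
```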

I have an easy method.

  1. Get the HTTP response
  2. Find all tables using a regex
  3. Parse each HTML table into a list of lists
  4. Iterate over each list in the list
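The regex step above can be sketched with the standard library alone; the snippet below runs on a small inline document (an assumed stand-in for the fetched page body):

```python
import re

# Inline stand-in for the downloaded page content.
data = """<p>intro</p>
<table class="wikitable"><tr><td>600000</td></tr></table>
<table class="wikitable"><tr><td>600004</td></tr></table>"""

# Non-greedy match on each <table ...>...</table> body; re.S lets '.'
# cross newlines, re.I ignores case, and the capture group keeps only
# the inner content, which is re-wrapped in plain <table> tags.
tables = ["<table>{}</table>".format(t)
          for t in re.findall(r"<table .*?>(.*?)</table>", data, re.M | re.S | re.I)]

print(len(tables))  # 2
```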
Requirements
  1. dashtable
Code
 from urllib.request import urlopen
 from dashtable import html2data  # to convert an HTML table to a list of lists
 import re

 url = "https://zh.wikipedia.org/wiki/%E4%B8%8A%E6%B5%B7%E8%AF%81%E5%88%B8%E4%BA%A4%E6%98%93%E6%89%80%E4%B8%8A%E5%B8%82%E5%85%AC%E5%8F%B8%E5%88%97%E8%A1%A8"

 # Read the HTTP content
 data = urlopen(url).read().decode()

 # Now fetch all tables with the help of a regex
 tables = ["<table>{}</table>".format(table)
           for table in re.findall(r"<table .*?>(.*?)</table>", data, re.M | re.S | re.I)]

 # Parse the data; html2data returns a tuple with 0th index as the list of lists
 parsed_tables = [html2data(table)[0] for table in tables]

 # Let's take the first table, i.e. 600000-600099
 parsed = parsed_tables[0]

 # Column names of the first table
 print(parsed[0])

 # Rows of the first table, 2nd column
 for index in range(1, len(parsed)):
     print(parsed[index][1])

 # Output: all the rows of table 1, column 2, excluding the headers

