Asyncio Loop Within Asyncio Loop
I'm just starting to use asyncio and I'm trying to use it to parse a website.
I'm trying to parse 6 sections (self.signals) of the site. Each section has N pages with tables on them, so essentially I'm trying to run the loop over the sections asynchronously, and also the loop over the pages within each section. This is what I have so far.
import asyncio
import concurrent.futures
from urllib.error import URLError
from urllib.request import urlopen

from bs4 import BeautifulSoup


class FinViz():
    def __init__(self):
        self.url = 'https://finviz.com/screener.ashx?v=160&s='
        self.signals = {
            'Earnings_Before' : 'n_earningsbefore',
            'Earnings_After' : 'n_earningsafter',
            'Most_Active' : 'ta_mostactive',
            'Top_Gainers' : 'ta_topgainers',
            'Most_Volatile' : 'ta_mostvolatile',
            'News' : 'n_majornews',
            'Upgrade' : 'n_upgrades',
            'Unusual_Volume' : 'ta_unusualvolume'
        }
        self.ticks = []

    def _parseTable(self, data):
        # data is (page index, signal name); each page lists 20 rows,
        # so pages after the first start at row i * 20 + 1.
        i, signal = data
        url = self.signals[signal] if i == 0 else self.signals[signal] + '&r={}'.format(str(i * 20 + 1))
        soup = BeautifulSoup(urlopen(self.url + url, timeout = 3).read(), 'html5lib')
        table = soup.find('div', {'id' : 'screener-content'}).find('table',
            {'width' : '100%', 'cellspacing': '1', 'cellpadding' : '3', 'border' : '0', 'bgcolor' : '#d3d3d3'})
        for row in table.findAll('tr'):
            col = row.findAll('td')[1]
            if col.find('a'):
                self.ticks.append(col.find('a').text)

    async def parseSignal(self, signal):
        try:
            # Fetch the first page to read the total row count for this signal.
            soup = BeautifulSoup(urlopen(self.url + self.signals[signal], timeout = 3).read(), 'html5lib')
            tot = int(soup.find('td', {'class' : 'count-text'}).text.split()[1])
            with concurrent.futures.ThreadPoolExecutor(max_workers = 20) as executor:
                loop = asyncio.get_event_loop()
                futures = []
                # Number of pages is ceil(tot / 20).
                for i in range(tot // 20 + (tot % 20 > 0)):
                    futures.append(loop.run_in_executor(executor, self._parseTable, (i, signal)))
                for response in await asyncio.gather(*futures):
                    pass
        except URLError:
            pass

    async def getAll(self):
        with concurrent.futures.ThreadPoolExecutor(max_workers = 20) as executor:
            loop = asyncio.get_event_loop()
            futures = []
            for signal in self.signals:
                futures.append(await loop.run_in_executor(executor, self.parseSignal, signal))
            for response in await asyncio.gather(*futures):
                pass
        print(self.ticks)


if __name__ == '__main__':
    x = FinViz()
    loop = asyncio.get_event_loop()
    loop.run_until_complete(x.getAll())
This does the job successfully, but it is somehow slower than doing the parsing without asyncio.
Any tips for an asynchronous noob?
Edit: Added full code
Remember that Python has a GIL, so threaded code will not help performance for CPU-bound work such as parsing. To potentially speed things up, use a ProcessPoolExecutor; note, however, that you'll incur the following overhead:
You can avoid 1. if you run in a fork-safe environment and store the data in a global variable.
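A minimal sketch of the global-variable idea, assuming a POSIX system where the "fork" start method is available. The names PAGES, page_length and lengths_via_fork are hypothetical and only for illustration. The data is stored in a module-level global before the pool is created, so forked workers inherit it and only a small index is sent to each task.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor

# Hypothetical module-level store; filled in the parent before forking.
PAGES = []


def page_length(index):
    # Runs in a worker: reads the inherited global, so only the
    # integer index had to be pickled and sent to the worker.
    return len(PAGES[index])


def lengths_via_fork(pages):
    global PAGES
    PAGES = pages  # set BEFORE the pool exists so forked workers see it
    # Assumption: "fork" is available (Linux/macOS); it is not on Windows.
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as executor:
        return list(executor.map(page_length, range(len(PAGES))))


if __name__ == "__main__":
    print(lengths_via_fork(["abc", "de", ""]))
```

Results still travel back to the parent by pickling, so this only saves the cost of sending the input data.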
You can also do things like share a memory-mapped file. Also, sharing raw strings/bytes is the fastest.
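As a rough sketch of the ProcessPoolExecutor suggestion combined with passing raw strings, the example below hands HTML strings to worker processes through loop.run_in_executor. It is not the asker's scraper: RowCounter, count_rows, count_all and the sample pages are hypothetical stand-ins, and the standard-library html.parser replaces BeautifulSoup so the block runs on its own.

```python
import asyncio
import concurrent.futures
from html.parser import HTMLParser


class RowCounter(HTMLParser):
    """Counts <tr> start tags; a stand-in for the real table parsing."""

    def __init__(self):
        super().__init__()
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows += 1


def count_rows(html):
    # Module-level function so it can be pickled for worker processes.
    parser = RowCounter()
    parser.feed(html)
    return parser.rows


async def count_all(pages, executor):
    # Schedule one parsing job per page, then wait for all of them.
    loop = asyncio.get_running_loop()
    futures = [loop.run_in_executor(executor, count_rows, page) for page in pages]
    return await asyncio.gather(*futures)


async def main():
    # Hypothetical sample input; real code would pass downloaded HTML.
    pages = [
        "<table><tr><td>A</td></tr><tr><td>B</td></tr></table>",
        "<table><tr><td>C</td></tr></table>",
    ]
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
        print(await count_all(pages, executor))


if __name__ == "__main__":
    asyncio.run(main())
```

Only plain strings go to the workers and only integers come back, which keeps the pickling cost small. Whether this beats a thread pool depends on how much of the time is spent parsing rather than waiting on the network.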