Making a pythonic decision on developing a web scraper module
This is a fairly high-level question. I have developed many different web scrapers that work on different websites.
I have many different versions of the functions getName() and getAddress(), one per site.
Is it pythonic (or at least not terrible coding practice) to dispatch between them inside a single module function, as below? If this is bad practice, could someone give me a high-level tip on how to manage this kind of library of scrapers?
def universalNameAdressGrab(url):
    page = pullPage(url)
    if 'Tucson.com' in url:
        import tucsonScraper
        name = getName(page)  # this is the getName for Tucson
        address = getAddress(page)
    elif 'NewYork.com' in url:
        import newyorkScraper
        name = getName(page)  # this is the getName for NewYork
        address = getAddress(page)
    return {'name': name, 'address': address}
It is probably more pythonic to import everything at the top of the file. After that you can reference the functions through their modules and remove a lot of duplicated code. You may also run into issues with URL capitalization, so I would standardize that as well; you could use urlparse for that. I would consider something like the following more pythonic:
import tucsonScraper
import newyorkScraper

def universalNameAdressGrab(url):
    page = pullPage(url)
    scraper = None
    if 'Tucson.com' in url:
        scraper = tucsonScraper
    elif 'NewYork.com' in url:
        scraper = newyorkScraper
    else:
        raise Exception("No scraper found for url")
    return {'name': scraper.getName(page), 'address': scraper.getAddress(page)}
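On the capitalization point: one way to standardize is to compare against the lower-cased hostname rather than the raw URL string. A minimal sketch using Python 3's urllib.parse (the module is simply named urlparse in Python 2); the helper name here is illustrative, not part of the original answer:

```python
from urllib.parse import urlparse  # the 'urlparse' module in Python 2

def normalized_host(url):
    """Return the lower-cased hostname, so 'Tucson.com' and 'tucson.com' compare equal."""
    # Note: a URL without a scheme (e.g. 'Tucson.com/page') yields an empty netloc.
    return urlparse(url).netloc.lower()
```

Matching on 'tucson.com' == normalized_host(url) avoids both case mismatches and accidental matches against path or query components of the URL.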