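Download a file using Python, BeautifulSoup and Selenium
I want to download the first PDB file from the search results (the download link is given below the name). I am using Python, Selenium and BeautifulSoup. This is the code I have developed so far.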
import urllib2
from BeautifulSoup import BeautifulSoup
from selenium import webdriver
uni_id = "P22216"
# set parameters
download_dir = "/home/home/Desktop/"
url = "http://www.rcsb.org/pdb/search/smart.do?smartComparator=and&smartSearchSubtype_0=UpAccessionIdQuery&target=Current&accessionIdList_0=%s" % uni_id
print "url - ", url
# opening the url
text = urllib2.urlopen(url).read();
#print "text : ", text
soup = BeautifulSoup(text);
#print soup
print
table = soup.find( "table", {"class":"queryBlue"} )
#print "table : ", table
status = 0
rows = table.findAll('tr')
for tr in rows:
    try:
        cols = tr.findAll('td')
        if cols:
            link = cols[1].find('a').get('href')
            print "link : ", link
            if link:
                if status==1:
                    main_url = "http://www.rcsb.org" + link
                    print "main_url-----", main_url
                    status = False
                    browser.click(main_url)
                status+=1
    except:
        pass
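I am getting None.
How can I download the first file in the search list? (In this case it is 2YGV.)
The download link is: /pdb/protein/P32447
I'm not sure exactly what you want to download, but here is an example of how to download the 2YGV file: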
import urllib
import urllib2
from bs4 import BeautifulSoup
uni_id = "P22216"
url = "http://www.rcsb.org/pdb/search/smart.do?smartComparator=and&smartSearchSubtype_0=UpAccessionIdQuery&target=Current&accessionIdList_0=%s" % uni_id
text = urllib2.urlopen(url).read()
soup = BeautifulSoup(text)
link = soup.find( "span", {"class":"iconSet-main icon-download"}).parent.get("href")
urllib.urlretrieve("http://www.rcsb.org/" + str(link), str(link.split("=")[-1]) + ".pdb")
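This script downloads the file from the link on the page. It does not need selenium; I used urllib to retrieve the file instead. You can read this post for more information on how to download files using urllib.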
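The last line of the script does two things at once: it turns the relative href into an absolute URL and derives the local filename from the text after the final "=". A minimal sketch of just that step, written for Python 3 (the scripts in this thread are Python 2); the helper name build_download_target and the sample href are made up for illustration, not taken from the site:

```python
from urllib.parse import urljoin

BASE_URL = "http://www.rcsb.org/"


def build_download_target(href, base=BASE_URL):
    """Return (absolute_url, local_filename) for a relative download href."""
    # urljoin avoids the doubled slash that plain string concatenation
    # produces when the base ends with "/" and the href starts with "/".
    absolute_url = urljoin(base, href)
    # Same rule as the answer's link.split("=")[-1]: text after the last "=".
    pdb_id = href.split("=")[-1]
    return absolute_url, pdb_id + ".pdb"


# Hypothetical href, for illustration only.
url, filename = build_download_target("/download/example.do?structureId=2YGV")
print(url)
print(filename)
```

Note that the filename rule only works when the structure ID is the last query parameter in the link.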
EDIT:
Or use this code to find the download link (it all depends on which file, from which URL, you want to download):
import urllib
import urllib2
from bs4 import BeautifulSoup
uni_id = "P22216"
url = "http://www.rcsb.org/pdb/search/smart.do?smartComparator=and&smartSearchSubtype_0=UpAccessionIdQuery&target=Current&accessionIdList_0=%s" % uni_id
text = urllib2.urlopen(url).read()
soup = BeautifulSoup(text)
table = soup.find( "table", {"class":"queryBlue"} )
link = table.find("a", {"class":"tooltip"}).get("href")
urllib.urlretrieve("http://www.rcsb.org/" + str(link), str(link.split("=")[-1]) + ".pdb")
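The lookup above, "first a.tooltip link inside table.queryBlue", can be tried offline without BeautifulSoup. The sketch below swaps in the standard library's html.parser for Python 3; it is not the answer's method, and the class FirstLinkFinder, the function find_first_link and the sample HTML are all made up for illustration. It also returns None when no table is found, which is the case the BeautifulSoup version would hit as an AttributeError on table.find.

```python
from html.parser import HTMLParser


class FirstLinkFinder(HTMLParser):
    """Record the href of the first <a class="tooltip"> inside
    <table class="queryBlue">."""

    def __init__(self):
        super().__init__()
        self.table_depth = 0  # > 0 while inside the target table
        self.href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "table":
            # Enter the target table, or track tables nested inside it.
            if self.table_depth or "queryBlue" in classes:
                self.table_depth += 1
        elif tag == "a" and self.table_depth and self.href is None:
            if "tooltip" in classes:
                self.href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth:
            self.table_depth -= 1


def find_first_link(html):
    """Return the first matching href, or None if there is no match."""
    finder = FirstLinkFinder()
    finder.feed(html)
    return finder.href


# Made-up markup that imitates the structure the answer relies on.
SAMPLE_HTML = """
<a class="tooltip" href="/outside?structureId=XXXX">outside the table</a>
<table class="queryBlue">
  <tr><td><a class="tooltip" href="/download/example.do?structureId=2YGV">2YGV</a></td></tr>
  <tr><td><a class="tooltip" href="/download/example.do?structureId=1ABC">1ABC</a></td></tr>
</table>
"""

print(find_first_link(SAMPLE_HTML))
print(find_first_link("<p>no results</p>"))
```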
Here is an example of how to do what you mentioned in the comments:
import mechanize
from bs4 import BeautifulSoup
SEARCH_URL = "http://www.rcsb.org/pdb/home/home.do"
l = ["YGL130W", "YDL159W", "YOR181W"]
browser = mechanize.Browser()
for item in l:
    browser.open(SEARCH_URL)
    browser.select_form(nr=0)
    browser["q"] = item
    html = browser.submit()
    soup = BeautifulSoup(html)
    table = soup.find("table", {"class":"queryBlue"})
    if table:
        link = table.find("a", {"class":"tooltip"}).get("href")
        browser.retrieve("http://www.rcsb.org/" + str(link), str(link.split("=")[-1]) + ".pdb")[0]
        print "Downloaded " + item + " as " + str(link.split("=")[-1]) + ".pdb"
    else:
        print item + " was not found"
Output:
Downloaded YGL130W as 3KYH.pdb
Downloaded YDL159W as 3FWB.pdb
YOR181W was not found
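The loop above mixes searching, downloading and reporting. One way to keep the "found or not found" bookkeeping separate, so it can be checked without touching the network, is to pass the search and download steps in as functions. This is only a sketch for Python 3; download_all, find_link, retrieve and the fake hrefs are hypothetical names and values, with the structure IDs borrowed from the output shown above.

```python
def download_all(items, find_link, retrieve):
    """Try each search term and return (downloaded, missing).

    find_link(item) returns an href or None; retrieve(href, filename)
    saves the file. Both are hypothetical callables supplied by the caller.
    """
    downloaded, missing = {}, []
    for item in items:
        href = find_link(item)
        if href is None:
            missing.append(item)
            continue
        # Same naming rule as the answer: text after the last "=".
        filename = href.split("=")[-1] + ".pdb"
        retrieve(href, filename)
        downloaded[item] = filename
    return downloaded, missing


# Stand-ins so the sketch runs offline; the hrefs are made up.
FAKE_RESULTS = {
    "YGL130W": "/download/example.do?structureId=3KYH",
    "YDL159W": "/download/example.do?structureId=3FWB",
}
saved = []
downloaded, missing = download_all(
    ["YGL130W", "YDL159W", "YOR181W"],
    find_link=FAKE_RESULTS.get,
    retrieve=lambda href, filename: saved.append(filename),
)
print(downloaded)
print(missing)
```

In real use, find_link would wrap the form submission and table lookup, and retrieve would wrap the actual download call.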