
How to get the search result from BeautifulSoup?

I am not very familiar with BeautifulSoup yet (even though it is very useful). My question is: if I have a website like this

https://bigd.big.ac.cn/dogsdv2/pages/modules/indsnp/indsnp_search.jsp

and I want to get the results of entering P2RY12 into the "gene name" input box, what do I need to do?

Also, in general, if I want to get search results from a given website, what do I need to do?

If you open the Firefox/Chrome developer tools (the Network tab), you can see which requests the page makes. When you type P2RY12 into the search box and click the submit button, the page sends a POST request to http://bigd.big.ac.cn/dogsdv2/indsnp/searchIndSNPSingle.action.

In general, you need to know the URL and the parameters sent to it in order to get any information back.
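To make "the parameters sent to the URL" concrete, here is a minimal sketch of what a form POST body looks like on the wire, using only the standard library. It uses three of the field names from the payload in this answer; for another site you would copy the names and values from the request shown in the Network tab. Comparing the encoded string with the request body in the developer tools is a quick way to check that nothing is missing.

```python
from urllib.parse import urlencode, parse_qs

# A subset of the form fields used in this answer. For another site,
# copy the names and values from the request in the Network tab.
form_fields = {
    'searchForm.genename': 'P2RY12',
    'searchForm.limitFlag': 1,
    'searchForm.numlimit': 1000,
}

# A form POST body is the fields joined as URL-encoded key=value pairs.
body = urlencode(form_fields)
print(body)
# searchForm.genename=P2RY12&searchForm.limitFlag=1&searchForm.numlimit=1000

# Decoding the body gives the fields back; every value is a list of strings.
decoded = parse_qs(body)
print(decoded['searchForm.genename'])
# ['P2RY12']
```

Passing a dict as `data=` to `requests.post` produces a body of this form, which is why the dict keys have to match the form's field names exactly.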

This example grabs some information from the first page of results:

import requests
from bs4 import BeautifulSoup

url = 'http://bigd.big.ac.cn/dogsdv2/indsnp/searchIndSNPSingle.action'

data = {
    'totalCount': -1,
    'searchForm.chrom': 0,
    'searchForm.start': '',
    'searchForm.end': '',
    'searchForm.rsid': '',
    'searchForm.popu':  0,
    'searchForm.geneid': '',
    'searchForm.genename': 'P2RY12',
    'searchForm.goterm': '',
    'searchForm.gokeyword': '',
    'searchForm.limitFlag': 1,
    'searchForm.numlimit':  1000
}

headers = {
    'Referer': 'https://bigd.big.ac.cn/dogsdv2/pages/modules/indsnp/indsnp_search.jsp',
}

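# Send the same POST request the search form sends, then parse the returned HTML.
response = requests.post(url, data=data, headers=headers)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')

# The third cell of each result row holds the SNP details.
for td in soup.select('table.table7 tr > td:nth-child(3)'):
    a = td.select_one('a')
    print('SNP ID:', a.get_text(strip=True))
    # The position is the text node that follows the first <br>.
    position = a.find_next_sibling('br').find_next_sibling(string=True)
    print('Position:', position.strip())
    # The gene links are the <a> siblings that follow the position text.
    print('Location:', ', '.join(link.get_text(strip=True) for link in position.find_next_siblings('a')))
    # The genotype is the text node that follows the third <br>.
    print('Genotype:', a.find_next_siblings('br')[2].find_next_sibling(string=True).strip())
    print('-' * 80)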

Prints:

SNP ID: cfa19627795
Position: Chr23:45904511
Location: ENSCAFG00000008485, ENSCAFG00000008531, ENSCAFG00000008534
Genotype: G
--------------------------------------------------------------------------------
SNP ID: cfa19627797
Position: Chr23:45904579
Location: ENSCAFG00000008485, ENSCAFG00000008531, ENSCAFG00000008534
Genotype: C
--------------------------------------------------------------------------------
SNP ID: cfa19627803
Position: Chr23:45904842
Location: ENSCAFG00000008485, ENSCAFG00000008531, ENSCAFG00000008534
Genotype: C
--------------------------------------------------------------------------------

...and so on.
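The sibling navigation in the loop above is easier to follow on a small offline fragment. The sketch below runs the same steps against a hypothetical cell: the HTML, the IDs and the helper name `parse_cell` are made up for illustration, and the layout is only an assumption about how the real page structures each result.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment with an assumed cell layout, not copied from the site.
SAMPLE_HTML = (
    '<table class="table7"><tr><td>1</td><td>dog</td>'
    '<td><a href="#">cfa0000001</a><br/>Chr1:12345'
    '<br/><a href="#">GENE_A</a>, <a href="#">GENE_B</a><br/>G</td>'
    '</tr></table>'
)

def parse_cell(td):
    """Extract the SNP fields from one result cell (illustrative helper)."""
    a = td.select_one('a')
    # First text node after the first <br> following the SNP link.
    position = a.find_next_sibling('br').find_next_sibling(string=True)
    # Every <a> sibling after the position text is a gene link.
    genes = [g.get_text(strip=True) for g in position.find_next_siblings('a')]
    # First text node after the third <br>.
    genotype = a.find_next_siblings('br')[2].find_next_sibling(string=True)
    return {
        'snp_id': a.get_text(strip=True),
        'position': position.strip(),
        'genes': genes,
        'genotype': genotype.strip(),
    }

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
rows = [parse_cell(td) for td in soup.select('table.table7 tr > td:nth-child(3)')]
print(rows[0])
```

Returning a dict per row instead of printing makes it straightforward to collect the results into a list or write them to a CSV file later.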

