Printing only specific part of a link using beautiful soup

Question

I am using BeautifulSoup to parse a website with the code below. I can parse the page and print its contents, but I want to print only one part of the data on the page. How can this be done?

from bs4 import BeautifulSoup as bs
import argparse
import urllib
import urllib2
import getpass
import re
import requests

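def update(url):
    print url
    req = urllib2.Request(url=url)
    # Open the connection before the try block so f always exists in finally
    f = urllib2.urlopen(req)
    try:
        txt = f.read()
        soup = bs(txt)
        print soup
    finally:
        # Close the connection even if reading or parsing fails
        f.close()
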

def main():
    # For logging
    print "test"
    parser = argparse.ArgumentParser(description='This is the update.py script created by test')
    parser.add_argument('-u', '--url', action='store', dest='url', default=None, help='<Required> url link', required=True)
    results = parser.parse_args()  # collect cmd line args
    url = results.url
    #print url
    update(url)

if __name__ == '__main__':
    main()

The current output is shown first, followed by the expected output.

Current output:

==== Test results ====

results are in \\data\loc

==== <font color="#008000">Build Combo</font> ====

{| border="1" cellspacing="1" cellpadding="1"
|-
! bgcolor="#67B0F9" scope="col" | test1
! bgcolor="#67B0F9" scope="col" | test2
! bgcolor="#67B0F9" scope="col" | test3
! bgcolor="#67B0F9" scope="col" | test4
|-
| [http:link.com]
|}

==== <font color="#008000">COde:</font> ====

Expected output:

==== <font color="#008000">Build Combo</font> ====

{| border="1" cellspacing="1" cellpadding="1"
|-
! bgcolor="#67B0F9" scope="col" | test1
! bgcolor="#67B0F9" scope="col" | test2
! bgcolor="#67B0F9" scope="col" | test3
! bgcolor="#67B0F9" scope="col" | test4
|-
| [http:link.com]
|}

Answer 1

I'm not entirely sure what you're asking, but I think you want to print only the a element rather than the whole soup. If that's the case, you need to find the element you want and then print it. Assuming you're looking for an a element, your update function might look more like this:

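def update(url):
    print url
    req = urllib2.Request(url=url)
    # Open the connection before the try block so f always exists in finally
    f = urllib2.urlopen(req)
    try:
        txt = f.read()
        soup = bs(txt)
        # find() returns the first matching tag, or None if the page has no <a> element
        a_element = soup.find("a")
        print a_element
    finally:
        f.close()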
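
If what you actually want is the whole "Build Combo" section rather than a single tag, note that the expected output is wiki markup, so there may be no one element for find() to return. As a rough sketch under that assumption, you could slice the page text between that heading and the next ==== heading with a plain regular expression instead of BeautifulSoup. The names extract_section and sample are made up for this illustration, and sample is a shortened copy of the output in the question; in your script you would pass in the text read from the page.

```python
import re

def extract_section(text, title):
    """Return the section whose ==== heading contains title, up to the next heading."""
    # Headings in the sample output look like: ==== <font ...>Title</font> ====
    heading = re.compile(r"^====.*?====[ \t]*$", re.MULTILINE)
    matches = list(heading.finditer(text))
    for i, m in enumerate(matches):
        if title in m.group(0):
            # The section ends where the next heading starts, or at the end of the text
            end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
            return text[m.start():end].rstrip()
    return None  # no heading contains the requested title

# Shortened copy of the output shown in the question (assumed input format)
sample = r"""==== Test results ====

results are in \\data\loc

==== <font color="#008000">Build Combo</font> ====

{| border="1" cellspacing="1" cellpadding="1"
|-
! bgcolor="#67B0F9" scope="col" | test1
|-
| [http:link.com]
|}

==== <font color="#008000">COde:</font> ====
"""

print(extract_section(sample, "Build Combo"))
```

This prints the "Build Combo" heading and its table and stops before the "COde:" heading. The function returns None when no heading contains the title, so check for that before printing in real use.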


solution1
0 2013-03-09 08:24:24
