Printing only a specific part of a link using Beautiful Soup

I am using BeautifulSoup to parse a website with the following code. I can parse the website and print its data, but I only want to print part of the data from the page. Can anyone suggest how to do this?

import argparse
import urllib.error
import urllib.request

from bs4 import BeautifulSoup as bs


def update(url):
    print(url)
    req = urllib.request.Request(url=url)
    try:
        # Download the page; the with-block closes the connection automatically
        with urllib.request.urlopen(req) as f:
            txt = f.read()
    except urllib.error.URLError as e:
        print("Could not fetch %s: %s" % (url, e))
        return
    # Parse with Python's built-in parser and print the whole document
    soup = bs(txt, "html.parser")
    print(soup)


def main():
    # For logging
    print("test")
    parser = argparse.ArgumentParser(description='This is the update.py script created by test')
    parser.add_argument('-u', '--url', action='store', dest='url', default=None,
                        help='<Required> url link', required=True)
    results = parser.parse_args()  # collect cmd line args
    url = results.url
    update(url)


if __name__ == '__main__':
    main()

The current output is below, followed by the expected result.

Current output:

==== Test results ====

results are in \\data\loc

==== <font color="#008000">Build Combo</font> ====

{| border="1" cellspacing="1" cellpadding="1"
|-
! bgcolor="#67B0F9" scope="col" | test1
! bgcolor="#67B0F9" scope="col" | test2
! bgcolor="#67B0F9" scope="col" | test3
! bgcolor="#67B0F9" scope="col" | test4
|-
| [http:link.com]
|}

==== <font color="#008000">COde:</font> ====

Expected output:

==== <font color="#008000">Build Combo</font> ====

{| border="1" cellspacing="1" cellpadding="1"
|-
! bgcolor="#67B0F9" scope="col" | test1
! bgcolor="#67B0F9" scope="col" | test2
! bgcolor="#67B0F9" scope="col" | test3
! bgcolor="#67B0F9" scope="col" | test4
|-
| [http:link.com]
|}
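
Because the current output is MediaWiki markup (a "==== ... ====" heading followed by a "{| ... |}" table) rather than HTML elements, one way to get the expected output is plain string slicing on the page text. This is a minimal sketch, assuming the text is laid out like the sample above; the function name extract_section and the sample string are illustrative, not part of the original code:

```python
def extract_section(text, heading, end_marker="|}"):
    """Return the part of `text` from the start of the line containing
    `heading` through the first `end_marker` after it, or None if either
    is missing. (Hypothetical helper for illustration.)"""
    start = text.find(heading)
    if start == -1:
        return None
    # Back up to the start of the heading's line so the "====" prefix is kept
    line_start = text.rfind("\n", 0, start) + 1
    end = text.find(end_marker, start)
    if end == -1:
        return None
    return text[line_start:end + len(end_marker)]


# Sample text modeled on the "current output" above (assumed layout)
sample = r"""==== Test results ====

results are in \\data\loc

==== <font color="#008000">Build Combo</font> ====

{| border="1" cellspacing="1" cellpadding="1"
|-
! bgcolor="#67B0F9" scope="col" | test1
|-
| [http:link.com]
|}

==== <font color="#008000">COde:</font> ===="""

print(extract_section(sample, "Build Combo"))
```

Inside update, this could be called as extract_section(str(soup), "Build Combo"); str(soup) keeps the <font> tags, whereas soup.get_text() would strip them.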

I'll be the first to admit I'm not quite sure what you're asking, but if I'm not mistaken, you want to print only the a element rather than the whole soup. If that's the case, you need to find the element you want to print and then, well, print it. Assuming you're hunting for an a element, your update method might look more like this:

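import urllib.error
import urllib.request

from bs4 import BeautifulSoup as bs


def update(url):
    print(url)
    req = urllib.request.Request(url=url)
    try:
        with urllib.request.urlopen(req) as f:
            txt = f.read()
    except urllib.error.URLError as e:
        print("Could not fetch %s: %s" % (url, e))
        return

    soup = bs(txt, "html.parser")
    # find() returns the first matching tag, or None if there is no <a> element
    a_element = soup.find("a")
    print(a_element)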
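
If you want only part of a link (for example its URL or its visible text) rather than the whole tag, the Tag objects returned by find and find_all expose attributes through .get() and text through .get_text(). This is a minimal sketch on a made-up inline HTML snippet so it runs without a network connection; link_parts and the sample URLs are hypothetical:

```python
from bs4 import BeautifulSoup


def link_parts(html):
    """Return (href, text) pairs for every <a> element in `html`."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all("a") returns every <a> tag; find("a") would return only the first
    return [(a.get("href"), a.get_text(strip=True)) for a in soup.find_all("a")]


# Hypothetical page fragment standing in for the real site
sample_html = """
<table>
  <tr><td><a href="http://link.com/build/1">Build 1</a></td></tr>
  <tr><td><a href="http://link.com/build/2">Build 2</a></td></tr>
</table>
"""

for href, text in link_parts(sample_html):
    print(href, text)
```

a.get("href") returns None instead of raising an error when a link has no href attribute, which keeps the loop safe on messy pages.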
