

How can I only select links from "#0-9" and "A-Z" in BeautifulSoup?

My URL is this:

https://en.wikipedia.org/wiki/List_of_South_Korean_dramas

This works well for selecting the links under each heading from A to Z:

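import requests
from bs4 import BeautifulSoup

url = 'https://en.wikipedia.org/wiki/List_of_South_Korean_dramas'
s = requests.Session()  # assumed: s is a requests.Session

link = s.get(url)
link_soup = BeautifulSoup(link.text, 'lxml')
# The element with id="A" is inside the section heading; the <ul> after that heading holds the links
links = (
    link_soup
    .select_one('#A')
    .parent
    .find_next_sibling("ul")
    .find_all("a", href=True)
)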

But when I try select_one('#0-9'):

...
    link_soup
    .select_one('#0-9')
    .parent
    .find_next_sibling("ul")
    .find_all("a", href=True)
)

I get this error:

SelectorSyntaxError: Malformed id selector at position 0
  line 1:
#0-9
^
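The error comes from soupsieve, the CSS selector engine BeautifulSoup uses for select and select_one, and it is raised while the selector string is parsed, before any document is searched. A small sketch to check this offline, assuming soupsieve is installed (bs4 4.7 and later installs it as a dependency); it compiles each selector on its own, with no HTML or network access:

```python
import soupsieve as sv

# Selectors from the question and from the answer below.
candidates = ['#A', '#0-9', '[id="0-9"]', '#\\30-9']


def is_valid(selector):
    """Return True if soupsieve can parse the selector, False on a syntax error."""
    try:
        sv.compile(selector)
    except sv.SelectorSyntaxError:
        return False
    return True


for selector in candidates:
    print(selector, '->', 'ok' if is_valid(selector) else 'SelectorSyntaxError')
```

Only '#0-9' fails: an id selector must be a CSS identifier, and an identifier cannot start with an unescaped digit.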

How can I select only the links under "#0-9" and A-Z? I know I could use a for loop, use re to change the ending of the URL, and scrape the links manually from there, but is there a way to get the same results using select or bs4?

Thanks again for the help.

To answer the direct question: #0-9 fails because an id selector must be a valid CSS identifier, and an identifier cannot begin with an unescaped digit. You can instead use an attribute = value CSS selector to specify the id attribute and its value. The value is inside quotes, so the leading digit does not pose a problem for the parser.

link_soup.select('[id="0-9"]')
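Plugged into the question's chain, the attribute selector lets one helper handle "0-9" and every letter. A minimal sketch, run against a hypothetical, simplified stand-in for the page's markup at the time (section headings as <h2> elements wrapping an element with the section id, each followed by a <ul>); the helper name section_links and the sample links are made up for illustration, and the live Wikipedia markup may differ:

```python
import string

from bs4 import BeautifulSoup

# Hypothetical sample markup; real show names and layout are not reproduced here.
SAMPLE_HTML = (
    '<h2><span id="0-9">0-9</span></h2>'
    '<ul><li><a href="/wiki/Show_0">Show 0</a></li></ul>'
    '<h2><span id="A">A</span></h2>'
    '<ul><li><a href="/wiki/Show_A">Show A</a></li></ul>'
    '<h2><span id="B">B</span></h2>'
    '<ul><li><a href="/wiki/Show_B1">Show B1</a></li>'
    '<li><a href="/wiki/Show_B2">Show B2</a></li></ul>'
    '<h2><span id="See_also">See also</span></h2>'
    '<ul><li><a href="/wiki/Other_list">Other list</a></li></ul>'
)


def section_links(soup, section_id):
    """Return the hrefs in the <ul> that follows the heading holding this id (hypothetical helper)."""
    # Quoting the value avoids the "#<digit>" identifier problem entirely.
    anchor = soup.select_one(f'[id="{section_id}"]')
    if anchor is None:  # no such section on the page
        return []
    ul = anchor.parent.find_next_sibling('ul')
    return [a['href'] for a in ul.find_all('a', href=True)] if ul else []


# html.parser avoids an lxml dependency; 'lxml' should behave the same for this markup.
soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
sections = ['0-9'] + list(string.ascii_uppercase)
links = {sid: section_links(soup, sid) for sid in sections}
links = {sid: hrefs for sid, hrefs in links.items() if hrefs}  # drop empty sections
print(links)
```

If CSS is not needed at all, soup.find(id="0-9") matches the id attribute directly and never goes through the selector parser.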

Or escape the leading digit using its Unicode code point in hexadecimal. The digit 0 is U+0030, so it can be written as \30 (short for \000030). No trailing space is needed in this case, because the next character, -, cannot be mistaken for part of the hex number:

link_soup.select('#\\30-9')
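Hand-writing escapes works for one id; for arbitrary ids it can be generated. A sketch using only the standard library, following the rules of the CSSOM CSS.escape() algorithm as I understand them; css_escape_ident is a hypothetical helper, not part of bs4. It always adds the terminating space after a hex escape (giving \30 -9 rather than \30-9); by the CSS syntax rules that space only ends the escape and is not part of the id, so both spellings name the same id:

```python
import string

# ASCII characters that may appear unescaped in a CSS identifier.
_ASCII_NAME_CHARS = set(string.ascii_letters + string.digits + '-_')


def css_escape_ident(ident: str) -> str:
    """Escape a string for use as a CSS identifier (sketch of CSS.escape())."""
    out = []
    for i, ch in enumerate(ident):
        cp = ord(ch)
        if cp == 0:
            out.append('\ufffd')  # NULL becomes U+FFFD REPLACEMENT CHARACTER
        elif 0x01 <= cp <= 0x1F or cp == 0x7F:
            out.append(f'\\{cp:x} ')  # control characters: hex escape plus space
        elif ch in string.digits and (i == 0 or (i == 1 and ident[0] == '-')):
            out.append(f'\\{cp:x} ')  # a digit cannot start an identifier
        elif i == 0 and ch == '-' and len(ident) == 1:
            out.append('\\-')  # a lone "-" is not a valid identifier
        elif cp >= 0x80 or ch in _ASCII_NAME_CHARS:
            out.append(ch)  # safe as-is
        else:
            out.append('\\' + ch)  # other ASCII punctuation or whitespace
    return ''.join(out)


selector = '#' + css_escape_ident('0-9')
print(selector)  # prints: #\30 -9
```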

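However, you could specify a single selector that extracts all the links in one go, without the extra walking up and down the DOM. h2:not(:has(#See_also)) matches every section heading except "See also", + ul takes the list immediately after each of those headings, and a selects every link inside it: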

links = ['https://en.wikipedia.org' + i['href'] for i in link_soup.select('h2:not(:has(#See_also)) + ul a')]
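To see what the single selector picks up, here it is run against a hypothetical, simplified stand-in for the page's markup (the sample links are invented; the live page's markup can change over time, so the selector may need adjusting against it):

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: <h2> section headings, each followed by a <ul>.
SAMPLE_HTML = (
    '<h2><span id="0-9">0-9</span></h2>'
    '<ul><li><a href="/wiki/Show_0">Show 0</a></li></ul>'
    '<h2><span id="A">A</span></h2>'
    '<ul><li><a href="/wiki/Show_A">Show A</a></li></ul>'
    '<h2><span id="B">B</span></h2>'
    '<ul><li><a href="/wiki/Show_B1">Show B1</a></li>'
    '<li><a href="/wiki/Show_B2">Show B2</a></li></ul>'
    '<h2><span id="See_also">See also</span></h2>'
    '<ul><li><a href="/wiki/Other_list">Other list</a></li></ul>'
)

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
# h2:not(:has(#See_also)) : every heading except the one containing #See_also
# + ul                    : the <ul> immediately after such a heading
# a                       : every link inside that <ul>
links = ['https://en.wikipedia.org' + a['href']
         for a in soup.select('h2:not(:has(#See_also)) + ul a')]
print(links)
```

The "See also" list is skipped and the remaining links come back in document order. If some anchors might lack an href, ul a[href] avoids a KeyError on a['href'].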


