Beautiful Soup findAll doesn't find them all

I'm trying to parse a website and get some info with the find_all() method, but it doesn't find them all.

This is the code:

#!/usr/bin/python3

from bs4 import BeautifulSoup
from urllib.request import urlopen

page = urlopen ("http://mangafox.me/directory/")
# print (page.read ())
soup = BeautifulSoup (page.read ())

manga_img = soup.findAll ('a', {'class' : 'manga_img'}, limit=None)

for manga in manga_img:
    print (manga['href'])

It only prints half of them...

Different HTML parsers deal differently with broken HTML. That page serves broken HTML, and the lxml parser does not deal with it very well:

>>> import requests
>>> from bs4 import BeautifulSoup
>>> r = requests.get('http://mangafox.me/directory/')
>>> soup = BeautifulSoup(r.content, 'lxml')
>>> len(soup.find_all('a', class_='manga_img'))
18

The standard library html.parser has less trouble with this specific page:

>>> soup = BeautifulSoup(r.content, 'html.parser')
>>> len(soup.find_all('a', class_='manga_img'))
44

Translating that to your specific code sample using urllib, you would specify the parser thus:

soup = BeautifulSoup(page, 'html.parser')  # BeautifulSoup can do the reading
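
To see how parser choice changes the result without hitting the site, here is a minimal sketch that runs the same query against a small, deliberately malformed snippet. The snippet and the helper name count_links are made up for illustration. Parsers that are not installed are reported as None rather than raising.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical, deliberately malformed markup: a stray </p> and unclosed <div> tags.
BROKEN_HTML = """
<div><a class="manga_img" href="/manga/one/">One</a></p>
<div><a class="manga_img" href="/manga/two/">Two</a>
"""


def count_links(html, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser name: number of matching <a> tags, or None if unavailable}."""
    counts = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            # lxml and html5lib are optional third-party packages.
            counts[name] = None
            continue
        counts[name] = len(soup.find_all("a", class_="manga_img"))
    return counts


counts = count_links(BROKEN_HTML)
for parser_name, count in counts.items():
    print(parser_name, count)
```

If the counts differ between parsers on your real page, the markup is probably broken in a way that each parser repairs differently.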

A quick way to grab all the links is to use a CSS selector that selects every a tag whose href attribute contains /manga.

The output contains the links that start with /manga/"title" (you can check this in dev tools using the inspector):

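import requests
from bs4 import BeautifulSoup

# The 'lxml' parser needs the lxml package installed, but it does not have to be imported.
html = requests.get('http://fanfox.net/directory/').text
soup = BeautifulSoup(html, 'lxml')

# Select every <a> tag whose href contains "/manga"
for a_tag in soup.select('a[href*="/manga"]'):
    link = a_tag['href']
    link = link[1:]  # drop the leading slash before joining with the site root
    print(f'http://fanfox.net/{link}')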
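
Note that the *= operator in the selector above matches /manga anywhere in the href, not only at the start. The following sketch, run on a made-up HTML fragment, contrasts it with the ^= operator, which anchors the match at the beginning of the attribute value:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment; the hrefs are invented for illustration.
SAMPLE = """
<a href="/manga/one_piece/">One Piece</a>
<a href="/directory/2.html">Next</a>
<a href="/news/manga-awards/">News</a>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")

# *= : href contains the substring "/manga" anywhere
contains = [a["href"] for a in soup.select('a[href*="/manga"]')]

# ^= : href starts with "/manga/"
starts_with = [a["href"] for a in soup.select('a[href^="/manga/"]')]

print(contains)
print(starts_with)
```

If only the title pages are wanted, the ^= form is the stricter choice.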

Alternative method:

Change requests.get to a different URL (directory/2.html).
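
As a rough sketch of how the page URLs could be built, assuming page 1 lives at directory/ and page N at directory/N.html (the pattern suggested by directory/2.html; the helper name directory_url is hypothetical):

```python
BASE_DIRECTORY_URL = "http://fanfox.net/directory/"


def directory_url(page, base=BASE_DIRECTORY_URL):
    """Return the directory URL for a 1-based page number (assumed URL pattern)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # Assumption: the first page has no numeric suffix.
    return base if page == 1 else f"{base}{page}.html"


page_urls = [directory_url(n) for n in range(1, 4)]
for url in page_urls:
    print(url)
```

Each of these URLs can then be passed to requests.get in turn.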

Here's the working code (it works for pages 2, 3, 4, 5, 6, and so on as well); you can play around with it on replit.com:

import requests
from bs4 import BeautifulSoup

html = requests.get('http://fanfox.net/directory/').text
soup = BeautifulSoup(html, 'lxml')  # requires the lxml package to be installed

for manga in soup.select('.line'):
    # Titles
    for t in manga.select('.manga-list-1-item-title a'):
        print(t.text)
    # Cover image URLs
    for i in manga.find_all('img', class_='manga-list-1-cover'):
        print(i['src'])
    # Links to each manga page
    for item in manga.find_all('p', class_='manga-list-1-item-title'):
        link = item.a['href'][1:]  # drop the leading slash
        print(f'http://fanfox.net/{link}')

Output (could be prettier), all in order:

A Story About Treating a Female Knig...
Tales of Demons and Gods
Martial Peak
Onepunch-Man
One Piece
Star Martial God Technique
Solo Leveling
The Last Human
Kimetsu no Yaiba
Versatile Mage
Boku no Hero Academia
Apotheosis
Black Clover
Tensei Shitara Slime Datta Ken
Kingdom
Tate no Yuusha no Nariagari
Tomo-chan wa Onna no ko!
Goblin Slayer
Yakusoku no Neverland
God of Martial Arts
Kaifuku Jutsushi no Yarinaoshi
Re:Monster
Mushoku Tensei - Isekai Ittara Honki...
Nanatsu no Taizai
Battle Through the Heavens
Shingeki no Kyojin
Iron Ladies
Monster Musume no Iru Nichijou
World’s End Harem
Bleach
Parallel Paradise
Shokugeki no Soma
Spirit Sword Sovereign
Horimiya
Dungeon ni Deai o Motomeru no wa Mac...
Dr. Stone
Berserk
The New Gate
Akatsuki no Yona
Naruto
Overlord
Death March kara Hajimaru Isekai Kyo...
Tsuki ga Michibiku Isekai Douchuu
Eternal Reverence
Minamoto-kun Monogatari
Beastars
Jujutsu Kaisen
Hajime no Ippo
Kaguya-sama wa Kokurasetai - Tensai-...
Domestic na Kanojo
The Legendary Moonlight Sculptor
The Gamer
Kumo desu ga, nani ka?
Bokutachi wa Benkyou ga Dekinai
Enen no Shouboutai
Tsuyokute New Saga
Fairy Tail
Komi-san wa Komyushou Desu.
Kenja no Mago
Soul Land
Boruto: Naruto Next Generations
Hunter X Hunter
History’s Strongest Disciple Kenichi
Phoenix against the World
LV999 no Murabito
Gate - Jietai Kare no Chi nite, Kaku...
Kengan Asura
Konjiki no Moji Tsukai - Yuusha Yoni...
Please don’t bully me, Nagatoro
Isekai Maou to Shoukan Shoujo Dorei ...
http://fmcdn.mfcdn.net/store/manga/27418/cover.jpg?token=64e5c0c930644528cba6eb2f2f5f5a2f3762188d&ttl=1616839200&v=1615891672
http://fmcdn.mfcdn.net/store/manga/16627/cover.jpg?token=33f5ea4c1ba1a013c5bdcfdac87209fe472cf6d5&ttl=1616839200&v=1616396463
http://fmcdn.mfcdn.net/store/manga/27509/cover.jpg?token=ce2b16e8e867a8ce13ad0bee9940b68eef324cac&ttl=1616839200&v=1616737688
http://fmcdn.mfcdn.net/store/manga/11362/cover.jpg?token=1a5876d8a767fd27b26f0287bbb36eb82f9cf811&ttl=1616839200&v=1615796703
http://fmcdn.mfcdn.net/store/manga/106/cover.jpg?token=5313fc0dae53f33fcd1284cd4858603fc47ffa04&ttl=1616839200&v=1616748903
http://fmcdn.mfcdn.net/store/manga/22443/cover.jpg?token=89760754754a63efc875aa7e2de0536a5238bed3&ttl=1616839200&v=1616396922
http://fmcdn.mfcdn.net/store/manga/29037/cover.jpg?token=e8b496db4ad520f002040761c5887bc1e17af63a&ttl=1616839200&v=1616653683
http://fmcdn.mfcdn.net/store/manga/28343/cover.jpg?token=71c1b201e4d714f893efb7ac984c9787dd8df915&ttl=1616839200&v=1616748232
http://fmcdn.mfcdn.net/store/manga/19287/cover.jpg?token=803eb8beab4dc6aa8d73f5137a6e3331c0034d24&ttl=1616839200&v=1609900224
http://fmcdn.mfcdn.net/store/manga/27761/cover.jpg?token=6c11f2bddb31b460fccc9a158cc13b9593fb1ad2&ttl=1616839200&v=1616740672
http://fmcdn.mfcdn.net/store/manga/14356/cover.jpg?token=93638c7ec630de193299caa8d513e045818b35ce&ttl=1616839200&v=1616170144
http://fmcdn.mfcdn.net/store/manga/27118/cover.jpg?token=9c876792ad8e6e5f9777386184ea8e6f409aa9fd&ttl=1616839200&v=1616654344
http://fmcdn.mfcdn.net/store/manga/15291/cover.jpg?token=e0a3195fcc88e397703e8bdf6580a62a0d856816&ttl=1616839200&v=1616345844
http://fmcdn.mfcdn.net/store/manga/15975/cover.jpg?token=e07844bb607a3d53ababab51683ee6fa06906d7c&ttl=1616839200&v=1616733843
http://fmcdn.mfcdn.net/store/manga/8198/cover.jpg?token=bc135016049bb63e5b65ec87207e0c91bb0c62c8&ttl=1616839200&v=1616335864
http://fmcdn.mfcdn.net/store/manga/14036/cover.jpg?token=c13dab07379e88fb871d3d833999ead13bfaf0fc&ttl=1616839200&v=1615393923
http://fmcdn.mfcdn.net/store/manga/16159/cover.jpg?token=cdf538f92f729999bcb9fcae7fb31b7a8c306c92&ttl=1616839200&v=1569492366
http://fmcdn.mfcdn.net/store/manga/20569/cover.jpg?token=f9c08cde2f0a6bd646dc87dc4a8dee6fa44eca3c&ttl=1616839200&v=1616680427
http://fmcdn.mfcdn.net/store/manga/21271/cover.jpg?token=062fd439c18afaf178d3408c64b2b305f679e91a&ttl=1616839200&v=1611285077
http://fmcdn.mfcdn.net/store/manga/26916/cover.jpg?token=cda99bf9831ada1322045bf82893a9ed1ad868d5&ttl=1616839200&v=1615188784
http://fmcdn.mfcdn.net/store/manga/26841/cover.jpg?token=055e9ff117c28b3a7c3089c4d691228adeba1f55&ttl=1616839200&v=1616201299
http://fmcdn.mfcdn.net/store/manga/13895/cover.jpg?token=e7661738326d62d38b5f93771105898cb95adaba&ttl=1616839200&v=1612570263
http://fmcdn.mfcdn.net/store/manga/14217/cover.jpg?token=3263f009d5b42e441a09e14c44e3fd7d12a83089&ttl=1616839200&v=1615259584
http://fmcdn.mfcdn.net/store/manga/11374/cover.jpg?token=ab9d85a9efdd5b41391db5249bcf0011ce07070f&ttl=1616839200&v=1600762925
http://fmcdn.mfcdn.net/store/manga/14225/cover.jpg?token=e8912699841e28f9ca8b40eb8fe1d37d2a6ce3e3&ttl=1616839200&v=1616097340
http://fmcdn.mfcdn.net/store/manga/9011/cover.jpg?token=eaca757d4352b66d4ef69812ec5c265b5a2f7a28&ttl=1616839200&v=1614982324
http://fmcdn.mfcdn.net/store/manga/29235/cover.jpg?token=23b3338eaa8984bad9c17a2d604c60c909282715&ttl=1616839200&v=1614666974
http://fmcdn.mfcdn.net/store/manga/10348/cover.jpg?token=c4209cc06013a704c9f7a0e942b8ae55a7546941&ttl=1616839200&v=1616082423
http://fmcdn.mfcdn.net/store/manga/20107/cover.jpg?token=699e867d86e4957b8ef4d3eee5200f80cdbbea88&ttl=1616839200&v=1610529669
http://fmcdn.mfcdn.net/store/manga/9/cover.jpg?token=a4894a5ce212a490dda9c6cf73b717bbfbf015c3&ttl=1616839200&v=1616593028
http://fmcdn.mfcdn.net/store/manga/24693/cover.jpg?token=d968c24525bc6fe467f40c9ad2ff087ebfb60e4a&ttl=1616839200&v=1615325943
http://fmcdn.mfcdn.net/store/manga/11529/cover.jpg?token=1a3ab38ba3f212d5c95138bb690b155f38390aab&ttl=1616839200&v=1615891829
http://fmcdn.mfcdn.net/store/manga/28001/cover.jpg?token=1769a66a83df9adfed58a36dc9275f202d1f8f37&ttl=1616839200&v=1615891671
http://fmcdn.mfcdn.net/store/manga/11147/cover.jpg?token=e6d602fcd4b438ec299c955738487127cef7a3bf&ttl=1616839200&v=1616264399
http://fmcdn.mfcdn.net/store/manga/12978/cover.jpg?token=7e9094f238fcbd19717ffeeb4dcfe686a99dba4b&ttl=1616839200&v=1568611983
http://fmcdn.mfcdn.net/store/manga/24445/cover.jpg?token=0f77d7a743c0f613ff773f3e430f688e3aa77239&ttl=1616839200&v=1616345762
http://fmcdn.mfcdn.net/store/manga/176/cover.jpg?token=e8e87528092cd5b902767d7564e035486b8535f2&ttl=1616839200&v=1611297351
http://fmcdn.mfcdn.net/store/manga/14588/cover.jpg?token=469da1dfa4953459e08efdeb24561f78f7a68b47&ttl=1616839200&v=1615891989
http://fmcdn.mfcdn.net/store/manga/9126/cover.jpg?token=53689bb06b90c163b58b0410e80252941b27aff6&ttl=1616839200&v=1616083893
http://fmcdn.mfcdn.net/store/manga/8/cover.jpg?token=8e5cbd08bd42f0684f36f107fc991c75b56bbed2&ttl=1616839200&v=1615891989
http://fmcdn.mfcdn.net/store/manga/14765/cover.jpg?token=8a8e0582258d852b4c9d017567dd6820958f5a67&ttl=1616839200&v=1615042503
http://fmcdn.mfcdn.net/store/manga/16457/cover.jpg?token=7e59859f7af131902006c3eb8ed55745ef14573f&ttl=1616839200&v=1613139843
http://fmcdn.mfcdn.net/store/manga/16675/cover.jpg?token=cbb268f1326b704b1bb11accadc35ae3b7222e39&ttl=1616839200&v=1615891829
http://fmcdn.mfcdn.net/store/manga/26261/cover.jpg?token=d83f514efe719b2dd301c2ecc8d672e9d935084c&ttl=1616839200&v=1613384403
http://fmcdn.mfcdn.net/store/manga/9518/cover.jpg?token=76170cb8b2defc468a817a69bf6e799900c4fd9f&ttl=1616839200&v=1596437944
http://fmcdn.mfcdn.net/store/manga/24547/cover.jpg?token=b99d7b791e14ec290054d57ead4dcf9fb61b4d7a&ttl=1616839200&v=1615891989
http://fmcdn.mfcdn.net/store/manga/27861/cover.jpg?token=d14b0f3f2362869830c2971007e86ca43637bb85&ttl=1616839200&v=1616345044
http://fmcdn.mfcdn.net/store/manga/231/cover.jpg?token=53c2dc9eb6bf5c6f635de12496088a27b28e04f7&ttl=1616839200&v=1616418784
http://fmcdn.mfcdn.net/store/manga/17825/cover.jpg?token=f1b7954fba32d3146282b2b5bba4e1419578d65b&ttl=1616839200&v=1616677923
http://fmcdn.mfcdn.net/store/manga/14099/cover.jpg?token=7b7a61b4e544a65a75394e4cabf04831cf0c5d7a&ttl=1616839200&v=1611909666
http://fmcdn.mfcdn.net/store/manga/15177/cover.jpg?token=4442c2f4cf7e5c69d3449e7b358960930ff19e11&ttl=1616839200&v=1605145143
http://fmcdn.mfcdn.net/store/manga/13088/cover.jpg?token=d8ab36b3d0f4d9c6263a4f482f98c4d99809eb36&ttl=1616839200&v=1616641226
http://fmcdn.mfcdn.net/store/manga/18225/cover.jpg?token=ea670a4bc8d1aa0312f5427b24bf5702c12ef3a3&ttl=1616839200&v=1615470603
http://fmcdn.mfcdn.net/store/manga/23945/cover.jpg?token=9e078e0cb6da91194a6f86c814ae03922e8460d0&ttl=1616839200&v=1615891671
http://fmcdn.mfcdn.net/store/manga/17045/cover.jpg?token=3026e40a21e490f37c656a778e9227c6c891cade&ttl=1616839200&v=1615891829
http://fmcdn.mfcdn.net/store/manga/13930/cover.jpg?token=f773694a746e2015b4ca5c46afcc801d9795393c&ttl=1616839200&v=1616100123
http://fmcdn.mfcdn.net/store/manga/246/cover.jpg?token=3926211df393a0d50e58c0285c05f067c1ad64e5&ttl=1616839200&v=1615891989
http://fmcdn.mfcdn.net/store/manga/17189/cover.jpg?token=f9ffcf2a07bb8d1f7a49eac36c1f6c4fcd7e5622&ttl=1616839200&v=1616514627
http://fmcdn.mfcdn.net/store/manga/20299/cover.jpg?token=121f6571e072381a545e9e3790b4bf1723865859&ttl=1616839200&v=1615891671
http://fmcdn.mfcdn.net/store/manga/13841/cover.jpg?token=86245cf3afab622c35a41f4e2bf388ac48713906&ttl=1616839200&v=1615891672
http://fmcdn.mfcdn.net/store/manga/19939/cover.jpg?token=563a2963a0a153ac1c53779712f48af5630e0377&ttl=1616839200&v=1616714152
http://fmcdn.mfcdn.net/store/manga/44/cover.jpg?token=febabec452a05c1415f02bf8387a0a8f16c20137&ttl=1616839200&v=1548837372
http://fmcdn.mfcdn.net/store/manga/107/cover.jpg?token=3dcce47a3a6760b9b81b7b576711980d36cf7be1&ttl=1616839200&v=1543561843
http://fmcdn.mfcdn.net/store/manga/24241/cover.jpg?token=b4a1834d714f0476c2d99c5ffb905351c7a4d72f&ttl=1616839200&v=1616176266
http://fmcdn.mfcdn.net/store/manga/25773/cover.jpg?token=7bf8a8e9346a02250bb24cd8e6e4da0933e6a05f&ttl=1616839200&v=1616655977
http://fmcdn.mfcdn.net/store/manga/10956/cover.jpg?token=db3b74dc959adedbd847142cd3a079caca6b25d1&ttl=1616839200&v=1612043463
http://fmcdn.mfcdn.net/store/manga/15593/cover.jpg?token=caceb80b7266f438bdedae8cf69653ab7911fe68&ttl=1616839200&v=1606188363
http://fmcdn.mfcdn.net/store/manga/14916/cover.jpg?token=0dab5e6797f4cc915a035632ed0d02a2492afbcc&ttl=1616839200&v=1609752363
http://fmcdn.mfcdn.net/store/manga/26771/cover.jpg?token=77a6aa9bbb7ebcd3df15cd4cc65b4e3915e96ed4&ttl=1616839200&v=1615891829
http://fmcdn.mfcdn.net/store/manga/16569/cover.jpg?token=e5815ac1520ad179ad2d6f798e4b6ead6790cd33&ttl=1616839200&v=1614957071
http://fanfox.net/manga/a_story_about_treating_a_female_knight_who_has_never_been_treated_as_a_woman_as_a_woman/
http://fanfox.net/manga/tales_of_demons_and_gods/
http://fanfox.net/manga/martial_peak/
http://fanfox.net/manga/onepunch_man/
http://fanfox.net/manga/one_piece/
http://fanfox.net/manga/star_martial_god_technique/
http://fanfox.net/manga/solo_leveling/
http://fanfox.net/manga/the_last_human/
http://fanfox.net/manga/kimetsu_no_yaiba/
http://fanfox.net/manga/versatile_mage/
http://fanfox.net/manga/boku_no_hero_academia/
http://fanfox.net/manga/apotheosis/
http://fanfox.net/manga/black_clover/
http://fanfox.net/manga/tensei_shitara_slime_datta_ken/
http://fanfox.net/manga/kingdom/
http://fanfox.net/manga/tate_no_yuusha_no_nariagari/
http://fanfox.net/manga/tomo_chan_wa_onna_no_ko/
http://fanfox.net/manga/goblin_slayer/
http://fanfox.net/manga/yakusoku_no_neverland/
http://fanfox.net/manga/god_of_martial_arts/
http://fanfox.net/manga/kaifuku_jutsushi_no_yarinaoshi/
http://fanfox.net/manga/re_monster/
http://fanfox.net/manga/mushoku_tensei_isekai_ittara_honki_dasu/
http://fanfox.net/manga/nanatsu_no_taizai/
http://fanfox.net/manga/battle_through_the_heavens/
http://fanfox.net/manga/shingeki_no_kyojin/
http://fanfox.net/manga/iron_ladies/
http://fanfox.net/manga/monster_musume_no_iru_nichijou/
http://fanfox.net/manga/world_s_end_harem/
http://fanfox.net/manga/bleach/
http://fanfox.net/manga/parallel_paradise/
http://fanfox.net/manga/shokugeki_no_soma/
http://fanfox.net/manga/spirit_sword_sovereign/
http://fanfox.net/manga/horimiya/
http://fanfox.net/manga/dungeon_ni_deai_o_motomeru_no_wa_machigatte_iru_darou_ka/
http://fanfox.net/manga/dr_stone/
http://fanfox.net/manga/berserk/
http://fanfox.net/manga/the_new_gate/
http://fanfox.net/manga/akatsuki_no_yona/
http://fanfox.net/manga/naruto/
http://fanfox.net/manga/overlord/
http://fanfox.net/manga/death_march_kara_hajimaru_isekai_kyousoukyoku/
http://fanfox.net/manga/tsuki_ga_michibiku_isekai_douchuu/
http://fanfox.net/manga/eternal_reverence/
http://fanfox.net/manga/minamoto_kun_monogatari/
http://fanfox.net/manga/beastars/
http://fanfox.net/manga/jujutsu_kaisen/
http://fanfox.net/manga/hajime_no_ippo/
http://fanfox.net/manga/kaguya_sama_wa_kokurasetai_tensai_tachi_no_renai_zunousen/
http://fanfox.net/manga/domestic_na_kanojo/
http://fanfox.net/manga/the_legendary_moonlight_sculptor/
http://fanfox.net/manga/the_gamer/
http://fanfox.net/manga/kumo_desu_ga_nani_ka/
http://fanfox.net/manga/bokutachi_wa_benkyou_ga_dekinai/
http://fanfox.net/manga/enen_no_shouboutai/
http://fanfox.net/manga/tsuyokute_new_saga/
http://fanfox.net/manga/fairy_tail/
http://fanfox.net/manga/komi_san_wa_komyushou_desu/
http://fanfox.net/manga/kenja_no_mago/
http://fanfox.net/manga/soul_land/
http://fanfox.net/manga/boruto_naruto_next_generations/
http://fanfox.net/manga/hunter_x_hunter/
http://fanfox.net/manga/history_s_strongest_disciple_kenichi/
http://fanfox.net/manga/phoenix_against_the_world/
http://fanfox.net/manga/lv999_no_murabito/
http://fanfox.net/manga/gate_jietai_kare_no_chi_nite_kaku_tatakeri/
http://fanfox.net/manga/kengan_asura/
http://fanfox.net/manga/konjiki_no_moji_tsukai_yuusha_yonin_ni_makikomareta_unique_cheat/
http://fanfox.net/manga/please_don_t_bully_me_nagatoro/
http://fanfox.net/manga/isekai_maou_to_shoukan_shoujo_dorei_majutsu/
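
The link[1:] slicing in the code above assumes every href starts with a slash. A possibly more robust alternative is urljoin from the standard library, which resolves a relative href against a base URL and leaves absolute URLs untouched. The helper name absolute_url is made up for this sketch:

```python
from urllib.parse import urljoin

BASE_URL = "http://fanfox.net/"


def absolute_url(href, base=BASE_URL):
    """Resolve a possibly relative href against the site root."""
    return urljoin(base, href)


examples = [
    absolute_url("/manga/naruto/"),                    # root-relative href
    absolute_url("manga/bleach/"),                     # href without a leading slash
    absolute_url("http://fanfox.net/manga/berserk/"),  # already absolute
]
for url in examples:
    print(url)
```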

I found that the best way (for me) to work with the .find_all() / .findAll() methods is simply to use a for loop; the same goes for the .select() method.

In some cases .select() gives better results. Check out SelectorGadget to quickly find a CSS selector.
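
For simple class lookups the two approaches are interchangeable. The sketch below, on an invented fragment, runs the same query once with .find_all() and once with .select() and loops over the results:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment for illustration only.
SAMPLE = """
<div class="line">
  <a class="manga_img" href="/manga/naruto/">Naruto</a>
  <a class="other" href="/about/">About</a>
  <a class="manga_img" href="/manga/bleach/">Bleach</a>
</div>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")

# Keyword-argument filtering with find_all()
via_find_all = [a["href"] for a in soup.find_all("a", class_="manga_img")]

# The equivalent CSS selector with select()
via_select = [a["href"] for a in soup.select("a.manga_img")]

for href in via_select:
    print(href)
```

Where .select() tends to pay off is in nested or combined conditions, which a single CSS selector can express without chaining several .find_all() calls.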
