
Getting encoded text while scraping the data from URL using Beautifulsoup Python

Part of the code:

[<div class="hidden_elem"><code id="u_0_8"><!-- <div class="_4-u2 _5z71 _18ib _4-u8"><div class="_4-u3 _5z73"><div class="clearfix"><div class="lfloat _ohe"><a class="_5z74" href="/events/dialog/public_guest_list/?acontext%5Bref%5D=51&amp;acontext%5Bsource%5D=1&amp;acontext%5Baction_history%5D=%5B%7B%22surface%22%3A%22permalink%22%2C%22mechanism%22%3A%22surface%22%2C%22extra_data%22%3A%5B%5D%7D%2C%7B%22surface%22%3A%22permalink%22%2C%22mechanism%22%3A%22guest_list%22%2C%22extra_data%22%3A%5B%5D%7D%5D&amp;acontext%5Bhas_source%5D=1&amp;event_id=1407771472571452" rel="dialog" role="button">560 \u091c\u093e \u0930\u0939\u0947 \u0939\u0948\u0902&nbsp;\xb7&nbsp;3.1 \u0939\u091c\u093c\u093e\u0930 \u0915\u0940 \u0930\u0941\u091a\u093f \u0939\u0948</a><div class="_5z7d">\u0907\u0938 \u0908\u0935\u0947\u0902\u091f \u0915\u094b \u0905\u092a\u0928\u0947 \u092e\u093f\u0924\u094d\u0930\u094b\u0902 \u0938\u0947 \u0938\u093e\u091d\u093e \u0915\u0930\u0947\u0902</div></div><a class="_42ft _4jy0 _i8v _3-8w rfloat _ohf _4jy4 _517h _51sy" role="button" href="#" ajaxify="#" rel="dialog" data-testid="event_invite_button"><i class="_3-8_ _3-8_ img sp_WYmAGAVQNZh sx_82e44d"></i>\u0906\u092e\u0902\u0924\u094d\u0930\u093f\u0924 \u0915\u0930\u0947\u0902</a></div></div></div> --></code></div>, <div class="hidden_elem"><code id="u_0_i"><!-- <div class="_5vl5 _3a9j"><ul class="uiList _4kg _4ks"><li class="_3slj"><div class="_36hm"><table class="uiGrid _51mz" cellspacing="0" cellpadding="0"><tbody><tr class="_51mx"><td class="_51m- _phw"><div class="_6a" aria-hidden="true"><div class="_6a _6b" style="height:18px"></div><div class="_6a _6b"><i class="_ohg img sp_ESbkBsVlxUv sx_c2b8bd"><u>clock</u></i></div></div></td><td class="_51m- _4930 _phw _51mw"><div class="_xkh _phw"><div class="_6a"><div class="_6a _6b" style="height:18px"></div><div class="_6a _6b"><div class="_publicProdFeedInfo__timeRowTitle _5xhk" content="2017-07-28T21:30:00-07:00 to 2017-07-29T05:00:00-07:00"><span><span 
itemprop="startDate">29 \u091c\u0941\u0932\u093e\u0908</span></span> <span title="09:30 &#x905;&#x92a;&#x930;&#x93e;&#x939;&#x94d;&#x928; &#x906;&#x92a;&#x915;&#x947; &#x938;&#x92e;&#x92f; &#x92e;&#x947;&#x902;">10:00 \u092a\u0942\u0930\u094d\u0935\u093e\u0939\u094d\u0928</span> - <span title="05:00 &#x92a;&#x942;&#x930;&#x94d;&#x935;&#x93e;&#x939;&#x94d;&#x928; &#x906;&#x92a;&#x915;&#x947; &#x938;&#x92e;&#x92f; &#x92e;&#x947;&#x902;">05:30 \u0905\u092a\u0930\u093e\u0939\u094d\u0928 UTC+05:30</span></div><div class="_5xhp fsm fwn fcg"></div></div></div></div></td></tr></tbody></table></div></li><li class="_3xd0 _3slj"><div class="_36hm _5cmn" id="u_0_9"><table class="uiGrid _51mz" cellspacing="0" cellpadding="0"><tbody><tr class="_51mx"><td class="_51m- _phw"><div class="_6a" aria-hidden="true"><div class="_6a _6b" style="height:32px"></div><div class="_6a _6b"><i class="_ohg img sp_ESbkBsVlxUv sx_f4bee6"><u>pin</u></i></div></div></td><td class="_51m- _51mw"><div class="clearfix _4930"><div class="_xkg _phw rfloat _ohf"><div><div id="u_0_a"><div class="_6a"><div class="_6a _6b" style="height:32px"></div><div class="_6a _6b"><a href="#" role="button">\u092e\u0948\u092a \u0926\u093f\u0916\u093e\u090f\u0901</a></div></div></div><div class="hidden_elem" id="u_0_b"><div class="_6a"><div class="_6a _6b" style="height:32px"></div><div class="_6a _6b"><a href="#" role="button">\u092e\u0948\u092a \u091b\u093f\u092a\u093e\u090f\u0901</a></div></div></div></div></div><div class="_xkh _phw _42ef"><div class="_6a"><div class="_6a _6b" style="height:32px"></div><div class="_6a _6b"><a class="_5xhk" href="https://www.facebook.com/iitd.delhi/" id="u_0_d" data-testid="event-permalink-location">IIT Delhi</a><div class="_5xhp fsm fwn fcg">Hauz Khaz, New Delhi, India 110016</div></div></div></div></div></td></tr></tbody></table></div><div class="_4-u2 hidden_elem _5xhn _4-u8" id="u_0_c"><div class="clearfix _ikh"><div class="_4bl7"><div class="_23mo"><div class="fbPlaceFlyoutWrap 
_5xho" id="u_0_e"><div class="fbPlaceFlyout" style="width:240px;"><div class="fbPlaceFlyoutShell" style="width:46px;bottom:-31px;"><div class="fbPlaceFlyoutBox uiBoxWhite" style="width: 46px"><div><div class="_52i5"><a href="https://www.facebook.com/iitd.delhi/"><img class="_s0 img" src="https://scontent.fdel6-1.fna.fbcdn.net/v/t1.0-1/p40x40/255575_512250575469178_612128240_n.jpg?oh=dc9acf8d4452db344aaba7fde25efa84&amp;oe=59AD9DC7" alt="" itemprop="image" aria-label="IIT Delhi" role="img" style="width:40px;height:40px" /></a></div></div><div class="fbPlaceFlyoutMapArrow"><i class="img sp_ESbkBsVlxUv sx_104d97"></i></div><div class="fbPlaceFlyoutMapArrow"><i class="img sp_ESbkBsVlxUv sx_104d97"></i></div></div></div></div><a href="#" rel="dialog" ajaxify="/places/map/?id=211928345501404" role="button"><div><div class="_4j7v _2vs2"><img class="_a3f img" alt="" aria-label="&#x928;&#x915;&#x94d;&#x936;&#x93e; &#x905;&#x91f;&#x948;&#x91a;&#x92e;&#x947;&#x902;&#x91f;" src="https://external.fdel6-1.fna.fbcdn.net/static_map.php?region=IN&amp;v=29&amp;osm_provider=2&amp;size=240x132&amp;center=28.545188216208%2C77.193069476906&amp;zoom=15&amp;markers=28.54518822%2C77.19306948&amp;language=hi_IN" width="240" height="132" /><span id="u_0_g"></span></div></div></a></div></div></div><div class="_4bl9 _2qsg"><div><span class="_c24">\u0915\u0949\u0932\u0947\u091c \u0914\u0930 \u092f\u0942\u0928\u093f\u0935\u0930\u094d\u0938\u093f\u091f\u0940</span><div><div class="_4iae"><div><div class="_6a _5xoz _5xo-"><i class="img sp_ESbkBsVlxUv sx_ac5297"></i></div><div class="_6a _5xoz"><i class="img sp_ESbkBsVlxUv sx_ac5297"></i></div><div class="_6a _5xoz"><i class="img sp_ESbkBsVlxUv sx_ac5297"></i></div><div class="_6a _5xoz"><i class="img sp_ESbkBsVlxUv sx_ac5297"></i></div><div class="_6a _5xoz _4ial"><i class="img sp_ESbkBsVlxUv sx_ac5297"></i></div></div><div class="_559j" style="clip: rect(0px, 63px, 16px, 0px)"><div class="_6a _5xoz _5xo-"><i class="img sp_ESbkBsVlxUv 
sx_59de11"></i></div><div class="_6a _5xoz"><i class="img sp_ESbkBsVlxUv sx_59de11"></i></div><div class="_6a _5xoz"><i class="img sp_ESbkBsVlxUv sx_59de11"></i></div><div class="_6a _5xoz"><i class="img sp_ESbkBsVlxUv sx_59de11"></i></div><div class="_6a _5xoz _4ial"><i class="img sp_ESbkBsVlxUv sx_59de11"></i></div></div></div></div><hr class="_23mm" /><div><span class="_c24">011 2659 6316</span></div><div><span class="_c24"></span></div><div class="ptm"><a class="_42ft _4jy0 _4jy3 _517h _51sy" role="button" href="http://l.facebook.com/l.php?u=http%3A%2F%2Fshare.here.com%2Fr%2Fmylocation%2Fe-eyJuYW1lIjoiSUlUIERlbGhpIiwiYWRkcmVzcyI6IkhhdXogS2hheiwgTmV3IERlbGhpLCBJbmRpYSAxMTAwMTYiLCJsYXRpdHVkZSI6MjguNTQ1MTg4MjE2MjA4LCJsb25naXR1ZGUiOjc3LjE5MzA2OTQ3NjkwNiwicHJvdmlkZXJOYW1lIjoiZmFjZWJvb2siLCJwcm92aWRlcklkIjoyMTE5MjgzNDU1MDE0MDR9%3Flink%3Dunknown%26fb_locale%3Dhi_IN%26ref%3Dfacebook&amp;h=ATP2RoDOmV19cipyFvxN_S_G4uI7FP1aDGQXs8I8palbouMF9Ut2wIJBE-D0XSb9O2x9_YcBTP1eLGOs-qvz3hHjCMi-5oGqGiE1TJerNdX-KKhRgc6j392SdLAY&amp;s=1" id="u_0_f" target="_blank" rel="nofollow" onmouseover="LinkshimAsyncLink.swap(this, &quot;http:\\\\/\\\\/share.here.com\\\\/r\\\\/mylocation\\\\/e-eyJuYW1lIjoiSUlUIERlbGhpIiwiYWRkcmVzcyI6IkhhdXogS2hheiwgTmV3IERlbGhpLCBJbmRpYSAxMTAwMTYiLCJsYXRpdHVkZSI6MjguNTQ1MTg4MjE2MjA4LCJsb25naXR1ZGUiOjc3LjE5MzA2OTQ3NjkwNiwicHJvdmlkZXJOYW1lIjoiZmFjZWJvb2siLCJwcm92aWRlcklkIjoyMTE5MjgzNDU1MDE0MDR9?link=unknown&amp;fb_locale=hi_IN&amp;ref=facebook&quot;);" onclick="LinkshimAsyncLink.swap(this, 
&quot;http:\\\\/\\\\/l.facebook.com\\\\/l.php?u=http\\\\u00253A\\\\u00252F\\\\u00252Fshare.here.com\\\\u00252Fr\\\\u00252Fmylocation\\\\u00252Fe-eyJuYW1lIjoiSUlUIERlbGhpIiwiYWRkcmVzcyI6IkhhdXogS2hheiwgTmV3IERlbGhpLCBJbmRpYSAxMTAwMTYiLCJsYXRpdHVkZSI6MjguNTQ1MTg4MjE2MjA4LCJsb25naXR1ZGUiOjc3LjE5MzA2OTQ3NjkwNiwicHJvdmlkZXJOYW1lIjoiZmFjZWJvb2siLCJwcm92aWRlcklkIjoyMTE5MjgzNDU1MDE0MDR9\\\\u00253Flink\\\\u00253Dunknown\\\\u002526fb_locale\\\\u00253Dhi_IN\\\\u002526ref\\\\u00253Dfacebook&amp;h=ATP2RoDOmV19cipyFvxN_S_G4uI7FP1aDGQXs8I8palbouMF9Ut2wIJBE-D0XSb9O2x9_YcBTP1eLGOs-qvz3hHjCMi-5oGqGiE1TJerNdX-KKhRgc6j392SdLAY&amp;s=1&quot;);">\u0926\u093f\u0936\u093e\u090f\u0901 \u092a\u094d\u0930\u093e\u092a\u094d\u0924 \u0915\u0930\u0947\u0902</a></div></div></div></div></div></li></ul><div id="event_navigation" class="_4dn9"><div id="u_0_h"></div></div></div> --></code></div>, <div class="hidden_elem"><code id="u_0_m"><!-- <div class="_4z-v"><div class="_4-u2 _3xaf _3-95 _4-u8"><div class="_4-u3 _5dwa _5dwb _57_-"><span class="_38my _5803">\u0935\u093f\u0935\u0930\u0923<span class="_c1c"></span></span><div class="_3s3-"></div></div><div class="_2qgs"><span class="_4n-j _fbReactionComponent__eventDetailsContentTags fsl" data-testid="event-permalink-details">Indian Youth Forum is proud to announce the first-ever Startup Festival 2017 which will bring together the brightest startups of the country all in one place. And these startups are looking to hire you!<br /> For the first time ever, these bright and young startups, will open their ships to technical and non-technical talent, on an adventurous voyage filled with learning to become the next big company. 
The event is open to working professionals and talented freshers looking for a challenging and enriching role.<br /> <br /> For Any Kind of Association Queries Mail us at -<br /> mystory&#064;indiayf.in or Inbox us .</span></div><div class="_1r51"><ul class="uiList uiCollapsedList uiCollapsedListHidden _509- _4ki" id="u_0_j"><li><a href="/events/discovery/?acontext=%7B%22ref%22%3A51%2C%22source%22%3A1%2C%22action_history%22%3A%22%5B%7B%5C%22surface%5C%22%3A%5C%22permalink%5C%22%2C%5C%22mechanism%5C%22%3A%5C%22surface%5C%22%2C%5C%22extra_data%5C%22%3A%5B%5D%7D%2C%7B%5C%22surface%5C%22%3A%5C%22permalink%5C%22%2C%5C%22mechanism%5C%22%3A%5C%22event_information%5C%22%2C%5C%22extra_data%5C%22%3A%7B%5C%22tag%5C%22%3A%5C%22StartUp%5C%22%7D%7D%5D%22%2C%22has_source%22%3Atrue%7D&amp;suggestion_token=%7B%22tags%22%3A%5B181836542181749%5D%7D"><span class="_47od">StartUp</span></a></li><li><a href="/events/discovery/?acontext=%7B%22ref%22%3A51%2C%22source%22%3A1%2C%22action_history%22%3A%22%5B%7B%5C%22surface%5C%22%3A%5C%22permalink%5C%22%2C%5C%22mechanism%5C%22%3A%5C%22surface%5C%22%2C%5C%22extra_data%5C%22%3A%5B%5D%7D%2C%7B%5C%22surface%5C%22%3A%5C%22permalink%5C%22%2C%5C%22mechanism%5C%22%3A%5C%22event_information%5C%22%2C%5C%22extra_data%5C%22%3A%7B%5C%22tag%5C%22%3A%5C%22Job+hunting%5C%22%7D%7D%5D%22%2C%22has_source%22%3Atrue%7D&amp;suggestion_token=%7B%22tags%22%3A%5B111193155571103%5D%7D"><span class="_47od">Job hunting</span></a></li><li><a href="/events/discovery/?acontext=%7B%22ref%22%3A51%2C%22source%22%3A1%2C%22action_history%22%3A%22%5B%7B%5C%22surface%5C%22%3A%5C%22permalink%5C%22%2C%5C%22mechanism%5C%22%3A%5C%22surface%5C%22%2C%5C%22extra_data%5C%22%3A%5B%5D%7D%2C%7B%5C%22surface%5C%22%3A%5C%22permalink%5C%22%2C%5C%22mechanism%5C%22%3A%5C%22event_information%5C%22%2C%5C%22extra_data%5C%22%3A%7B%5C%22tag%5C%22%3A%5C%22Startup.com%5C%22%7D%7D%5D%22%2C%22has_source%22%3Atrue%7D&amp;suggestion_token=%7B%22tags%22%3A%5B109416335743992%5D%7D"><span 
class="_47od">Startup.com</span></a></li></ul></div></div><div class="_4-u2 _3xaf _3-95 _4-u8"><div class="_4-u3 _5dwa _5dwb _57_-"><span class="_38my _5803">Indian Youth Forum \u0915\u0947 \u092c\u093e\u0930\u0947 \u092e\u0947\u0902<span class="_c1c"></span></span><div class="_3s3-"></div></div><div><div><div class="_37p5"><div class="clearfix"><img class="_37p7 _8o _8r lfloat _ohe img" height="100" src="https://scontent.fdel6-1.fna.fbcdn.net/v/t1.0-0/c5.0.100.100/p100x100/16708216_1083815345075324_1809238266151282211_n.jpg?oh=cdc9096728fec80a0147133a6b1599d6&amp;oe=59E5EFDB" alt="" /><div class="_8u _42ef"><div class="_37p8"><div class="_50f4"><span class="fwb"><a class="profileLink" href="https://www.facebook.com/IyfIndianyouthforum/">Indian Youth Forum</a></span></div><div class="_37p9 _50f3">News &amp; Media Website</div><div class="_37pa _50f3">We find and tell stories of people doing good to inspire global action. Because we&#039;re convinced each of us has the power to make the world better .</div></div></div></div></div></div></div></div><div class="_4-u2 _3xaf _3-95 _4-u8"><div class="_4-u3 _5dwa _5dwb _57_-"><span class="_38my _5803">\u0938\u094d\u0925\u093e\u0928 \u0915\u0947 \u092c\u093e\u0930\u0947 \u092e\u0947\u0902<span class="_c1c"></span></span><div class="_3s3-"></div></div><div class="_37p6"><div><div><div><div class="_4sdm _6lh _dcs"><div class="_5hv6"><div class="_6lp"><div class="_6ln fsxxl fwb"><a href="https://www.facebook.com/iitd.delhi/" data-ft="&#123;&quot;tn&quot;:&quot;k&quot;&#125;">IIT Delhi</a></div><div class="_6lo ellipsis fsm fwn fcg">\u0915\u0949\u0932\u0947\u091c \u0914\u0930 \u092f\u0942\u0928\u093f\u0935\u0930\u094d\u0938\u093f\u091f\u0940</div></div></div><div class="uiScaledImageContainer _6li _6l-" style="width:100%"><img class="scaledImageFitWidth img" src="https://scontent.fdel6-1.fna.fbcdn.net/v/t1.0-0/p320x320/1660351_782270428467190_610794429_n.jpg?oh=4b4957698cf37eaa2621307fc3c61b8f&amp;oe=59E14DBB" 
style="top:-60px;" alt="&#039;Picture credit: Arshad Nasser (2013JDS6003) M.Des- Industrial Design&#039;" width="480" height="320" /></div><a class="_8xh" href="https://www.facebook.com/iitd.delhi/" style="width:100%" data-ft="&#123;&quot;tn&quot;:&quot;k&quot;&#125;"></a><a class="_3aml" href="https://www.facebook.com/iitd.delhi/" style="width:100%"></a><div class="clearfix _5kun"><a class="_6ll lfloat _ohe" href="https://www.facebook.com/iitd.delhi/" data-ft="&#123;&quot;tn&quot;:&quot;k&quot;&#125;"><div class="_6lm _4m78"><div class="uiScaledImageContainer profilePic" style="width: 96px; height: 96px"><img class="scaledImageFitWidth img" src="https://scontent.fdel6-1.fna.fbcdn.net/v/t1.0-1/p100x100/255575_512250575469178_612128240_n.jpg?oh=e2bf449617f68eac2b8cd02d7c35a513&amp;oe=59A0C926" alt="IIT Delhi" width="96" height="96" /></div></div></a><div class="_6lk _42ef"><div><div class="_8yb"><div>2,82,390 \u092a\u0938\u0902\u0926</div><div>2,019 \u0932\u094b\u0917 \u0907\u0938 \u092c\u093e\u0930\u0947 \u092e\u0947\u0902 \u092c\u093e\u0924 \u0915\u0930 \u0930\u0939\u0947 \u0939\u0948\u0902</div></div></div></div></div></div></div></div></div></div><div class="_4z-w"><a class="_4b4x" href="https://www.facebook.com/iitd.delhi/" id="u_0_k">\u092a\u0947\u091c \u092a\u0930 \u091c\u093e\u090f\u0901</a></div></div><div class="_4-u2 _3xaf _3-95 _4-u8"><div class="_4x0f"><div class="_4x0g"><div class="_4x0d _4x0e"><div class="_41dr _4x0c"><span><img class="_s0 _41ds _54ru img" src="https://scontent.fdel6-1.fna.fbcdn.net/v/t1.0-1/c4.15.32.32/p40x40/15747342_1195628017184471_1949447432837553984_n.jpg?oh=54f25e123a74d63f279279ee62318a79&amp;oe=59B5B106" alt="" aria-label="Jha Ayush" role="img" /></span></div><div class="_41dr _4x0c"><a href="https://www.facebook.com/IyfIndianyouthforum/"><img class="_s0 _41ds _54ru img" 
src="https://scontent.fdel6-1.fna.fbcdn.net/v/t1.0-1/p32x32/15541314_1041942845929241_1722198877754933119_n.jpg?oh=973e318ede53168d58f6e7be835583c0&amp;oe=59A926CC" alt="" aria-label="Indian Youth Forum" role="img" /></a></div><div class="_41dr _4x0c"><a href="https://www.facebook.com/kumeshyadav"><img class="_s0 _41ds _54ru img" src="https://scontent.fdel6-1.fna.fbcdn.net/v/t1.0-1/p32x32/15337627_10153988267585286_2118657580809154297_n.jpg?oh=182fa980f18ed2d94c6717f8de3af7ad&amp;oe=599BC3CD" alt="" aria-label="Kumesh Yadav" role="img" /></a></div><div class="_41dr _4x0c"><span><img class="_s0 _41ds _54ru img" src="https://scontent.fdel6-1.fna.fbcdn.net/v/t1.0-1/p32x32/15965812_10158191872490352_4833263074795798396_n.jpg?oh=ce18a15878fc5814539a57aed4c0446b&amp;oe=59A47E1F" alt="" aria-label="Kanika Gupta" role="img" /></span></div></div></div><div class="_4x0h">\u091a\u0930\u094d\u091a\u093e \u092e\u0947\u0902 12 \u092a\u094b\u0938\u094d\u091f.</div></div><div class="_4z-w"><a class="_4b4x" href="/events/1407771472571452/?active_tab=discussion" id="u_0_l">\u091a\u0930\u094d\u091a\u093e \u0926\u0947\u0916\u0947\u0902</a></div></div></div> --></code></div>]

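Above is part of the code. I need to scrape the text inside the div with class='_publicProdFeedInfo__timeRowTitle _5xhk', but when I scrape it, the text comes out encoded. The element looks like this: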

<div class="_publicProdFeedInfo__timeRowTitle _5xhk" content="2017-07-28T21:30:00-07:00 to 2017-07-29T05:00:00-07:00"><span><span itemprop="startDate">29 जुलाई</span></span> <span title="09:30 अपराह्न आपके समय में">10:00 पूर्वाह्न</span> - <span title="05:00 पूर्वाह्न आपके समय में">05:30 अपराह्न UTC+05:30</span></div>

The text itself is present in the page source of the URL: https://www.facebook.com/events/1407771472571452/

Can you tell me how I can solve this?

Here is the Python code I am using:

import urllib2
from bs4 import BeautifulSoup
facebook = "https://www.facebook.com/events/1407771472571452/"
page = urllib2.urlopen(facebook)
soup = BeautifulSoup(page, 'lxml')
data = soup.findAll("div", {"class": "hidden_elem"})
for item in data:
    commentedHTML = item.find('code').contents[0]
    more_soup = BeautifulSoup(commentedHTML, 'lxml')
    wanted_text = more_soup.findAll('div', {'class': '_publicProdFeedInfo__timeRowTitle _5xhk'})
    if wanted_text:
        gotdata2 = (wanted_text[0])
        print gotdata2

After reading the response, decode it from UTF-8:

page = urllib2.urlopen(facebook)
soup = BeautifulSoup(page.read().decode('utf-8', 'ignore'), 'lxml')

Note: 'ignore' is added so that the call does not fail on invalid UTF-8 bytes in the response; those bytes are dropped during decoding.
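A minimal sketch of what the 'ignore' error handler does. It uses a hand-made byte string rather than the real Facebook response, and the stray byte is an assumption made for illustration. It is written for Python 3, where the same decode call exists on bytes objects:

```python
# Sample bytes: valid UTF-8 for "29 जुलाई" followed by one stray byte (0xff)
# that can never occur in valid UTF-8. The stray byte is invented for this demo.
raw = "29 जुलाई".encode("utf-8") + b"\xff"

# Strict decoding (the default) raises on the invalid byte.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With the 'ignore' handler the invalid byte is silently dropped.
text = raw.decode("utf-8", "ignore")

print(strict_failed)
print(text)
```

Because 'ignore' discards data without any warning, 'replace' may be a safer choice when you want to notice that something was lost.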

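Identify the div element, then the code element within it. The comment is available as the string of this code element and can be passed to BeautifulSoup for parsing. Once you have a second soup built from the comment's contents, you can process it like any other.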

>>> import bs4
>>> import requests
>>> page = requests.get('https://www.facebook.com/events/1407771472571452/').text
>>> soup = bs4.BeautifulSoup(page, 'lxml')
>>> div = soup.find('div', attrs={'class':"hidden_elem"})
>>> code = div.find('code')
>>> soup_2 = bs4.BeautifulSoup(code.string, 'lxml')
>>> soup_2.findAll('a')
[<a class="_5z74" href="/events/dialog/public_guest_list/?acontext%5Bref%5D=51&amp;acontext%5Bsource%5D=1&amp;acontext%5Baction_history%5D=%5B%7B%22surface%22%3A%22permalink%22%2C%22mechanism%22%3A%22surface%22%2C%22extra_data%22%3A%5B%5D%7D%2C%7B%22surface%22%3A%22permalink%22%2C%22mechanism%22%3A%22guest_list%22%2C%22extra_data%22%3A%5B%5D%7D%5D&amp;acontext%5Bhas_source%5D=1&amp;event_id=1407771472571452" rel="dialog" role="button">601 Going · 3.3K Interested</a>, <a ajaxify="#" class="_42ft _4jy0 _i8v _3-8w rfloat _ohf _4jy4 _517h _51sy" data-testid="event_invite_button" href="#" rel="dialog" role="button"><i class="_3-8_ _3-8_ img sp__Uck8Egf9Z1 sx_deb798"></i>Invite</a>]

Edit: If I do this as suggested in the comments, this is what is displayed.

>>> divs_2 = soup_2.findAll('div')
>>> for item in divs_2:
...     item.contents
...     
[<div class="_4-u3 _5z73"><div class="clearfix"><div class="lfloat _ohe"><a class="_5z74" href="/events/dialog/public_guest_list/?acontext%5Bref%5D=51&amp;acontext%5Bsource%5D=1&amp;acontext%5Baction_history%5D=%5B%7B%22surface%22%3A%22permalink%22%2C%22mechanism%22%3A%22surface%22%2C%22extra_data%22%3A%5B%5D%7D%2C%7B%22surface%22%3A%22permalink%22%2C%22mechanism%22%3A%22guest_list%22%2C%22extra_data%22%3A%5B%5D%7D%5D&amp;acontext%5Bhas_source%5D=1&amp;event_id=1407771472571452" rel="dialog" role="button">602 Going · 3.3K Interested</a><div class="_5z7d">Share this event with your friends</div></div><a ajaxify="#" class="_42ft _4jy0 _i8v _3-8w rfloat _ohf _4jy4 _517h _51sy" data-testid="event_invite_button" href="#" rel="dialog" role="button"><i class="_3-8_ _3-8_ img sp__Uck8Egf9Z1 sx_deb798"></i>Invite</a></div></div>]
[<div class="clearfix"><div class="lfloat _ohe"><a class="_5z74" href="/events/dialog/public_guest_list/?acontext%5Bref%5D=51&amp;acontext%5Bsource%5D=1&amp;acontext%5Baction_history%5D=%5B%7B%22surface%22%3A%22permalink%22%2C%22mechanism%22%3A%22surface%22%2C%22extra_data%22%3A%5B%5D%7D%2C%7B%22surface%22%3A%22permalink%22%2C%22mechanism%22%3A%22guest_list%22%2C%22extra_data%22%3A%5B%5D%7D%5D&amp;acontext%5Bhas_source%5D=1&amp;event_id=1407771472571452" rel="dialog" role="button">602 Going · 3.3K Interested</a><div class="_5z7d">Share this event with your friends</div></div><a ajaxify="#" class="_42ft _4jy0 _i8v _3-8w rfloat _ohf _4jy4 _517h _51sy" data-testid="event_invite_button" href="#" rel="dialog" role="button"><i class="_3-8_ _3-8_ img sp__Uck8Egf9Z1 sx_deb798"></i>Invite</a></div>]
[<div class="lfloat _ohe"><a class="_5z74" href="/events/dialog/public_guest_list/?acontext%5Bref%5D=51&amp;acontext%5Bsource%5D=1&amp;acontext%5Baction_history%5D=%5B%7B%22surface%22%3A%22permalink%22%2C%22mechanism%22%3A%22surface%22%2C%22extra_data%22%3A%5B%5D%7D%2C%7B%22surface%22%3A%22permalink%22%2C%22mechanism%22%3A%22guest_list%22%2C%22extra_data%22%3A%5B%5D%7D%5D&amp;acontext%5Bhas_source%5D=1&amp;event_id=1407771472571452" rel="dialog" role="button">602 Going · 3.3K Interested</a><div class="_5z7d">Share this event with your friends</div></div>, <a ajaxify="#" class="_42ft _4jy0 _i8v _3-8w rfloat _ohf _4jy4 _517h _51sy" data-testid="event_invite_button" href="#" rel="dialog" role="button"><i class="_3-8_ _3-8_ img sp__Uck8Egf9Z1 sx_deb798"></i>Invite</a>]
[<a class="_5z74" href="/events/dialog/public_guest_list/?acontext%5Bref%5D=51&amp;acontext%5Bsource%5D=1&amp;acontext%5Baction_history%5D=%5B%7B%22surface%22%3A%22permalink%22%2C%22mechanism%22%3A%22surface%22%2C%22extra_data%22%3A%5B%5D%7D%2C%7B%22surface%22%3A%22permalink%22%2C%22mechanism%22%3A%22guest_list%22%2C%22extra_data%22%3A%5B%5D%7D%5D&amp;acontext%5Bhas_source%5D=1&amp;event_id=1407771472571452" rel="dialog" role="button">602 Going · 3.3K Interested</a>, <div class="_5z7d">Share this event with your friends</div>]
['Share this event with your friends']
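The two-pass approach above can be sketched as a small function. This is only an illustration under stated assumptions: SAMPLE_PAGE is a hypothetical, much-shortened stand-in for the real page source, extract_time_rows is a made-up helper name, and the built-in 'html.parser' is swapped in for 'lxml' so the sketch needs no extra parser installed.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical stand-in for the page: the wanted markup sits inside an
# HTML comment within a <code> element, as in the question.
SAMPLE_PAGE = (
    '<div class="hidden_elem"><code id="u_0_i"><!-- '
    '<div class="_publicProdFeedInfo__timeRowTitle _5xhk">'
    '<span itemprop="startDate">29 July</span></div>'
    ' --></code></div>'
)


def extract_time_rows(page_source):
    """Return the text of every time-row div hidden inside a comment."""
    soup = BeautifulSoup(page_source, "html.parser")
    results = []
    for hidden in soup.find_all("div", class_="hidden_elem"):
        code = hidden.find("code")
        if code is None:
            continue
        # First pass found the comment; second pass parses its contents.
        for node in code.contents:
            if isinstance(node, Comment):
                inner = BeautifulSoup(str(node), "html.parser")
                for div in inner.find_all(
                    "div", class_="_publicProdFeedInfo__timeRowTitle"
                ):
                    results.append(div.get_text())
    return results


rows = extract_time_rows(SAMPLE_PAGE)
print(rows)
```

Checking for Comment nodes explicitly, instead of taking contents[0], skips any code element whose first child is not a comment.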

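To me, the simpler route might be to request the page in English, which avoids having to deal with strings encoded in another language. I have no experience with this, but you could look into the options that requests or urllib2 offer for making such a request.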
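One way such a request could be built with the standard library is sketched below. It assumes Python 3, where urllib.request replaces urllib2. The helper name build_english_request and the exact header value are illustrative choices, and a server is free to ignore the header. The sketch only constructs the request and does not send it.

```python
import urllib.request

EVENT_URL = "https://www.facebook.com/events/1407771472571452/"


def build_english_request(url):
    """Build (but do not send) a request that asks for English content."""
    return urllib.request.Request(
        url,
        headers={"Accept-Language": "en-US,en;q=0.5"},
    )


request = build_english_request(EVENT_URL)

# urllib stores header names capitalized, hence "Accept-language" here.
print(request.get_header("Accept-language"))

# To actually fetch the page (needs network access):
# html = urllib.request.urlopen(request).read().decode("utf-8", "ignore")
```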

Finally, after many attempts, I fixed it by specifying the language in the request headers:

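import requests
from bs4 import BeautifulSoup

url = "https://www.facebook.com/events/1407771472571452/"
# Ask the server for the English version of the page
headers = {"Accept-Language": "en-US,en;q=0.5"}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, 'lxml')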
