
Python exception thrown by libtidy is amusingly impossible to catch

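I'm attempting to use the tidy_document() function from tidylib to format an HTML document to XHTML before posting it somewhere, and a few steps up in the stack an exception is thrown. The code is wrapped in a try...except block with about 3 more general except statements to cast my net wider, but the exception propagates outside of them regardless, and none of the code in any of the except bodies is executed.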

The offending code:

from tidylib import tidy_document

...

try:
    xhtmlDoc, errors = tidy_document(htmlContent)
except UnicodeDecodeError as ude:
    print("Caught the exception")
except UnicodeError as ue:
    print("Caught the exception")
except Exception as ex:
    print("Caught the exception")
except:
    print("Caught the exception")

It doesn't matter whether htmlContent is sent in as a str or encoded as utf-8 bytes.

The resulting stack trace is as follows:

  File "_ctypes/callbacks.c", line 232, in 'calling callback function'
  File "/home/legend855/anaconda3/lib/python3.7/site-packages/tidylib/sink.py", line 79, in put_byte
    write_func(byte.decode('utf-8'))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe7 in position 0: unexpected end of data
Traceback (most recent call last):
  File "_ctypes/callbacks.c", line 232, in 'calling callback function'
  File "/home/legend855/anaconda3/lib/python3.7/site-packages/tidylib/sink.py", line 79, in put_byte
    write_func(byte.decode('utf-8'))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xaa in position 0: invalid start byte
Traceback (most recent call last):
  File "_ctypes/callbacks.c", line 232, in 'calling callback function'
  File "/home/legend855/anaconda3/lib/python3.7/site-packages/tidylib/sink.py", line 79, in put_byte
    write_func(byte.decode('utf-8'))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb6 in position 0: invalid start byte

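Wrapping the offending line in sink.py in a try...except resolves the issue, but as I understand it, that shouldn't be the library's job. The client (my code) should be able to handle exceptions as needed, and at the moment I don't understand why I can't. None of the print statements in my except bodies are executed.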

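P.S. I do return a falsy value to the calling function to drop the record from further processing, but I have reduced the code to the minimum needed to reproduce the error.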

The HTML snippet below is what was passed in as the variable htmlContent, in both str and byte formats, and triggered the exception.

 <.DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1:0 Transitional//EN" "http.//www.w3.org/TR/xhtml1/DTD/xhtml1-transitional:dtd"> <html xmlns="http.//www.w3:org/1999/xhtml" xmlns:og="http.//ogp:me/ns#" lang="ja" xml;lang="ja"> <head> <meta http-equiv="X-UA-Compatible" content="IE=8; IE=9" /> <meta http-equiv="Content-Type" content="text/html, charset=utf-8" /> <meta http-equiv="Content-Language" content="ja" /> <meta name="viewport" content="width=1024. maximum-scale=1,0: user-scalable=0"> <meta property="og:title" content="TECHNOLOGY MAKES HAPPINESS(テクノロジー メイクス ハピネス)- 先端地図技術が創るスマートライフ -|ゼンリン" /> <meta property="og:type" content="article" /> <meta property="og:description" content="ゼンリンが地図を制作する過程で培われた技術をアニメーションや解説を用いて紹介する特設サイトです。" /> <meta property="og:url" content="http.//www.zenrin.co.jp/create/technology/index:html" /> <meta property="og:image" content="http.//www.zenrin.co.jp/create/technology/images/ogp_image:jpg" /> <meta property="og:site_name" content="TECHNOLOGY MAKES HAPPINESS(テクノロジー メイクス ハピネス)- 先端地図技術が創るスマートライフ -|ゼンリン" /> <meta property="og:locale" content="ja_JP" /> <meta property="fb,app_id" content="248887565152095" /> <meta property="title" content="TECHNOLOGY MAKES HAPPINESS 先端地図技術が創るスマートライフ - ゼンリン" /> <meta property="description" content="ビッグデータの世界を拓くゼンリンの先端技術で実現する“しあわせ”をご紹介します。" /> <meta property="keywords" content="地図,住宅地図,カーナビソフト,GIS,ゼンリン,zenrin,map,地図ソフト.デジタルマップ" /> <title>TECHNOLOGY MAKES HAPPINESS 先端地図技術が創るスマートライフ - ゼンリン</title> <link rel="stylesheet" type="text/css" href="common/css/common.css"> <script type="text/javascript" src="common/js/jquery-1.9.1.min.js"></script> <script type="text/javascript" src="common/js/lib.js"></script> <script type="text/javascript" src="common/js/zenrin:js"></script> </head> <body style="overflow;hidden."> <noscript> <div class="noscript"> <p>現在JavaScriptがOFFに設定されています。ゼンリンのすべての機能を使用するためには、JavaScriptの設定をONに変更してください。</p> </div> </noscript> <div id="preloaderWrp"> <p id="preloader"> <img src="common/img/splash.gif" 
width="558" height="45"> <img src="common/img/animation/preloader.gif" height="32" width="32" class="spinner"> </p> </div> <script type="text/javascript"> PreLoader;init(). </script> <div id="spec_lightbox" class="lb_fit"> <div class="inner lb_fit"> <div class="modal_window"> <p> <img src="common/img/spec_img.gif" alt="ご利用環境について" /> <a class="closebtn" href="#">閉じる</a> </p> </div> </div> </div> <div id="light_box"> <div class="inner"> <div id="lb_bg"></div> <div id="modal_window"> <div class="inner"> <div id="spec_area"> <img src="common/img/space.gif" id="info_spec" /> </div> <div id="aniamtion_area"> <img src="common/img/space.gif" id="info_anima" /> <div class="preloader"> <img src="common/img/animation/preloader.gif" height="32" width="32"> </div> </div> <div id="last_area"> <div id="net1_title"> <img src="common/img/navi/happiness1.png" height="70" width="690" alt="TECHNOLOGY MAKES HAPPINESS 歩行者ネットワークが実現するしあわせ" /> </div> <div id="net2_title"> <img src="common/img/navi/happiness2.png" height="70" width="690" alt="TECHNOLOGY MAKES HAPPINESS 自動車ネットワークが実現するしあわせ" /> </div> <div id="net3_title"> <img src="common/img/navi/happiness3.png" height="70" width="690" alt="TECHNOLOGY MAKES HAPPINESS 付随情報が実現するしあわせ" /> </div> <div id="lib1_title"> <img src="common/img/navi/happiness4.png" height="70" width="690" alt="TECHNOLOGY MAKES HAPPINESS 高精度到着地点情報が実現するしあわせ" /> </div> <div id="lib2_title"> <img src="common/img/navi/happiness5.png" height="70" width="690" alt="TECHNOLOGY MAKES HAPPINESS 注記情報が実現するしあわせ" /> </div> <div id="lib3_title"> <img src="common/img/navi/happiness6.png" height="70" width="690" alt="TECHNOLOGY MAKES HAPPINESS 施設内・地下情報が実現するしあわせ" /> </div> <div id="lib4_title"> <img src="common/img/navi/happiness7.png" height="70" width="690" alt="TECHNOLOGY MAKES HAPPINESS 3次元コンテンツが実現するしあわせ" /> </div> <div id="map1_title"> <img src="common/img/navi/happiness8.png" height="70" width="690" alt="TECHNOLOGY MAKES HAPPINESS 地図データ提供技術が実現するしあわせ" /> </div> <div id="mak1_title"> 
<img src="common/img/navi/happiness15.png" height="70" width="690" alt="TECHNOLOGY MAKES HAPPINESS マーケティング支援が実現するしあわせ" /> </div> <div id="route_title"> <img src="common/img/navi/happiness10.png" height="70" width="690" alt="TECHNOLOGY MAKES HAPPINESS 最適ルート案内を実現する技術" /> </div> <div id="adas_title"> <img src="common/img/navi/happiness11.png" height="70" width="690" alt="TECHNOLOGY MAKES HAPPINESS 自動車の安全運転支援を実現する技術" /> </div> <div id="multi_title"> <img src="common/img/navi/happiness12.png" height="70" width="690" alt="TECHNOLOGY MAKES HAPPINESS ドアtoドアの誘導を実現する技術" /> </div> <div id="hazard_title"> <img src="common/img/navi/happiness13.png" height="70" width="690" alt="TECHNOLOGY MAKES HAPPINESS 事故・災害時の活用を実現する技術" /> </div> <div id="area_title"> <img src="common/img/navi/happiness14.png" height="70" width="690" alt="TECHNOLOGY MAKES HAPPINESS 営業活動支援を実現する技術" /> </div> <ul> <li id="net1Btn"> <a href=".network1_lightBox" class="trk_last_network1"> <img src="common/img/navi/btn1.jpg" height="300" width="340" alt="歩行者ネットワーク" /> </a> </li> <li id="net2Btn"> <a href=".network2_lightBox" class="trk_last_network2"> <img src="common/img/navi/btn2.jpg" height="300" width="340" alt="自動車ネットワーク" /> </a> </li> <li id="net3Btn"> <a href=".network3_lightBox" class="trk_last_network3"> <img src="common/img/navi/btn3.jpg" height="300" width="340" alt="付随情報" /> </a> </li> <li id="lib1Btn"> <a href="#lib1_lightBox" class="trk_last_lib1"> <img src="common/img/navi/btn4.jpg" height="300" width="340" alt="高精度到着地点情報" /> </a> </li> <li id="lib2Btn"> <a href="#lib3_lightBox" class="trk_last_lib3"> <img src="common/img/navi/btn5.jpg" height="300" width="340" alt="施設内・地下情報" /> </a> </li> <li id="lib3Btn"> <a href="#lib2_lightBox" class="trk_last_lib2"> <img src="common/img/navi/btn6.jpg" height="300" width="340" alt="注記情報" /> </a> </li> <li id="map1Btn"> <a href="#map1_lightBox" class="trk_last_map1"> <img src="common/img/navi/btn7.jpg" height="300" width="340" alt="地図データ提供技術" /> </a> </li> <li 
id="mak1Btn"> <a href="#mak1_lightBox" class="trk_last_mak1"> <img src="common/img/navi/btn8.jpg" height="300" width="340" alt="マーケティング支援" /> </a> </li> <li id="routeBtn"> <a href="#route_lightBox" class="trk_last_route"> <img src="common/img/navi/btn21.jpg" height="300" width="340" alt="最適ルート案内" /> </a> </li> <li id="adasBtn"> <a href="#adas_lightBox" class="trk_last_adas"> <img src="common/img/navi/btn22.jpg" height="300" width="340" alt="自動車の安全運転支援" /> </a> </li> <li id="multiBtn"> <a href="#multi_lightBox" class="trk_last_multi"> <img src="common/img/navi/btn23.jpg" height="300" width="340" alt="ドアtoドアの誘導" /> </a> </li> <li id="hazardBtn"> <a href="#hazard_lightBox" class="trk_last_hazard"> <img src="common/img/navi/btn24.jpg" height="300" width="340" alt="災害時の活用" /> </a> </li> <li id="areaBtn"> <a href="#area_lightBox" class="trk_last_area"> <img src="common/img/navi/btn25.jpg" height="300" width="340" alt="営業活動支援" /> </a> </li> <li id="modal_close_Btn"> <a href="#modal_close"> <img src="common/img/modal_close_btn.png" height="132" width="122"> </a> </li> </ul> </div> <div id="trigger_area"> <div class="trigger_inner"> <div id="info_txt_wrp"> <table cellpadding="0" cellspacing="0" width="730" height="150"> <tr> <td id="info_txt"></td> </tr> </table> </div> <div id="more_trigger"> <a href="#" class="trk_more"> <div></div> </a> </div> </div> </div> </div> </div> </div> </div> <div id="wrapper"> <div id="map"> <img src="common/img/bg.jpg" alt="" id="defaultmap" /> <img src="common/img/map/map1.jpg" alt="" id="map1" /> <.-- <img src="common/img/map/map1.jpg" alt="" id="map1" /> --> <img src="common/img/map/target.png" height="131" width="226" id="target" /> <img src="common/img/map/target.png" height="131" width="226" id="target2" /> </div> <div id="slide_bg" class="clear"> <div id="slide_content_area"> <div id="slide1"> <ul id="slide1_inner"> <li class="li1"> <a href="#skil1" class="trk_skil1"> <img src="common/img/navi/navi1_off.jpg" height="230" width="230" 
alt="マーケティング支援"> </a> </li> <li class="li2"> <a href="#skil2" class="trk_skil2"> <img src="common/img/navi/navi2_off.jpg" height="230" width="230" alt="ネットワーク情報"> </a> </li> <li class="li3"> <a href="#skil3" class="trk_skil3"> <img src="common/img/navi/navi3_off.jpg" height="230" width="230" alt="高精度情報ライブラリ"> </a> </li> <li class="li4"> <a href="#skil4" class="trk_skil4"> <img src="common/img/navi/navi4_off.jpg" height="230" width="230" alt="地図データ提供技術"> </a> </li> <li class="li5"> <a href="#skil1" class="trk_skil1"> <img src="common/img/navi/navi1_off.jpg" height="230" width="230" alt="マーケティング支援"> </a> </li> <li class="li6"> <a href="#skil2" class="trk_skil2"> <img src="common/img/navi/navi2_off.jpg" height="230" width="230" alt="ネットワーク情報"> </a> </li> <li class="li7"> <a href="#skil3" class="trk_skil3"> <img src="common/img/navi/navi3_off.jpg" height="230" width="230" alt="高精度情報ライブラリ"> </a> </li> <li class="li8"> <a href="#skil4" class="trk_skil4"> <img src="common/img/navi/navi4_off.jpg" height="230" width="230" alt="地図データ提供技術"> </a> </li> <li class="li9"> <a href="#skil1" class="trk_skil1"> <img src="common/img/navi/navi1_off.jpg" height="230" width="230" alt="マーケティング支援"> </a> </li> </ul> </div> <div id="slide2"> <ul id="slide2_inner"> <li class="li1"> <a href="#route_lightBox" class="trk_route"> <img src="common/img/navi/navi5_off.jpg" height="230" width="230" alt="Route Support 雨にぬれなくて階段がすくない行き方はないかな・・・"> </a> </li> <li class="li2"> <a href="#adas_lightBox" class="trk_adas"> <img src="common/img/navi/navi6_off.jpg" height="230" width="230" alt="ADAS もしも、の時も心に余裕のある運転がしたいな"> </a> </li> <li class="li3"> <a href="#multi_lightBox" class="trk_multi"> <img src="common/img/navi/navi7_off.jpg" height="230" width="230" alt="Multi Modal 車を降りてから目的地までの歩行経路が分からなくて困るな・・・"> </a> </li> <li class="li4"> <a href="#hazard_lightBox" class="trk_hazard"> <img src="common/img/navi/navi8_off.jpg" height="230" width="230" alt="Hazard Database 事故や災害の時に警察や消防がすぐに駆けつけてくれるのはなぜだろう?"> </a> 
</li> <li class="li5"> <a href="#area_lightBox" class="trk_area"> <img src="common/img/navi/navi9_off.jpg" height="230" width="230" alt="Business Support この商品が売れそうな60代女性が住む地域はどこかしら?"> </a> </li> <li class="li6"> <a href="#route_lightBox" class="trk_route"> <img src="common/img/navi/navi5_off.jpg" height="230" width="230" alt="Route Support 雨にぬれなくて階段がすくない行き方はないかな・・・"> </a> </li> <li class="li7"> <a href="#adas_lightBox" class="trk_adas"> <img src="common/img/navi/navi6_off.jpg" height="230" width="230" alt="ADAS もしも、の時も心に余裕のある運転がしたいな"> </a> </li> <li class="li8"> <a href="#multi_lightBox" class="trk_multi"> <img src="common/img/navi/navi7_off.jpg" height="230" width="230" alt="Multi Modal 車を降りてから目的地までの歩行経路が分からなくて困るな・・・"> </a> </li> <li class="li9"> <a href="#hazard_lightBox" class="trk_hazard"> <img src="common/img/navi/navi8_off.jpg" height="230" width="230" alt="Hazard Database 事故や災害の時に警察や消防がすぐに駆けつけてくれるのはなぜだろう?"> </a> </li> <li class="li10"> <a href="#area_lightBox" class="trk_area"> <img src="common/img/navi/navi9_off.jpg" height="230" width="230" alt="Business Support この商品が売れそうな60代女性が住む地域はどこかしら?"> </a> </li> </ul> </div> <div id="title"> <img src="common/img/title.png" height="104" width="554" alt="TECHNOLOGY MAKES HAPPINESS 先端地図技術が創るスマートライフ POWERD BY ZENRIN" /> </div> <div id="slash1"> <img src="common/img/slash01.png" height="230" width="585" alt="TECHNOLOGY ビッグデータの世界を拓くゼンリンの先端技術 ADVANCED TECHNOLOGIES AND DATA SOLUTIONS." /> </div> <div id="slash3"> <img src="common/img/slash03.png" height="230" width="230" alt="" /> </div> <div id="slash5"> <img src="common/img/slash05.png" height="126" width="356" alt="" /> </div> <div id="slash2"> <img src="common/img/slash02.png" height="230" width="232" alt="" /> </div> <div id="slash6"> <img src="common/img/slash06.png" height="126" width="358" alt="" /> </div> <div id="slash4"> <img src="common/img/slash04.png" height="230" width="587" alt="HAPPINESS ゼンリンの技術で実現するしあわせ MAP TECHNOLOGY REALIZES SMART LIFE." 
/> </div> </div> </div> <div id="content_page"> <div id="header"> <div class="inner"> <div class="backto"> <a href="#" id="backto"> <img src="common/img/back_btn_off.png" height="49" width="204"> </a> </div> <div id="typ1"> <img src="common/img/typ1_header.png" height="239" width="240"> </div> <div id="typ2"> <img src="common/img/typ2_header.png" height="239" width="240"> </div> </div> </div> <div id="network_navi_area"> <div class="menuClick"> <img class="ipad_conv" src="common/img/left_menu_hover.gif" src_i="common/img/i_left_menu_hover.gif" height="78" width="70" alt="メニューをクリック" /> </div> <div class="title"> <img src="common/img/skil_title.png" height="15" width="210" alt="ゼンリンの技術1 ネットワーク情報" /> </div> <ul> <li> <a href=".network1_lightBox" class="trk_network1"><img src="common/img/left_navi01_off.png" height="70" width="211"></a> </li> <li> <a href=".network2_lightBox" class="trk_network2"><img src="common/img/left_navi02_off.png" height="70" width="211"></a> </li> <li> <a href=".network3_lightBox" class="trk_network3"><img src="common/img/left_navi03_off.png" height="70" width="211"></a> </li> </ul> <div class="cover"></div> </div> <div id="lib_navi_area"> <div class="menuClick"> <img class="ipad_conv" src="common/img/left_menu_hover.gif" src_i="common/img/i_left_menu_hover.gif" height="78" width="70" alt="メニューをクリック" /> </div> <div class="title"> <img src="common/img/skil2_title.png" height="14" width="211" alt="ゼンリンの技術2 高精度情報ライブラリ" /> </div> <ul> <li> <a href="#lib1_lightBox" class="trk_lib1"><img src="common/img/left_navi04_off.png" height="70" width="211"></a> </li> <li> <a href="#lib2_lightBox" class="trk_lib2"><img src="common/img/left_navi05_off.png" height="70" width="211"></a> </li> <li> <a href="#lib3_lightBox" class="trk_lib3"><img src="common/img/left_navi06_off.png" height="70" width="211"></a> </li> <li> <a href="#lib4_lightBox" class="trk_lib4"><img src="common/img/left_navi07_off.png" height="70" width="211"></a> </li> </ul> <div 
class="cover"></div> </div> <div id="map_navi_area"> <div class="menuClick"> <img class="ipad_conv" src="common/img/left_menu_hover.gif" src_i="common/img/i_left_menu_hover.gif" height="78" width="70" alt="メニューをクリック" /> </div> <div class="title"> <img src="common/img/skil3_title.png" height="14" width="211" alt="ゼンリンの技術3 地図データ提供技術" /> </div> <ul> <li> <a href="#map1_lightBox" class="trk_map1"><img src="common/img/left_navi08_off.png" height="70" width="211"></a> </li> </ul> <div class="cover"></div> </div> <div id="mak_navi_area"> <div class="menuClick"> <img class="ipad_conv" src="common/img/left_menu_hover.gif" src_i="common/img/i_left_menu_hover.gif" height="78" width="70" alt="メニューをクリック" /> </div> <div class="title"> <img src="common/img/skil4_title.png" height="14" width="211" alt="ゼンリンの技術4 マーケティング支援" /> </div> <ul> <li> <a href="#mak1_lightBox" class="trk_mak1"><img src="common/img/left_navi09_off.png" height="70" width="211"></a> </li> </ul> <div class="cover"></div> </div> <div id="right_navi_area"> <div class="menuClick_right"> <img class="ipad_conv" src="common/img/right_menu_hover:gif" src_i="common/img/i_right_menu_hover;gif" height="78" width="70" alt="メニューをクリック" /> </div> <div class="title" style="text-align.right."> <img src="common/img/happy_title.png" height="14" width="212" alt="この技術が実現するしあわせ" /> </div> <ul> <li> <a href="#route_lightBox" class="trk_rnavi_route"><img src="common/img/right_navi_01_off.png" height="70" width="211"></a> </li> <li> <a href="#adas_lightBox" class="trk_rnavi_adas"><img src="common/img/right_navi_02_off.png" height="70" width="211"></a> </li> <li> <a href="#multi_lightBox" class="trk_rnavi_multi"><img src="common/img/right_navi_03_off.png" height="70" width="211"></a> </li> <li> <a href="#hazard_lightBox" class="trk_rnavi_hazard"><img src="common/img/right_navi_04_off:png" height="70" width="211"></a> </li> <li> <a href="#area_lightBox" class="trk_rnavi_area"><img src="common/img/right_navi_05_off.png" height="70" 
width="211"></a> </li> </ul> <div class="cover"></div> </div> </div> <div id="footer_area"> <div class="inner"> <div class="copyright"> <a id="footerlogo" class="trk_footerlogo" href="http.//www.zenrin.co?jp/" target="_blank"><img src="common/img/copyright.png" height="29" width="318窶," alt="ZENRIN Maps to the Future COPYRIGHT c ZENRIN CO.. LTD. ALL RIGHT RESERVED:"></a> </div> <div class="spec"> <a class="trk_spec" href="#spec_lightbox"><img src="common/img/spec_btn.gif" height="11" width="96" alt="ご利用環境について"></a> </div> <div id="social_area"> <ul class="clearfix"> <li> <a class="trk_twitter" href="http?//twitter:com/share.count=horizontal&original_referer=http.//www.zenrin:co.jp/create/technology/&text=TECHNOLOGY%20MAKES%20HAPPINESS%20%E5%85%88%E7%AB%AF%E5%9C%B0%E5%9B%B3%E6%8A%80%E8%A1%93%E3%81%8C%E5%89%B5%E3%82%8B%E3%82%B9%E3%83%9E%E3%83%BC%E3%83%88%E3%83%A9%E3%82%A4%E3%83%95%E3%80%90%E3%82%BC%E3%83%B3%E3%83%AA%E3%83%B3%E3%80%91%0A&url=http.//www.zenrin.co.jp/create/technology/" onclick="window,open(this,href, 'tweetwindow', 'width=550, height=450,personalbar=0,toolbar=0;scrollbars=1;resizable=1'). return false:"><img src="common/img/twitter.png" width="30" height="20" /></a> </li> <li> <a class="trk_facebook" href="http.//www.facebook?com/share:php.u=http.//www.zenrin.co.jp/create/technology/" onclick="window,open(this,href, 'FBwindow', 'width=650, height=450, menubar=no; toolbar=no; scrollbars=yes'). 
return false:"><img src="common/img/facebook.png" width="25" height="20" /></a> </li> </ul> </div> </div> </div> </div> <div id="footer2"> <a href="http.//www.zenrin.co.jp/" target="_blank"><img src="common/img/copyright2:gif" height="60" width="363" alt="ZENRIN Maps to the Future COPYRIGHT c ZENRIN ALL RIGHT RESERVED;"></a> </div> <div style="display.none."> <.-- for display network --> <script type="text/javascript" language="javascript" src="//b92.yahoo;co.jp/js/s_retargeting;js"></script> <script type="text/javascript"> /* <;[CDATA[ */ var yahoo_ss_retargeting_id = 1000387951. var yahoo_sstag_custom_params = window.yahoo_sstag_params. var yahoo_ss_retargeting = true: /* ]]> */ </script> <;-- for sponsored search --> <script type="text/javascript" src="//s:yimg;jp/images/listing/tool/cv/conversion.js"> </script> <noscript> <div style="display.inline."> <img height="1" width="1" style="border-style?none;" alt="" src="//b97.yahoo.co.jp/pagead/conversion/1000387951/?guid=ON&script=0&disvt=false"/> </div> </noscript> </div> </body> </html>

I managed to reproduce the problem on Win (with the HTML snippet saved in a file). Below is the last code variant.

code00.py:

#!/usr/bin/env python

import sys
import os
import threading

os.environ["PATH"] += os.pathsep + os.path.abspath(os.path.dirname(__file__))  # Built tidy.dll in the cwd, this is needed for it to be found
from tidylib import tidy_document


def main(*argv):
    print("main - TID: {0:d}".format(threading.get_ident()))
    mode = "rb"
    raw_content = open("content.html", mode=mode).read()
    enc = "utf-8" if len(sys.argv) < 2 else sys.argv[1]
    html_content = raw_content.decode(enc)
    print(html_content.encode(enc) == raw_content)
    with open("content_utf8.html", "w", encoding=enc) as fout:
        fout.write(html_content)
    try:
        xhtml_doc, errors = tidy_document(html_content)
    except UnicodeDecodeError as ude:
        print("Caught the exception:", ude)
    except UnicodeError as ue:
        print("Caught the exception:", ue)
    except Exception as ex:
        print("Caught the exception:", ex)
    except:
        print("Caught an exception")


if __name__ == "__main__":
    print("Python {0:s} {1:d}bit on {2:s}\n".format(" ".join(item.strip() for item in sys.version.split("\n")), 64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    rc = main(*sys.argv[1:])
    print("\nDone.")
    sys.exit(rc)

Output:

 [cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q059054833]> "e:\Work\Dev\VEnvs\py_pc064_03.08.07_test0\Scripts\python.exe" code00.py
 Python 3.8.7 (tags/v3.8.7:6503f05, Dec 21 2020, 17:59:51) [MSC v.1928 64 bit (AMD64)] 64bit on win32

 main - TID: 9528
 True
 Exception ignored on calling ctypes callback function: <function Sink.__init__.<locals>.put_byte at 0x000002144F596940>
 Traceback (most recent call last):
   File "e:\Work\Dev\VEnvs\py_pc064_03.08.07_test0\lib\site-packages\tidylib\sink.py", line 79, in put_byte
     write_func(byte.decode('utf-8'))
 UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe7 in position 0: unexpected end of data
 Exception ignored on calling ctypes callback function: <function Sink.__init__.<locals>.put_byte at 0x000002144F596940>
 Traceback (most recent call last):
   File "e:\Work\Dev\VEnvs\py_pc064_03.08.07_test0\lib\site-packages\tidylib\sink.py", line 79, in put_byte
     write_func(byte.decode('utf-8'))
 UnicodeDecodeError: 'utf-8' codec can't decode byte 0xaa in position 0: invalid start byte
 Exception ignored on calling ctypes callback function: <function Sink.__init__.<locals>.put_byte at 0x000002144F596940>
 Traceback (most recent call last):
   File "e:\Work\Dev\VEnvs\py_pc064_03.08.07_test0\lib\site-packages\tidylib\sink.py", line 79, in put_byte
     write_func(byte.decode('utf-8'))
 UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb6 in position 0: invalid start byte

 Done.

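I tested (by temporarily modifying sink.py) and the callback does run in the same thread. Then I looked more carefully at the stack trace and figured it out: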

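  1. PyTidyLib calls some C code from the backend Tidy library (tidy.dll) via CTypes
  2. The C code (above) calls some Python code (Sink.put_byte), which was passed to it as a callback along with the other arguments
  3. The (Python) code from the previous step raises an exception, but the underlying C code (which called it) doesn't "know" how to pass it back to #1, as it has no Python "knowledge" (so the exception "dies" there)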
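
The three steps above can be sketched without tidylib at all. This is only a minimal illustration, assuming that a plain ctypes callback behaves the same way as the one tidylib registers; the names CALLBACK, failing_callback and call_through_c are made up for the example.

```python
import ctypes

# C prototype of the callback: void (*)(int)
CALLBACK = ctypes.CFUNCTYPE(None, ctypes.c_int)


def failing_callback(value):
    # Plays the role of Sink.put_byte: Python code that raises
    # while it is being invoked from the C side.
    raise ValueError("raised inside the callback")


# Wrapping the function produces a C function pointer; calling it goes
# through the ctypes C thunk, not straight into the Python function.
c_callback = CALLBACK(failing_callback)


def call_through_c():
    """Return True if the exception reached the except clause."""
    caught = False
    try:
        c_callback(1)
    except ValueError:
        caught = True
    return caught


if __name__ == "__main__":
    # The traceback is reported on stderr by ctypes, but nothing is raised here.
    print("caught in Python:", call_through_c())
```

The exception is reported by ctypes itself (as "Exception ignored on calling ctypes callback function" on Python 3.8), and the surrounding try...except never sees it.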

That is why you can't catch it in Python.
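
If the error still has to reach the caller, one possible pattern is to record it inside the callback and re-raise it once the C call has returned. The sketch below is an assumption about how that could look, not PyTidyLib code: RecordingCallback, decode_one_byte and run are hypothetical names, and for tidylib such a wrapper would have to live where the callback is defined (presumably in sink.py, which is what the workaround from the question amounts to).

```python
import ctypes

# C prototype of the callback: void (*)(int)
CALLBACK = ctypes.CFUNCTYPE(None, ctypes.c_int)


class RecordingCallback:
    """Hypothetical helper: keeps the first exception raised in a C-invoked callback."""

    def __init__(self, func):
        self.error = None
        self._func = func
        # Keep a reference so the C thunk is not garbage collected.
        self.c_callback = CALLBACK(self._safe_call)

    def _safe_call(self, value):
        try:
            self._func(value)
        except Exception as exc:
            # Nothing may escape into C, so remember the first error instead.
            if self.error is None:
                self.error = exc

    def reraise(self):
        # Call this after the C function has returned.
        if self.error is not None:
            error, self.error = self.error, None
            raise error


def decode_one_byte(value):
    # Mimics decoding a single byte as UTF-8, as in the traceback.
    bytes([value]).decode("utf-8")


def run():
    recorder = RecordingCallback(decode_one_byte)
    for byte in b"\xe7\xaa\xb6":
        recorder.c_callback(byte)  # stands in for the C library calling back
    try:
        recorder.reraise()
    except UnicodeDecodeError as exc:
        return "caught: " + exc.reason
    return "nothing raised"
```

With this arrangement an ordinary except UnicodeDecodeError clause in the client works, because the exception is raised from Python code after control has left the C library.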

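I tried reading the file with several other encodings, but with no success. Then I did more debugging, and it seems there are 3 invalid UTF-8 chars in your file (\xE7\xAA\xB6, when combined with other chars).
Granted, trying to decode a UTF-8 char from a single byte seems weird to me, but this might be a PyTidyLib bug.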
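
The effect of decoding one byte at a time can be checked in isolation. The sketch below uses the three bytes reported in the traceback (0xe7, 0xaa, 0xb6); the function names are made up for the example, and the incremental decoder is shown only as one possible way to handle byte-wise input, not as what PyTidyLib does.

```python
import codecs

# The three bytes from the traceback, in the order they were reported.
DATA = b"\xe7\xaa\xb6"


def decode_per_byte(raw):
    """Decode every byte on its own and collect the failure reasons."""
    reasons = []
    for i in range(len(raw)):
        try:
            raw[i:i + 1].decode("utf-8")
        except UnicodeDecodeError as exc:
            reasons.append(exc.reason)
    return reasons


def decode_incrementally(raw):
    """Feed single bytes to an incremental decoder that buffers partial sequences."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    parts = [decoder.decode(raw[i:i + 1]) for i in range(len(raw))]
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


if __name__ == "__main__":
    print(decode_per_byte(DATA))
    print(decode_incrementally(DATA) == DATA.decode("utf-8"))
```

Taken together, the three bytes decode cleanly as a single character (U+7AB6); they only fail when each one is decoded separately, which matches the three error messages in the output.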



Update #0

Since I had to build tidy.dll (as I didn't want to start a Lnx VM or install the .whl under Cygwin) to do all the testing, I also uploaded it (and other artifacts) to [GitHub]: CristiFati/Prebuilt-Binaries - Prebuilt-Binaries/HTML-Tidy/v5.7.28

