Python, using Selenium: how to clean page_source of strings like /a0:

I am using Selenium WebDriver to get the page source. But I get back a source filled with /a0:, which I have read means non-breaking space. So I was wondering:

A. How do I get rid of it? Should I clean the source once I have it, or can I do anything in advance?

B. What reason is there to put it in the HTML in the first place? This is the first time I have encountered such a thing.

Example of the source:

......<a0:div style="position: absolute; top: -1000px; height: 1px; width: 1px;">
<a0:object data="https://translate.googleapis.com/translate_static/js/element/hrs.swf" height="500"
id="fI0hpn482ja" name="fI0hpn482ja" type="application/x-shockwave-flash" width="400">
<a0:param name="allowScriptAccess" value="always"></a0:param></a0:object></a0:div>
<a0:iframe class="goog-te-menu-frame skiptranslate" frameborder="0" style="visibility:
visible; -moz-box-sizing: content-box; width: 731px; height: 274px; display: none;">
</a0:iframe></a0:body></a0:html></body></html>

Thanks :)

1. You can replace them with an empty string. Common usage could be like this:

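def get_clean_string(string, substring):
    # An empty substring is always "in" the string, so return early to avoid an endless loop.
    if not substring:
        return string
    # Repeat because removing one match can join the surrounding text into a new match.
    while substring in string:
        string = string.replace(substring, '')
    return string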

and the result:

In [24]: get_clean_string('replacemeHeresWhatINeed', 'replaceme')
Out[24]: 'HeresWhatINeed'
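
Applied to the source in the question, the same idea can be narrowed so that a0: is removed only where it prefixes a tag name. This is a minimal sketch, not a definitive fix: the function name strip_tag_prefix is hypothetical, the prefix a0 is taken from the example above, and a regular expression over HTML is only a rough approach that assumes the prefix always directly follows < or </.

```python
import re

# Matches the "a0:" prefix only where it directly follows "<" or "</",
# so ordinary text that happens to contain "a0:" is left untouched.
# The prefix name "a0" comes from the example in the question (assumption:
# other pages may use a different prefix).
_PREFIX_RE = re.compile(r'(</?)a0:')


def strip_tag_prefix(page_source):
    """Return page_source with the a0: prefix removed from tag names."""
    return _PREFIX_RE.sub(r'\1', page_source)


sample = '<a0:div style="top: -1000px;"><a0:param name="x"></a0:param></a0:div>'
print(strip_tag_prefix(sample))
# prints: <div style="top: -1000px;"><param name="x"></param></div>
```

With Selenium this would presumably be called as strip_tag_prefix(driver.page_source) once the page has loaded.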

2. Maybe you should specify the encoding in your source. Python 2 reads source files as ASCII by default (Python 3 defaults to UTF-8). In my project I encounter Russian characters all the time, so all my files declare UTF-8 encoding in the first line:

#-*- coding: utf-8 -*-
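
For comparison, an actual non-breaking space is the single character U+00A0, not the literal text a0: seen in the tags above. If real non-breaking spaces do appear in the page text, something like the following could replace them. It is a sketch that assumes the text is already a Unicode string; the names NBSP and replace_nbsp are made up for illustration.

```python
# -*- coding: utf-8 -*-

# The non-breaking space character, which is what &nbsp; decodes to.
NBSP = u'\u00a0'


def replace_nbsp(text):
    """Swap every non-breaking space in text for an ordinary space."""
    return text.replace(NBSP, u' ')


sample = u'price:\u00a0100'
print(replace_nbsp(sample) == u'price: 100')
# prints: True
```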
