Python: How to remove characters from a string inside a list
I've been playing around with my code for quite some time now. I want to replace a string of text in the values returned by the each_div variable, which holds a whole bunch of parsed values from a webpage.
def scrape_page():
    create_dir(project_dir)
    page = 1
    max_page = 10
    while page < max_page:
        page = page + 1
        # soup (the parsed page) and f (the output file) are set up in code not shown here
        for each_div in soup.find_all('div', {'class': 'username'}):
            f.write(str(each_div) + "\n")
If I run this code, it parses the data from the username class of an HTML page. The problem is that it returns it like this:
<div class="username">someone_s_username</div>
What I've been trying to do is strip away the <div class="username"> and </div> parts so that it only returns the actual username instead of the HTML. If anyone has an idea on how to accomplish this, that would be terrific. Thank you.
Sure, you can use Python's str.replace method:
for each_div in soup.find_all('div', {'class': 'username'}):
    # find_all yields Tag objects, so convert to a string before calling replace
    text = str(each_div)
    text = text.replace('<div class="username">', "")
    text = text.replace("</div>", "")
    f.write(text + "\n")
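To try the replace approach without fetching a page, here is a minimal sketch that runs on the sample string from the question. The helper name strip_username_div is hypothetical, and the sketch assumes the markup matches the example exactly (same attribute, same quoting).

```python
def strip_username_div(html):
    """Remove the literal wrapper tags from one serialized div (hypothetical helper)."""
    # Assumes the opening tag is exactly '<div class="username">'.
    without_open = html.replace('<div class="username">', "")
    return without_open.replace("</div>", "")


sample = '<div class="username">someone_s_username</div>'
result = strip_username_div(sample)
print(result)
```

If the div ever carries extra attributes or different quoting, the first replace will not match and the opening tag will be left in place, so this is only suitable for markup you control or have inspected.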
Alternatively, you can split the string to obtain the part you want:
for each_div in soup.find_all('div', {'class': 'username'}):
    text = str(each_div)        # convert the Tag to its HTML string
    text = text.split(">")[1]   # the piece between the first and second ">"
    text = text.split("<")[0]   # everything before the next "<"
    f.write(text + "\n")
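The same splitting idea can be checked on a plain string. The sketch below uses str.partition, a standard string method that splits on the first occurrence of a separator, instead of split with an index. The function name text_between_tags is made up for illustration, and it assumes a single element with no nested tags inside it.

```python
def text_between_tags(html):
    """Return the text between the first '>' and the following '<' (hypothetical helper)."""
    # Discard everything up to and including the first ">".
    _, _, rest = html.partition(">")
    # Keep everything before the next "<".
    text, _, _ = rest.partition("<")
    return text


sample = '<div class="username">someone_s_username</div>'
extracted = text_between_tags(sample)
print(extracted)
```

Unlike split(">")[1], partition does not raise an IndexError when the separator is missing; it returns an empty remainder instead, so the function yields an empty string for input that contains no ">".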
Oh, I just remembered: I believe you can simply do this:
for each_div in soup.find_all('div', {'class': 'username'}):
    f.write(str(each_div.text) + "\n")
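For a version that can be run on its own, here is a small sketch of the same idea using an inline HTML string instead of a fetched page. It assumes the bs4 package is installed; the second username, another_user, is invented sample data, and get_text(strip=True) is used in place of .text so that surrounding whitespace is trimmed.

```python
from bs4 import BeautifulSoup

# Inline sample markup; "another_user" is made-up data for illustration.
html = """
<div class="username">someone_s_username</div>
<div class="username">another_user</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Let the parser extract the text of each matching div instead of editing HTML strings.
usernames = [
    div.get_text(strip=True)
    for div in soup.find_all("div", {"class": "username"})
]
print(usernames)
```

Letting the parser extract the text avoids depending on the exact spelling of the tag, which is what the replace and split approaches rely on.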