
Python: How to remove characters from a string inside a list

I've been playing around with my code for quite some time now. I want to strip a piece of text from the values held by the each_div variable, which contains a whole bunch of values parsed from a webpage.

def scrape_page():
    create_dir(project_dir)
    page = 1
    max_page = 10
    while page < max_page:
        page = page + 1
        for each_div in soup.find_all('div',{'class':'username'}):
            f.write(str(each_div) + "\n")

If I run this code, it parses the elements with the username class from an HTML page. The problem is that each one is written out like this:

<div class="username">someone_s_username</div>

What I've been trying to do is strip the <div class="username"> and </div> parts away so that only the actual username is written, not the HTML. If anyone has an idea of how to accomplish this, that would be terrific. Thank you.

Sure, you can use Python's str.replace method on the string form of each tag:

for each_div in soup.find_all('div', {'class': 'username'}):
    # find_all() yields Tag objects, so convert to str before calling str.replace()
    text = str(each_div)
    text = text.replace('<div class="username">', "")
    text = text.replace("</div>", "")
    f.write(text + "\n")
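
As a quick check of the replace approach on a plain string, here is a minimal sketch. The helper name strip_username_markup is hypothetical and not part of the original answer, and the sample markup is the one shown in the question.

```python
def strip_username_markup(markup: str) -> str:
    """Remove the literal wrapper tags from one serialized username <div>.

    Hypothetical helper for illustration; it only matches the exact
    opening tag text '<div class="username">'.
    """
    without_open = markup.replace('<div class="username">', "")
    return without_open.replace("</div>", "")


sample = '<div class="username">someone_s_username</div>'
print(strip_username_markup(sample))  # prints: someone_s_username
```

Note that this only works when the opening tag matches the literal text exactly. If the div carried an extra attribute, the opening tag would be left in place.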

Alternatively, you can split the string to obtain the part you want:

for each_div in soup.find_all('div', {'class': 'username'}):
    # find_all() yields Tag objects, so convert to str before calling str.split()
    text = str(each_div)
    text = text.split(">")[1]  # everything between the first ">" and the next ">"
    text = text.split("<")[0]  # everything before the following "<"
    f.write(text + "\n")
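
The split approach can be tried on a plain string as well. This is a rough sketch with a hypothetical helper name, text_between_first_tags; it assumes the div contains only text and no nested tags.

```python
def text_between_first_tags(markup: str) -> str:
    """Return the text between the first '>' and the next '<'.

    Hypothetical helper for illustration; assumes the element holds
    plain text with no nested tags.
    """
    after_open = markup.split(">", 1)[1]   # drop the opening tag
    return after_open.split("<", 1)[0]     # drop everything from the next tag on


sample = '<div class="username">someone_s_username</div>'
print(text_between_first_tags(sample))  # prints: someone_s_username
```

With nested markup such as `<div class="username"><b>bob</b></div>`, this returns an empty string, which is one reason string splitting is fragile for HTML.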

Oh, I just remembered: I believe you should be able to simply do this, since each tag exposes its text content through the .text attribute:

for each_div in soup.find_all('div',{'class':'username'}):
    f.write(str(each_div.text) + "\n")
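
To see the text-based approach end to end, here is a small self-contained sketch. It assumes the beautifulsoup4 package is installed. The function name extract_usernames and the sample HTML are made up for illustration, and get_text(strip=True) is used in place of .text so that surrounding whitespace is also trimmed.

```python
from typing import List

from bs4 import BeautifulSoup


def extract_usernames(html: str) -> List[str]:
    """Return the text of every <div class="username"> in the given HTML.

    Hypothetical helper for illustration, not the asker's scrape_page().
    """
    soup = BeautifulSoup(html, "html.parser")
    return [
        div.get_text(strip=True)  # text content only, whitespace trimmed
        for div in soup.find_all("div", {"class": "username"})
    ]


# Made-up sample page standing in for the scraped HTML.
sample = """
<div class="username">someone_s_username</div>
<div class="username">  padded_user  </div>
<div class="other">ignored</div>
"""
print(extract_usernames(sample))  # prints: ['someone_s_username', 'padded_user']
```

Letting the parser return the text avoids depending on the exact tag markup, so it keeps working if the div gains extra attributes.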

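Notice: Technical posts on this site are licensed under CC BY-SA 4.0. If you republish one, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.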

 