
How to delete one kind of data from an output CSV file created with BeautifulSoup in Python

<small class="truncate text-bold">Heart ...</small>
<small class="truncate text-bold">Fuse</small>
<small class="truncate text-bold">Fuse</small>
<small class="truncate text-bold">hello</small>
<small class="truncate text-bold">trap</small>
<small class="truncate text-bold">Fuse</small>
<small class="truncate text-bold">kick</small>
<small class="truncate text-bold">Fuse</small>

<small class="truncate text-bold blurple2">I 1</small>
<small class="truncate text-bold blurple2">I 2</small>
<small class="truncate text-bold blurple2">I 3</small>
<small class="truncate text-bold blurple2">I 4</small>
<small class="truncate text-bold blurple2">I 5</small>
<small class="truncate text-bold blurple2">I 6</small>
<small class="truncate text-bold blurple2">I 7</small>


for row in c_soup:
    s_c = row.find("small", {'class': 'truncate text-bold'}).text.strip()
    s_i = row.find("small", {'class': 'truncate text-bold blurple2'}).text.strip()
    print(s_i + ' ' + s_c)

My output is:

  1. I 1 Heart....
  2. I 2 Fuse
  3. I 3 Fuse
  4. I 4 hello
  5. I 5 trap
  6. I 6 Fuse
  7. I 7 kick
  8. I 8 Fuse

I don't want the "Fuse" rows in my output. The output I want is:

  1. I 1 Heart....
  2. I 4 hello
  3. I 5 trap
  4. I 7 kick

If I understand you correctly, you want to pair ("zip") the texts from the two groups of <small> tags and skip every pair whose first text contains the word "Fuse":

from bs4 import BeautifulSoup

html_doc = '''<small class="truncate text-bold">Heart ...</small>
<small class="truncate text-bold">Fuse</small>
<small class="truncate text-bold">Fuse</small>
<small class="truncate text-bold">hello</small>
<small class="truncate text-bold">trap</small>
<small class="truncate text-bold">Fuse</small>
<small class="truncate text-bold">kick</small>
<small class="truncate text-bold">Fuse</small>

<small class="truncate text-bold blurple2">I 1</small>
<small class="truncate text-bold blurple2">I 2</small>
<small class="truncate text-bold blurple2">I 3</small>
<small class="truncate text-bold blurple2">I 4</small>
<small class="truncate text-bold blurple2">I 5</small>
<small class="truncate text-bold blurple2">I 6</small>
<small class="truncate text-bold blurple2">I 7</small>
'''

soup = BeautifulSoup(html_doc, 'html.parser')

for a, b in zip(soup.select('.truncate:not(.blurple2)'), soup.select('.blurple2')):
    if 'Fuse' in a.text:
        continue
    print(b.text + ' ' + a.text)

Prints:

I 1 Heart ...
I 4 hello
I 5 trap
I 7 kick
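
Since the question title mentions a CSV file, here is a minimal sketch of writing the filtered pairs out with the standard csv module. It assumes the texts have already been extracted into two lists (hard-coded below so it runs without bs4). The helper names filter_rows and write_csv and the header names "label" and "name" are made up for illustration and are not part of the original answer. Note that zip() stops at the shorter list, so the eighth "Fuse" entry, which has no matching "I ..." label, never forms a pair.

```python
import csv
import io

# Texts as they might come out of the two select() calls in the answer above
# (hard-coded here; this is an assumption so the sketch runs without bs4).
names = ["Heart ...", "Fuse", "Fuse", "hello", "trap", "Fuse", "kick", "Fuse"]
labels = ["I 1", "I 2", "I 3", "I 4", "I 5", "I 6", "I 7"]


def filter_rows(labels, names, unwanted="Fuse"):
    """Pair labels with names and drop pairs whose name contains `unwanted`."""
    # zip() stops at the shorter list, so the unmatched eighth name is ignored.
    return [
        (label, name)
        for label, name in zip(labels, names)
        if unwanted not in name
    ]


def write_csv(rows, fileobj):
    """Write a header line followed by the given rows to an open text file."""
    writer = csv.writer(fileobj)
    writer.writerow(["label", "name"])  # hypothetical column names
    writer.writerows(rows)


rows = filter_rows(labels, names)

# An in-memory buffer stands in for a real file in this sketch.
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

To write a real file instead, open it with open("output.csv", "w", newline="", encoding="utf-8") and pass the file object to write_csv. The file name is only an example.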
