Python: write CSV with data from multiple nested for loops

I'm trying to scrape a website and write the data to a CSV file. The problem is that, because I'm using nested loops, not all of the data ends up in the CSV.

import requests 
from bs4 import BeautifulSoup
from csv import writer

with open("full_links_details.csv", 'w', newline='') as csv_file:
    csv_writer = writer(csv_file)
    csv_writer.writerow(["Details", "Details_Link", "image_link"])

    z = """
         <div class="container container1">
           <ul class="splist-view">
               <li class="wow fadeInUp">
                   <div class="row">
                       <div class="pic">
                           <a href="some_link_a">
                               <img src="some_image_link_a">
                           </a>
                       </div>
                       <div class="detail">
                           <ul>
                               <li class="hd"><a href="some_link_a">SomeTitleText-A</a></li>
                           </ul>
                       </div>
                   </div>
               </li>

               <li class="wow fadeInUp">
                   <div class="row">
                       <div class="pic">
                           <a href="some_link_b">
                               <img src="some_image_link_b">
                           </a>
                       </div>
                       <div class="detail">
                           <ul>
                               <li class="hd"><a href="some_link_b">SomeTitleText-B</a></li>
                           </ul>
                       </div>
                   </div>
               </li>
           </ul>
          </div>
        """
    souped_html_data = BeautifulSoup(z, "html.parser")

    div_detail_list = souped_html_data.find_all("div", "detail")
    div_pic_list = souped_html_data.find_all("div", "pic")

    for div_detail in div_detail_list:
        details = div_detail.get_text()


    for div_link in div_detail_list:
        div_link_a = div_link.find_all('a')
        for div_link_href in div_link_a:
            div_link_href_url = div_link_href.get('href')

    for div_pic in div_pic_list:
        div_pic_a = div_pic.find_all('img')
        for div_pic_a_src in div_pic_a:
            div_pic_a_src_link = div_pic_a_src.get('src')

        csv_writer.writerow([details, div_link_href_url, div_pic_a_src_link])

Whatever I do, I can't get all the data written in the correct form. Depending on how I indent the last line, sometimes the details value is repeated in every row, and sometimes the first two fields are repeated. So I'm fairly sure the nested for loops are causing the problem. Is there a way to bring all the loops to the same level and then write the data? I think that would solve it.

Solved with the help of @Joël. I combined all the loops into one:

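    # Walk the three lists in parallel so each iteration handles one listing.
    for div_detail, div_link, div_pic in zip(div_detail_list, div_detail_list, div_pic_list):
        details = div_detail.get_text()

        # Keep the href of the last <a> found in this detail block.
        div_link_a = div_link.find_all('a')
        for div_link_href in div_link_a:
            div_link_href_url = div_link_href.get('href')

        # Keep the src of the last <img> found in this pic block.
        div_pic_a = div_pic.find_all('img')
        for div_pic_a_src in div_pic_a:
            div_pic_a_src_link = div_pic_a_src.get('src')

        # One row per listing, written once per outer iteration.
        csv_writer.writerow([details, div_link_href_url, div_pic_a_src_link])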
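
The same row-per-iteration idea can be sketched without BeautifulSoup. In this minimal sketch the three lists are hypothetical stand-ins for values already extracted from the page, rows_to_csv is a helper name made up for illustration, and the output goes to an in-memory buffer rather than full_links_details.csv:

```python
import csv
import io

# Hypothetical stand-ins for the values pulled out of the parsed HTML.
details = ["SomeTitleText-A", "SomeTitleText-B"]
detail_links = ["some_link_a", "some_link_b"]
image_links = ["some_image_link_a", "some_image_link_b"]


def rows_to_csv(details, detail_links, image_links):
    """Return CSV text with one row per position across the three lists."""
    buffer = io.StringIO(newline="")
    csv_writer = csv.writer(buffer)
    csv_writer.writerow(["Details", "Details_Link", "image_link"])
    # zip pairs the i-th item of every list, so each row is written exactly once.
    csv_writer.writerows(zip(details, detail_links, image_links))
    return buffer.getvalue()


csv_text = rows_to_csv(details, detail_links, image_links)
print(csv_text)
```

Collecting the values first and writing them in a single pass keeps the write step out of the extraction loops, so the indentation of the writerow call can no longer change how many rows are produced.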

To iterate over several sequences in parallel, you can use the built-in zip function:

>>> seq0 = [1, 2, 3]
>>> seq1 = ['a', 'b', 'c']
>>> for s0, s1 in zip(seq0, seq1):
...    print(f"{s0} - {s1}") 
1 - a
2 - b
3 - c

However, one thing seems strange to me: you loop over three different sets of items, but are you sure all three will have the same length?
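
The length question matters because zip stops at the shortest input without raising an error. Here is a small sketch of what happens when one list is short, along with two possible ways to handle it. The sample values and the zip_checked helper are made up for illustration:

```python
from itertools import zip_longest

details = ["A", "B", "C"]
links = ["link_a", "link_b", "link_c"]
images = ["img_a", "img_b"]  # one image is missing

# zip stops at the shortest input, so the third row is silently dropped.
truncated = list(zip(details, links, images))

# zip_longest keeps every row and pads the gaps with a fill value.
padded = list(zip_longest(details, links, images, fillvalue=""))


def zip_checked(*sequences):
    """zip the sequences, raising ValueError if their lengths differ."""
    lengths = [len(seq) for seq in sequences]
    if len(set(lengths)) > 1:
        raise ValueError(f"length mismatch: {lengths}")
    return list(zip(*sequences))


print(len(truncated))  # 2
print(padded[2])       # ('C', 'link_c', '')
```

On Python 3.10 and later, zip(details, links, images, strict=True) raises ValueError on a length mismatch, which covers the same case as the zip_checked helper.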
