
Why is only a part of my list written to a csv file?

I'm new to Python and trying to build my first script. I want to scrape a list of URLs and export it to a csv file.

My script runs without errors, but when I open the csv file only a few lines of data have been written. When I print the lists I'm trying to write (sharelist and sharelist1), the output is complete, whereas the csv file is not.

Here is a part of my code:

  for url in urllist[10:1000]:  
                # query the website and return the html to the variable 'page'
        try:
            page = urllib2.urlopen(url)
        except urllib2.HTTPError as e:
                if e.getcode() == 404: # check the return code
                    continue
        soup = BeautifulSoup(page, 'html.parser')

                    # Take out the <div> of name and get its value
        name_box = soup.find(attrs={'class': 'nb-shares'})
        if name_box is None:
          continue
        share = name_box.text.strip() # strip() removes leading and trailing whitespace

        # save the data in tuple
        sharelist.append(url)
        sharelist1.append(share)

    # open a file for writing.
        csv_out = open('mycsv.csv', 'wb')

    # create the csv writer object.
        mywriter = csv.writer(csv_out)

    # writerow - one row of data at a time.
        for row in zip(sharelist, sharelist1):
            mywriter.writerow(row)

    # always make sure that you close the file.
    # otherwise you might find that it is empty.
        csv_out.close()

Not sure which part of my code I should share here. Please tell me if it's not enough!

The problem is that you are opening the file every time you run through the loop. This essentially overwrites the previous file.

# open a file for writing.
    csv_out = open('mycsv.csv', 'wb')

# create the csv writer object.
    mywriter = csv.writer(csv_out)

# writerow - one row of data at a time.
    for row in zip(sharelist, sharelist1):
        mywriter.writerow(row)

# always make sure that you close the file.
# otherwise you might find that it is empty.
    csv_out.close()
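To make the truncation concrete, here is a minimal self-contained sketch (the file name demo.csv and the sample rows are only illustrations, not part of the original code): every open() in 'wb' mode discards whatever the file contained before, so only the rows written after the last open survive.

import csv

rows = [('url1', '10'), ('url2', '20'), ('url3', '30')]

for row in rows:
    csv_out = open('demo.csv', 'wb')   # 'wb' truncates demo.csv on every iteration
    csv.writer(csv_out).writerow(row)
    csv_out.close()

# demo.csv now holds only the last row, ('url3', '30'); everything written in
# earlier iterations was wiped out when the file was re-opened for writing.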

Either open the file before the loop, or open it with the append option.

This is option one (note the indentation):

# open a file for writing.
csv_out = open('mycsv.csv', 'wb')

# create the csv writer object.
mywriter = csv.writer(csv_out)
for url in urllist[10:1000]:  
    try:
        page = urllib2.urlopen(url)
    except urllib2.HTTPError as e:
            if e.getcode() == 404: # check the return code
                continue
    soup = BeautifulSoup(page, 'html.parser')

    name_box = soup.find(attrs={'class': 'nb-shares'})
    if name_box is None:
      continue
    share = name_box.text.strip()

    # save the data in tuple
    sharelist.append(url)
    sharelist1.append(share)

# writerow - one row of data at a time.
    for row in zip(sharelist, sharelist1):
        mywriter.writerow(row)

# always make sure that you close the file.
# otherwise you might find that it is empty.
csv_out.close()
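As posted, option one still has the write loop indented inside the URL loop, so the growing lists are re-written on every pass. A minimal hedged sketch of the intended shape, with the write loop dedented so it runs once after scraping is finished (the sample data below merely stands in for the real sharelist and sharelist1):

import csv

# hypothetical collected results standing in for sharelist / sharelist1
sharelist = ['http://example.com/a', 'http://example.com/b']
sharelist1 = ['12', '7']

# open the file once, then write all collected rows after the scraping loop
csv_out = open('mycsv.csv', 'wb')
mywriter = csv.writer(csv_out)
for row in zip(sharelist, sharelist1):   # runs once, after all data is collected
    mywriter.writerow(row)
csv_out.close()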

This is option 2:

for url in urllist[10:1000]:  
            # query the website and return the html to the variable 'page'
    try:
        page = urllib2.urlopen(url)
    except urllib2.HTTPError as e:
            if e.getcode() == 404: # check the return code
                continue
    soup = BeautifulSoup(page, 'html.parser')

                # Take out the <div> of name and get its value
    name_box = soup.find(attrs={'class': 'nb-shares'})
    if name_box is None:
      continue
    share = name_box.text.strip() # strip() removes leading and trailing whitespace

    # save the data in tuple
    sharelist.append(url)
    sharelist1.append(share)

# open a file for writing.
    csv_out = open('mycsv.csv', 'ab')

# create the csv writer object.
    mywriter = csv.writer(csv_out)

# writerow - one row of data at a time.
    for row in zip(sharelist, sharelist1):
        mywriter.writerow(row)

# always make sure that you close the file.
# otherwise you might find that it is empty.
    csv_out.close()
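One detail to watch with option 2: because sharelist and sharelist1 keep growing, zipping them on every iteration re-appends all previously written rows. A hedged sketch of the append-mode idea that writes only the pair scraped in the current iteration (the scraped data below is made up for illustration):

import csv

# hypothetical data standing in for the (url, share) pairs scraped in the loop
scraped = [('http://example.com/a', '12'), ('http://example.com/b', '7')]

for url, share in scraped:
    csv_out = open('mycsv.csv', 'ab')           # append mode keeps earlier rows
    csv.writer(csv_out).writerow((url, share))  # write only the newest pair
    csv_out.close()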

The problem has been found, and the best solution for files is to use the with keyword, which closes the file automatically at the end:

with open('mycsv.csv', 'wb') as csv_out:
    mywriter = csv.writer(csv_out)
    for url in urllist[10:1000]:  

        try:
            page = urllib2.urlopen(url)
        except urllib2.HTTPError as e:
                if e.getcode() == 404:
                    continue
        soup = BeautifulSoup(page, 'html.parser')

        name_box = soup.find(attrs={'class': 'nb-shares'})
        if name_box is None:
          continue
        share = name_box.text.strip()

        # save the data in tuple
        sharelist.append(url)
        sharelist1.append(share)

        for row in zip(sharelist, sharelist1):
            mywriter.writerow(row)

Open the file for writing using a context manager; that way, you don't need to close the file explicitly.

with open('mycsv.csv', 'w') as file_obj:
    mywriter = csv.writer(file_obj)
    for url in urllist[10:1000]:
        try:
            page = urllib2.urlopen(url)
        except urllib2.HTTPError as e:
                if e.getcode() == 404: # check the return code
                    continue
        soup = BeautifulSoup(page, 'html.parser')

        name_box = soup.find(attrs={'class': 'nb-shares'})
        if name_box is None:
            continue
        share = name_box.text.strip()
        # no need to use zip, and append in 2 lists as they're really expensive calls,
        # and by the looks of it, I think it'll create duplicate rows in your file
        mywriter.writerow((url, share))
