
Iterating over specific csv rows in python outputs a blank file

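Python newb here. I'm trying to format a set of really rough CSVs I was sent so that I can load them into a nice Postgres table for querying and analysis. To do that, I first clean them with csv.writer to remove the blank lines and the double quotes wrapping every entry. Here is what my code looks like: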

import os
import csv
import glob
from itertools import islice

files = glob.glob('/Users/foo/bar/*.csv')

# Loop through all of the csv's  
for file in files:
    # Get the filename from the path
    outfile = os.path.basename(file)

    with open(file, 'rb') as inp, open('/Users/foo/baz/' + outfile, 'wb') as out:

        reader = csv.reader(inp)
        writer = csv.writer(out)
        for row in reader:
            if row:
                writer.writerow(row)
        out.close() 

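It works well and does exactly what I want; the output CSVs look great. Next, I try to chop a certain number of rows containing completely unnecessary junk off the beginning and end of the newly cleaned CSV files (omitting the first 8 rows and the last 2). For reasons I can't determine, the CSV output by this part of the code (indented the same as the previous 'with' block) is completely empty: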

with open('/Users/foo/baz/' + outfile, 'rb') as inp2, open('/Users/foo/qux/' + outfile, 'wb') as out2:
    writer2 = csv.writer(out2)
    reader2 = csv.reader(inp2)
    row_count = sum(1 for row in reader2)
    last_line_index = row_count - 3 
    for row in islice(reader2, 7, last_line_index):
            writer2.writerow(row)
    out2.close()

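I know that, because I'm using 'with', the close() at the end of each block is redundant; I added it as an attempted fix after seeing it suggested elsewhere. I also tried putting the second 'with' block into a separate file and running it after the first 'with' block had run, but still to no avail. Your help is greatly appreciated!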

Also, here is the whole file:

import os
import csv
import glob
from itertools import islice

files = glob.glob('/Users/foo/bar/*.csv')

# Loop through all of the csv's  
for file in files:
    # Get the filename from the path
    outfile = os.path.basename(file)

    with open(file, 'rb') as inp, open('/Users/foo/baz/' + outfile, 'wb') as out:

        reader = csv.reader(inp)
        writer = csv.writer(out)
        for row in reader:
            if row:
                writer.writerow(row)
        out.close() 

    with open('/Users/foo/baz/' + outfile, 'rb') as inp2, open('/Users/foo/qux/' + outfile, 'wb') as out2:
        writer2 = csv.writer(out2)
        reader2 = csv.reader(inp2)
        row_count = sum(1 for row in reader2)
        last_line_index = row_count - 3 
        for row in islice(reader2, 7, last_line_index):
                writer2.writerow(row)
        out2.close()

Thanks!

The culprit is

row_count = sum(1 for row in reader2)

which reads all of the data from reader2. A csv reader is a one-pass iterator, so when you then try for row in islice(reader2, 7, last_line_index) there is no data left to read.
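
To make the exhaustion concrete, here is a minimal sketch for Python 3. It uses an in-memory file (`io.StringIO` with made-up sample rows, an assumption for illustration) in place of the real CSV. Counting the rows consumes the reader, so `islice` gets nothing; one possible fix is to read the rows into a list once and slice the list. Note that `rows[8:-2]` matches the stated goal of dropping the first 8 and last 2 rows, whereas `islice(reader, 7, row_count - 3)` would drop the first 7 and the last 3.

```python
import csv
import io
from itertools import islice

# Hypothetical sample data: 12 one-column rows, "row0" .. "row11".
sample = "".join("row{}\n".format(i) for i in range(12))

# Reproduce the problem: counting the rows consumes the reader.
reader = csv.reader(io.StringIO(sample))
row_count = sum(1 for _ in reader)
leftover = list(islice(reader, 7, row_count - 3))  # reader is already exhausted

# One possible fix: materialize the rows once, then slice the list.
rows = list(csv.reader(io.StringIO(sample)))
trimmed = rows[8:-2]  # drop the first 8 and the last 2 rows

print(row_count)  # 12
print(leftover)   # []
print(trimmed)    # [['row8'], ['row9']]
```

Reading everything into a list keeps the whole file in memory, which is usually fine for small files but may not suit very large ones.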

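Also, you may be reading a lot of blank rows because you open the file in binary mode; on Python 3, open it in text mode with newline='' instead: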

with open('file.csv', newline='') as inf:
    rd = csv.reader(inf)
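
As a sketch of that advice on Python 3, the snippet below reads and writes a small CSV in text mode with `newline=''` and drops empty rows while copying. The temporary directory and the file names `raw.csv` and `clean.csv` are placeholders for illustration, not the paths from the question.

```python
import csv
import os
import tempfile

# Placeholder files in a temporary directory (not the question's real paths).
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "raw.csv")
dst = os.path.join(tmpdir, "clean.csv")

# Build a small input containing blank lines between records.
with open(src, "w", newline="") as f:
    f.write('"a","b"\r\n\r\n"c","d"\r\n\r\n')

# Text mode with newline='' lets the csv module handle line endings itself.
with open(src, newline="") as inf, open(dst, "w", newline="") as outf:
    writer = csv.writer(outf)
    for row in csv.reader(inf):
        if row:  # a blank line is parsed as an empty list
            writer.writerow(row)

with open(dst, newline="") as f:
    cleaned = f.read()

print(repr(cleaned))  # 'a,b\r\nc,d\r\n'
```

The writer's default quoting only quotes fields that need it, which is why the wrapping double quotes disappear in the output.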

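You can quickly fix the code like this (I added a comment at the problem line; as @Hugh Bothwell said, you had already read all of the data from the variable reader2):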

import os
import csv
import glob
from itertools import islice

files = glob.glob('/Users/foo/bar/*.csv')

# Loop through all of the csv's  
for file in files:
    # Get the filename from the path
    outfile = os.path.basename(file)

    with open(file, 'rb') as inp, open('/Users/foo/baz/' + outfile, 'wb') as out:

        reader = csv.reader(inp)
        writer = csv.writer(out)
        for row in reader:
            if row:
                writer.writerow(row)
        out.close() 

    with open('/Users/foo/baz/' + outfile, 'rb') as inp2, open('/Users/foo/qux/' + outfile, 'wb') as out2:
        writer2 = csv.writer(out2)
        # Count the rows with a separate, throwaway reader. This consumes inp2,
        # so rewind the file before creating the reader that is actually sliced.
        row_count = sum(1 for row in csv.reader(inp2))
        inp2.seek(0)
        reader2 = csv.reader(inp2)
        last_line_index = row_count - 3
        for row in islice(reader2, 7, last_line_index):
            writer2.writerow(row)
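
An alternative sketch, for Python 3, avoids counting the rows at all: stream the rows once and hold back the most recent few in a `collections.deque`, so the trailing rows are never written. The helper name `trim_rows` and the in-memory sample data are hypothetical, introduced only for illustration; the numbers 8 and 2 come from the question's stated goal.

```python
import csv
import io
from collections import deque
from itertools import islice


def trim_rows(rows, skip_head, skip_tail):
    """Yield rows, omitting the first skip_head and the last skip_tail."""
    buffer = deque()
    for row in islice(rows, skip_head, None):
        buffer.append(row)
        # Once the buffer holds more than skip_tail rows, the oldest
        # one cannot be among the final skip_tail rows, so emit it.
        if len(buffer) > skip_tail:
            yield buffer.popleft()


# Hypothetical sample: 12 one-column rows standing in for a cleaned CSV.
sample = io.StringIO("".join("row{}\n".format(i) for i in range(12)))
out = io.StringIO()
csv.writer(out).writerows(trim_rows(csv.reader(sample), 8, 2))

print(out.getvalue().splitlines())  # ['row8', 'row9']
```

Because only `skip_tail` rows are buffered at any time, this reads the file a single time and does not need to rewind it or load it fully into memory.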

