Writing a filtered CSV file to a new file and iterating through a folder

I have been trying to create a program that goes through one file and selects certain columns, which are then written to a new text file. So far I have:

    import os, sys, csv
    os.chdir("C://Users//nelsonj//Desktop//Master_Project")
    pref_cols = [0,1,2,4,6,8,10,12,14,18,20,22,24,26,30,34,36,40]

    with open('CHS_2009_test.txt', "rb") as sitefile:
        reader = csv.reader(sitefile, delimiter=',')
        # read the rows while the file is still open
        for row in reader:
            new_cols = list(row[i] for i in pref_cols)
            print new_cols

I have been trying to use the csv functions to write the new file, but I am continuously getting errors. I will eventually need to do this over a folder of files, but I thought I would try it on one file before tackling that.

Here is the code I attempted to use to write this data to a new file:

    for row in reader:
        with open("CHS_2009_edit.txt", 'w') as file:
            new_cols = list(row[i] for i in pref_cols)
            newfile = csv.writer(file)
            newfile.writerows(new_cols)

This kind of works in that I get a new file, but it only prints the second row of values from my CSV (i.e., not the header values), and it places commas between each individual character instead of copying over the original columns as they were.

I am using PythonWin with Python 2.6 (from ArcGIS).

Thanks for the help!

NEW UPDATED CODE

    import os, sys, csv

    path = ('C://Users//nelsonj//Desktop//Master_Project')

    for filename in os.listdir(path):

        pref_cols = [0,1,2,4,6,8,10,12,14,18,20,22,24,26,30,34,36,40]
        with open(filename, "rb") as sitefile:
            with open(filename.rsplit('.',1)[0] + "_Master.txt", 'w') as output_file:
                reader = csv.reader(sitefile, delimiter=',')
                writer = csv.writer(output_file)
                for row in reader:
                    new_row = list(row[i] for i in pref_cols)
                    writer.writerow(new_row)
                    print new_row

I am getting a "list index out of range" error for new_row, but it still seems to be processing the file. The only thing I can't get it to do now is loop through all the files in my directory. Here's a hyperlink to a screenshot of the data text file.

Try this:

    # keep only the column indexes that actually exist in this row
    new_header = list(row[i] for i in pref_cols if i < len(row))

That should avoid the error, but it may not address the underlying problem. Could you paste your CSV file somewhere I can access it, so that I can fix this for you?
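One possible way to handle both the short rows and the folder loop is sketched below. It is written for Python 3 (the question uses Python 2.6, where the files would be opened in "rb"/"wb" mode instead of with newline=""). The function name filter_folder, the policy of skipping rows that are too short, and the ".txt" extension check are assumptions made for illustration, not part of the original code. Skipping files that end in "_Master.txt" is a guess at one cause of the index error: output from an earlier run has only 18 columns, so index 40 no longer exists in it.

```python
import csv
import os

PREF_COLS = [0, 1, 2, 4, 6, 8, 10, 12, 14, 18, 20, 22, 24, 26, 30, 34, 36, 40]


def filter_folder(folder, pref_cols=PREF_COLS, suffix="_Master.txt"):
    """Write the preferred columns of every .txt file in folder to <name>_Master.txt.

    Hypothetical helper: rows too short to contain every preferred column are
    skipped, and files produced by an earlier run are ignored.
    Returns the list of output paths.
    """
    written = []
    needed = max(pref_cols) + 1  # minimum row length that has every column
    for filename in sorted(os.listdir(folder)):
        # Skip non-text files and outputs of a previous run.
        if not filename.endswith(".txt") or filename.endswith(suffix):
            continue
        in_path = os.path.join(folder, filename)  # listdir returns bare names
        out_path = os.path.join(folder, filename.rsplit(".", 1)[0] + suffix)
        with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
            writer = csv.writer(dst)
            for row in csv.reader(src):
                if len(row) < needed:
                    continue  # e.g. a blank or truncated line
                writer.writerow([row[i] for i in pref_cols])
        written.append(out_path)
    return written
```

With the folder from the question, the call would look like filter_folder("C://Users//nelsonj//Desktop//Master_Project"). Joining the folder path onto each name matters because os.listdir() returns bare file names, which open() would otherwise look up in the current working directory.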

For the purpose of filtering, you don't have to treat the header differently from the rest of the data. You can go ahead and remove the following block:

    headers = reader.next()
    for row in headers:
        new_header = list(row[i] for i in pref_cols)
        print new_header  

Your code did not work because you treated headers as a list of rows, but headers is just one row.
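As a small illustration with made-up header names (not the columns of the real file): looping over a single row yields its cells, which are strings, so indexing each one picks out single characters rather than whole columns.

```python
# Made-up header row, standing in for what reader.next() returns in Python 2
headers = ["site", "date", "temp", "flag"]
pref_cols = [0, 2]

# Treating the header as a list of rows: each "row" is really one cell (a string),
# so indexing it returns single characters.
wrong = [[cell[i] for i in pref_cols] for cell in headers]

# Treating the header as the single row it is: indexing returns whole cells.
right = [headers[i] for i in pref_cols]

print(wrong)  # [['s', 't'], ['d', 't'], ['t', 'm'], ['f', 'a']]
print(right)  # ['site', 'temp']
```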

Update

This update deals with writing the CSV data to a new file. You should move the open statement above the for row... loop:

    with open("CHS_2009_edit.txt", 'w') as output_file:
        writer = csv.writer(output_file)
        for row in reader:
            new_cols = list(row[i] for i in pref_cols)
            writer.writerow(new_cols)  # writerow() writes one row at a time
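The open call has to sit outside the loop because mode 'w' truncates the file every time it is opened. A minimal sketch of that effect, using made-up data and a throwaway temporary file (Python 3 syntax; the names rows, inside, and outside are only for illustration):

```python
import os
import tempfile

rows = ["header", "row 1", "row 2"]  # made-up data
path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # throwaway file

# Opening inside the loop: mode "w" truncates the file on every pass,
# so only the last row survives.
for row in rows:
    with open(path, "w") as f:
        f.write(row + "\n")
with open(path) as f:
    inside = f.read().splitlines()

# Opening once, before the loop: every row is kept.
with open(path, "w") as f:
    for row in rows:
        f.write(row + "\n")
with open(path) as f:
    outside = f.read().splitlines()

print(inside)   # ['row 2']
print(outside)  # ['header', 'row 1', 'row 2']
```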

Update 2

This update deals with the header output problem. If you followed my suggestions, you should not have this problem. I don't know what your current code looks like, but it looks like you supply a string where the code expects a list. Here is the code that I tried on my system (using my made-up data), and it seems to work:

    pref_cols = [...] # <<=== Should be set before entering the loop
    with open('CHS_2009_test.txt', "rb") as sitefile:
        with open('CHS_2009_edit.txt', 'w') as output_file:
            reader = csv.reader(sitefile, delimiter=',')
            writer = csv.writer(output_file)
            for row in reader:
                new_row = list(row[i] for i in pref_cols)
                writer.writerow(new_row)

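One thing to notice: I use writerow() to write a single row, where you use writerows(). That makes a difference: writerows() expects a sequence of rows, so when it is handed a single row it treats each string as a row of its own and writes every character as a separate field.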
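A short demonstration of the two calls, using a made-up row and an in-memory buffer in place of a real file (Python 3 syntax; io.StringIO and the variable names are assumptions for illustration only):

```python
import csv
import io

new_cols = ["site", "2009", "12.5"]  # made-up row of already-selected columns

# writerows() expects a sequence of rows, so each string is treated as a row
# and each character becomes its own field.
buf_rows = io.StringIO()
csv.writer(buf_rows, lineterminator="\n").writerows(new_cols)

# writerow() writes the list as one row, one field per string.
buf_row = io.StringIO()
csv.writer(buf_row, lineterminator="\n").writerow(new_cols)

print(buf_rows.getvalue())  # three lines: s,i,t,e then 2,0,0,9 then 1,2,.,5
print(buf_row.getvalue())   # one line: site,2009,12.5
```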
