How to copy the header row to a new CSV in Python

Question

I can't figure out how to copy the header row from master to matched. I need to grab the first row of my master CSV and write it first to matched, then write the remaining lines only if they match the criteria.

with open('master.csv', 'r') as master, open('match.csv', 'w') as matched:
    for line in master:
            if any(city in line.split('","')[5] for city in citys) and \
            any(state in line.split('","')[6] for state in states) and \
            not any(category in line.split('","')[2] for category in categorys):
                matched.write(line)

Please help. I am new to Python and don't know how to use pandas or anything else.

Answer 1

You can just consume the first line of the input file and write it to the output file:

with open('master.csv', 'r') as master, open('match.csv', 'w') as matched:
    matched.write(next(master)) # can't use readline when iterating on the file afterwards
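The snippet above only copies the header. A minimal sketch of how it might combine with the filter from the question follows. The sample rows, the column layout, and the contents of citys, states, and categorys are assumptions made for illustration (the question does not show them), and io.StringIO stands in for the real files so the example runs on its own.

```python
import io

# Hypothetical sample data; the real master.csv is not shown in the question.
master = io.StringIO(
    '"id","name","category","addr","zip","city","state"\n'
    '"1","Cafe A","Food","1 Main St","10001","New York","NY"\n'
    '"2","Pub B","Bar","2 Oak St","60601","Chicago","IL"\n'
)
matched = io.StringIO()

# Hypothetical criteria lists.
citys = ["New York", "Chicago"]
states = ["NY", "IL"]
categorys = ["Bar"]

# next() consumes the header, so the loop below starts at the first data row.
matched.write(next(master))
for line in master:
    fields = line.split('","')  # split once per line instead of three times
    if (any(city in fields[5] for city in citys)
            and any(state in fields[6] for state in states)
            and not any(category in fields[2] for category in categorys)):
        matched.write(line)

result = matched.getvalue()
print(result)
```

With real files, the two io.StringIO objects would be replaced by the open() calls from the snippet above.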

For the rest, though, it seems you really need the csv module. I'll edit my answer to attempt something in that direction.

With the csv module, there is no need for those unsafe split calls. Comma is the default separator, and quotes are handled properly. So I'd just write:

import csv
with open('master.csv', 'r') as master, open('match.csv', 'w') as matched:
    cr = csv.reader(master)
    cw = csv.writer(matched)
    cw.writerow(next(cr))  # copy title

    for row in cr:  # iterate on the rows, already organized as lists
        if any(city in row[5] for city in citys) and \
        any(state in row[6] for state in states) and \
        not any(category in row[2] for category in categorys):
            cw.writerow(row)
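To illustrate why splitting on '","' is fragile, here is a small sketch comparing it with csv.reader on a single made-up line whose first field is unquoted and whose second field contains a comma. The line itself is an assumption for demonstration, not data from the question.

```python
import csv
import io

# Hypothetical line: first field unquoted, second field contains a comma.
line = 'id,"Smith, John","Albany","NY"\n'

# Naive split: the unquoted field stays glued to the next one,
# and stray quotes and the newline remain in the pieces.
naive = line.split('","')

# csv.reader parses the quoting rules and returns clean fields.
parsed = next(csv.reader(io.StringIO(line)))

print(naive)   # ['id,"Smith, John', 'Albany', 'NY"\n']
print(parsed)  # ['id', 'Smith, John', 'Albany', 'NY']
```

The naive version yields three pieces instead of four, so index-based checks such as row[5] would silently look at the wrong column.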

BTW, your filter checks whether city is contained in row[5], but maybe you'd like an exact match. For example, "York" would match "New York", which is probably not what you want. So my proposal is to use in to check, for each criterion, whether the field is in the list of strings:

import csv
with open('master.csv', 'r') as master, open('match.csv', 'w') as matched:
    cr = csv.reader(master)
    cw = csv.writer(matched)
    cw.writerow(next(cr))  # copy title
    for row in cr:
        if row[5] in citys and row[6] in states and row[2] not in categorys:
            cw.writerow(row)
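The difference between the substring test and the exact-match test can be seen in isolation. This is a tiny sketch; the list contents are made up for the demonstration.

```python
citys = ["York"]      # hypothetical criteria list
field = "New York"    # hypothetical value of row[5]

# Substring test, as in the original filter: "York" is found inside "New York".
substring_hit = any(city in field for city in citys)

# Exact-match test: "New York" is not an element of the list.
exact_hit = field in citys

print(substring_hit)  # True
print(exact_hit)      # False
```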

This can be improved further by using a generator expression and writing all the rows at once:

import csv
with open('master.csv', 'r') as master, open('match.csv', 'w') as matched:
    cr = csv.reader(master)
    cw = csv.writer(matched)
    cw.writerow(next(cr))  # copy title
    cw.writerows(row for row in cr if row[5] in citys and row[6] in states and row[2] not in categorys)

Note that citys, states, and categorys would be better as sets rather than lists, so that lookups are much faster (you didn't say which type they are).
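Putting the pieces together, a self-contained sketch with the criteria stored as sets might look like the following. The sample rows and set contents are assumptions for illustration, and io.StringIO replaces the real files so the example can be run as-is.

```python
import csv
import io

# Hypothetical criteria, stored as sets for fast membership tests.
citys = {"New York", "Chicago"}
states = {"NY", "IL"}
categorys = {"Bar"}

# Hypothetical stand-in for master.csv.
master = io.StringIO(
    'id,name,category,addr,zip,city,state\n'
    '1,Cafe A,Food,1 Main St,10001,New York,NY\n'
    '2,Pub B,Bar,2 Oak St,60601,Chicago,IL\n'
    '3,Shop C,Retail,3 Elm St,02101,Boston,MA\n'
)
matched = io.StringIO()

cr = csv.reader(master)
cw = csv.writer(matched)
cw.writerow(next(cr))  # copy the header row
cw.writerows(
    row for row in cr
    if row[5] in citys and row[6] in states and row[2] not in categorys
)

# Read the output back to inspect which rows were kept.
rows = list(csv.reader(io.StringIO(matched.getvalue())))
print(rows)
```

Only the header and row 1 survive: row 2 is excluded by its category and row 3 by its city and state. When working with real files on Python 3, the csv module documentation recommends opening them with newline=''.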

Answer 2

If you don't want to think too hard about how the line-producing iterator works, one straightforward way to do it is to treat the first line as a special case:

with open('master.csv', 'r') as master, open('match.csv', 'w') as matched:
    first_line = True
    for line in master:
            if first_line or (any(city in line.split('","')[5] for city in citys) and \
            any(state in line.split('","')[6] for state in states) and \
            not any(category in line.split('","')[2] for category in categorys)):
                matched.write(line)
            first_line = False

Answer 1: score 2, accepted, 2016-12-28 21:53:31

Answer 2: score 0, 2016-12-28 21:54:18