
How to copy header row to new csv in python

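I can't seem to figure out how to copy the header row from master into matched. I need to grab the first row of the master csv and write it to matched, then write each remaining row only if it meets the criteria: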

with open('master.csv', 'r') as master, open('match.csv', 'w') as matched:
    for line in master:
            if any(city in line.split('","')[5] for city in citys) and \
            any(state in line.split('","')[6] for state in states) and \
            not any(category in line.split('","')[2] for category in categorys):
                matched.write(line)

Please help. I'm new to Python and don't know how to use pandas or anything else.

You can simply consume the first line of the file you are reading and write it straight to the file you are writing:

with open('master.csv', 'r') as master, open('match.csv', 'w') as matched:
    matched.write(next(master)) # can't use readline when iterating on the file afterwards
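
As a minimal sketch of why this works: a file object is its own iterator, so `next()` takes the header and a later `for` loop resumes at the first data line. The snippet below uses `io.StringIO` objects as hypothetical in-memory stand-ins for `master.csv` and `match.csv`, and the sample rows are made up for illustration.

```python
import io

# Hypothetical in-memory stand-ins for master.csv and match.csv
master = io.StringIO("name,city\nAnn,York\nBob,Leeds\n")
matched = io.StringIO()

# next() consumes the header line, which is copied to the output
matched.write(next(master))

# Iterating afterwards continues from the first data line
remaining = list(master)

header_copy = matched.getvalue()
```

After this runs, `header_copy` holds only the header line and `remaining` holds the two data lines, so the header is never tested against the filter.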

That said, the rest of the code really calls for the csv module. I'll edit my answer to move in that direction.

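With the csv module there is no need for those unsafe split calls. The comma is the default delimiter, and quotes are handled properly too. So I would just write: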

import csv
# newline='' lets the csv module control line endings itself (Python 3)
with open('master.csv', 'r', newline='') as master, open('match.csv', 'w', newline='') as matched:
    cr = csv.reader(master)
    cw = csv.writer(matched)
    cw.writerow(next(cr))  # copy title

    for row in cr:  # iterate on the rows, already organized as lists
        if any(city in row[5] for city in citys) and \
        any(state in row[6] for state in states) and \
        not any(category in row[2] for category in categorys):
            cw.writerow(row)
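
To illustrate why splitting on `'","'` is fragile, here is a small comparison on one made-up line (the sample data is an assumption, not taken from the question). The naive split leaves stray quote characters on the first and last fields, while `csv.reader` strips the quoting and keeps an embedded comma inside its field.

```python
import csv
import io

# Hypothetical sample line in the fully quoted style the question implies
line = '"1","Acme, Inc.","retail","x","y","New York","NY"\n'

# Naive split: the first field keeps its opening quote,
# the last field keeps its closing quote and the newline
naive = line.split('","')

# csv.reader removes the quoting and handles the embedded comma
parsed = next(csv.reader(io.StringIO(line)))
```

Here `naive[0]` is `'"1'` and `naive[-1]` is `'NY"\n'`, whereas `parsed` is a clean list of seven strings.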

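By the way, your filter checks whether row[5] contains city, but you probably want an exact match. For example, "York" would match "New York", which is probably not what you want. So my suggestion is to use in to check, for each criterion, whether the string is in the list of strings: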

import csv
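# newline='' lets the csv module control line endings itself (Python 3)
with open('master.csv', 'r', newline='') as master, open('match.csv', 'w', newline='') as matched:
    cr = csv.reader(master)
    cw = csv.writer(matched)
    cw.writerow(next(cr))  # copy title
    for row in cr:
        # exact membership tests instead of substring checks
        if row[5] in citys and row[6] in states and row[2] not in categorys:
            cw.writerow(row)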
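
The difference between the two kinds of test can be seen in a short sketch. The values below are illustrative only.

```python
citys = ["York"]      # illustrative list of wanted cities
field = "New York"    # illustrative value of row[5]

# Substring test, as in the original filter: "York" is inside "New York"
substring_match = any(city in field for city in citys)

# Membership test: "New York" is not an element of the list
exact_match = field in citys
```

With these values `substring_match` is `True` and `exact_match` is `False`.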

Even better, with a generator expression you can write all the rows in a single call:

import csv
# newline='' lets the csv module control line endings itself (Python 3)
with open('master.csv', 'r', newline='') as master, open('match.csv', 'w', newline='') as matched:
    cr = csv.reader(master)
    cw = csv.writer(matched)
    cw.writerow(next(cr))  # copy title
    cw.writerows(row for row in cr if row[5] in citys and row[6] in states and row[2] not in categorys)

Note that citys, states and categorys would be better as set rather than list, so that lookups are much faster (you did not say which they are).
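
A self-contained sketch of the whole approach with sets might look like the following. The contents of `citys`, `states` and `categorys` and the sample rows are assumptions, since the question does not show them. `io.StringIO` objects stand in for the two files, and `lineterminator="\n"` is set only to keep the in-memory output easy to read.

```python
import csv
import io

# Assumed filter values; sets give fast membership tests
citys = {"Boston", "York"}
states = {"MA", "PA"}
categorys = {"spam"}

# Hypothetical stand-in for master.csv (columns 2, 5, 6 are the ones filtered on)
master = io.StringIO(
    "id,name,category,a,b,city,state\n"
    "1,Ann,retail,x,y,Boston,MA\n"
    "2,Bob,spam,x,y,York,PA\n"
    "3,Cy,retail,x,y,New York,NY\n"
)
matched = io.StringIO()  # hypothetical stand-in for match.csv

cr = csv.reader(master)
cw = csv.writer(matched, lineterminator="\n")
cw.writerow(next(cr))  # copy the header row unconditionally
cw.writerows(
    row for row in cr
    if row[5] in citys and row[6] in states and row[2] not in categorys
)

result = matched.getvalue()
```

Only the header and Ann's row end up in `result`: Bob is excluded by category, and "New York" is not an exact match for any entry in `citys`.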

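If you would rather not think too hard about how the line-producing iterator works, a simple approach is to treat the first line as special: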

with open('master.csv', 'r') as master, open('match.csv', 'w') as matched:
    first_line = True
    for line in master:
            if first_line or (any(city in line.split('","')[5] for city in citys) and \
            any(state in line.split('","')[6] for state in states) and \
            not any(category in line.split('","')[2] for category in categorys)):
                matched.write(line)
            first_line = False

