
Splitting a CSV file into multiple files with overlapping rows using Python

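I am currently trying to split my CSV file into multiple files, where the start of each split overlaps the end of the previous one (for example: file 1 would be rows 1-4000, file 2 would be rows 3000-7000, file 3 would be rows 6000-10000, and so on).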

chunk_size = 4000

def write_chunk(part, lines):
    # Write one numbered output file: the header row followed by the chunk's rows
    with open('data_part_' + str(part) + '.csv', 'w') as f_out:
        f_out.write(header)
        f_out.writelines(lines)

with open("8-0new2.csv", "r") as f:
    count = 0
    header = f.readline()
    lines = []
    for line in f:
        count += 1
        lines.append(line)
        if count % chunk_size == 0:
            write_chunk(count // chunk_size, lines)
            lines = []
    # write remainder
    if len(lines) > 0:
        write_chunk((count // chunk_size) + 1, lines)

This is my current code, which splits the CSV into 4 files. Any ideas on how to improve it so that it writes CSV files with overlapping rows?

I don't have data to test this thoroughly, but it should work:

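CHUNK = 4_000
OVERLAP = 1_000

def write_csv(lines, filename, header):
    # Write the header row followed by one chunk of rows
    with open(filename, 'w') as f_out:
        f_out.write(header)
        f_out.writelines(lines)

def get_csv_gen():
    # Endless generator of output names: data_part_1.csv, data_part_2.csv, ...
    part = 1
    while True:
        yield f'data_part_{part}.csv'
        part += 1

get_csv_name = get_csv_gen()

with open('8-0new2.csv') as f_in:
    header = f_in.readline()
    lines = f_in.readlines()  # loads every remaining row into memory

# Each chunk starts CHUNK - OVERLAP rows after the previous one
for offset in range(0, len(lines), CHUNK - OVERLAP):
    write_csv(lines[offset:offset + CHUNK], next(get_csv_name), header)
    if offset + CHUNK >= len(lines):
        break  # this chunk reached the last row; another would only repeat rows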
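If the input is too large to hold in memory, the same idea can probably be done in a streaming fashion. The following is only a sketch and not part of the original answer: the names `overlapping_chunks` and `split_csv` and the `prefix` parameter are made up for illustration. It also assumes that every CSV record sits on a single physical line (no quoted fields containing newlines).

```python
from itertools import islice


def overlapping_chunks(rows, chunk=4000, overlap=1000):
    """Yield lists of up to `chunk` rows; consecutive lists share `overlap` rows."""
    if not 0 <= overlap < chunk:
        raise ValueError("overlap must be non-negative and smaller than chunk")
    rows = iter(rows)
    window = list(islice(rows, chunk))
    while window:
        yield window
        if len(window) < chunk:
            break  # input exhausted: this was the final, shorter chunk
        fresh = list(islice(rows, chunk - overlap))
        if not fresh:
            break  # nothing new to add, so another chunk would only repeat rows
        # keep the last `overlap` rows of the old window and append the new ones
        window = window[chunk - overlap:] + fresh


def split_csv(src, prefix="data_part_", chunk=4000, overlap=1000):
    """Write overlapping parts of `src` to prefix1.csv, prefix2.csv, ...

    Returns the number of files written. Assumes one record per line.
    """
    written = 0
    with open(src) as f_in:
        header = f_in.readline()
        for part, lines in enumerate(overlapping_chunks(f_in, chunk, overlap), start=1):
            with open(f"{prefix}{part}.csv", "w") as f_out:
                f_out.write(header)
                f_out.writelines(lines)
            written = part
    return written
```

Calling `split_csv('8-0new2.csv')` would write `data_part_1.csv`, `data_part_2.csv`, and so on, holding at most one chunk of rows in memory at a time.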

