简体   繁体   中英

Splitting a csv files into multiple file with overlapping rows using python

I'm currently trying to split my csv files into multiple files, with the beginning of each split overlapping each other (for example: file 1 will be row 1-4000, then row 2 will be 3000-7000, and row 3 will be 6000-10000, etc.)

chunk_size = 4000
def write_chunk(part, lines):
    with open('data_part_'+ str(part) +'.csv', 'w') as f_out:
        f_out.write(header)
        f_out.writelines(lines)
        
with open("8-0new2.csv", "r") as f:
    count = 0
    header = f.readline()
    lines = []
    # for line in f:
    for line in range():

        count += 1
        lines.append(line)
        if count % chunk_size == 0:
            write_chunk(count // chunk_size, lines)
            lines = []
    # write remainder
    if len(lines) > 0:
        write_chunk((count // chunk_size) + 1, lines)

this is my current code to split the csv into 4 files, any idea to improve it so it can write the csv with overlapping rows?

I don't have the data to test this thoroughly but should work:

CHUNK = 4_000
OVERLAP = 1_000

def write_csv(lines, filename, header):
    with open(filename, 'w') as csv:
        csv.write(header)
        csv.writelines(lines)

def get_csv_gen():
    part = 1
    while True:
        yield f'data_part_{part}.csv'
        part += 1

get_csv_name = get_csv_gen()

with open('8-0new2.csv') as csv:
    header = csv.readline()
    lines = csv.readlines()
    for offset in range(0, len(lines), CHUNK-OVERLAP):
        write_csv(lines[offset:offset+CHUNK], next(get_csv_name), header)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM