Splitting a CSV file into multiple files with overlapping rows using Python
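I'm currently trying to split my CSV file into multiple files, where the start of each piece overlaps the end of the previous one (for example: file 1 would be rows 1-4000, file 2 would be rows 3000-7000, file 3 would be rows 6000-10000, and so on).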
chunk_size = 4000

def write_chunk(part, lines):
    # header is read at module level below, before the first call
    with open('data_part_'+ str(part) +'.csv', 'w') as f_out:
        f_out.write(header)
        f_out.writelines(lines)

with open("8-0new2.csv", "r") as f:
    count = 0
    header = f.readline()
    lines = []
    for line in f:
        count += 1
        lines.append(line)
        if count % chunk_size == 0:
            write_chunk(count // chunk_size, lines)
            lines = []
    # write remainder
    if len(lines) > 0:
        write_chunk((count // chunk_size) + 1, lines)
This is my current code, which splits the CSV into 4 files. Any ideas on how to improve it so that it writes CSVs with overlapping rows?
I don't have data to test this thoroughly, but it should work:
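CHUNK = 4_000    # rows per output file
OVERLAP = 1_000  # rows shared with the previous file

def write_csv(lines, filename, header):
    with open(filename, 'w') as csv:
        csv.write(header)
        csv.writelines(lines)

def get_csv_gen():
    # endless generator of output names: data_part_1.csv, data_part_2.csv, ...
    part = 1
    while True:
        yield f'data_part_{part}.csv'
        part += 1

get_csv_name = get_csv_gen()

with open('8-0new2.csv') as csv:
    header = csv.readline()
    lines = csv.readlines()  # reads all remaining rows into memory

# each chunk starts CHUNK-OVERLAP rows after the previous one
for offset in range(0, len(lines), CHUNK-OVERLAP):
    write_csv(lines[offset:offset+CHUNK], next(get_csv_name), header)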
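
Two things to be aware of with the slicing approach: readlines() loads the whole file into memory, and when the data ends exactly on a chunk boundary the last file only repeats rows already written (with 10,000 data rows the offsets are 0, 3000, 6000 and 9000, and the rows in the fourth file are all in the third one too). Below is a rough streaming sketch that avoids both. The names overlapping_chunks and split_csv, and the prefix parameter, are made up for this illustration and are not part of the code above. Like the original, it treats every physical line as one row, so it assumes no quoted fields containing line breaks.

```python
from itertools import islice

def overlapping_chunks(rows, chunk, overlap):
    """Yield lists of up to `chunk` rows; consecutive lists share `overlap` rows."""
    if not 0 <= overlap < chunk:
        raise ValueError("overlap must be >= 0 and smaller than chunk")
    rows = iter(rows)
    window = list(islice(rows, chunk))
    while window:
        yield window
        if len(window) < chunk:
            break  # the input ran out inside this window
        fresh = list(islice(rows, chunk - overlap))
        if not fresh:
            break  # nothing new left, so skip a chunk that would only repeat rows
        # keep the last `overlap` rows and append the newly read ones
        window = window[chunk - overlap:] + fresh

def split_csv(src, chunk=4_000, overlap=1_000, prefix="data_part_"):
    """Write prefix1.csv, prefix2.csv, ... each starting with the header line."""
    # newline="" keeps the original line endings untouched on read and write
    with open(src, newline="") as f_in:
        header = f_in.readline()
        chunks = overlapping_chunks(f_in, chunk, overlap)
        for part, lines in enumerate(chunks, start=1):
            with open(f"{prefix}{part}.csv", "w", newline="") as f_out:
                f_out.write(header)
                f_out.writelines(lines)

# small in-memory demo: 10 rows, 4 per chunk, 1 row of overlap
print(list(overlapping_chunks(range(10), 4, 1)))
# → [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

For the file in the question this would be called as split_csv("8-0new2.csv"). Only one chunk of rows is held in memory at a time, which matters mostly for files much larger than this one.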