
Saving data to multiple csv files in pandas

I have this data from a .gov site:

import pandas as pd
import io
import requests
url="https://download.bls.gov/pub/time.series/la/la.data.64.County"
s=requests.get(url).content
c=pd.read_csv(io.StringIO(s.decode('utf-8')))

The number of rows is 4942096. I want to write all of them out to multiple csv files.

I know how to get the first million like so:

c.to_csv('nick.csv', index = False, chunksize = 1000000)

How do I get the rest?

You can loop through the file in chunks and save each chunk like so:

filename = io.StringIO(s.decode('utf-8'))
# ^ I have not tested this, but I assume it works; it is named `filename` for readability's sake.

chunk_size = 10 ** 6
for chunk in pd.read_csv(filename, chunksize=chunk_size):
    chunk.to_csv('nick.csv.gz',compression='gzip',index=False)

You'll need to add some sort of naming convention, otherwise each chunk will overwrite the previous file. I've also added gzip compression, which shrinks the output files and can speed up write times when disk I/O is the bottleneck.

Personally, I'd just add a counter:

chunk_size = 10 ** 6
# enumerate numbers the chunks starting at 1, so every chunk gets its own file
for counter, chunk in enumerate(pd.read_csv(filename, chunksize=chunk_size), start=1):
    chunk.to_csv(f'nick_{counter}.csv.gz', compression='gzip', index=False)
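
Since the question already has the whole DataFrame in memory as `c`, another option is to slice it directly instead of re-reading the text in chunks. As far as I know, the `chunksize` argument of `to_csv` only controls how many rows are written at a time and does not limit the output to the first million rows. Below is a minimal sketch of the slicing approach; the helper name `split_to_csv`, its parameters, and the small `demo` frame are hypothetical stand-ins, not something from the original thread.

```python
import tempfile
from pathlib import Path

import pandas as pd


def split_to_csv(df, out_dir, rows_per_file, prefix="nick"):
    """Write df to numbered gzip CSV files of at most rows_per_file rows each.

    Hypothetical helper for illustration; returns the list of paths written.
    """
    if rows_per_file <= 0:
        raise ValueError("rows_per_file must be positive")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    # Step through the frame in blocks of rows_per_file rows, numbering from 1
    for part, start in enumerate(range(0, len(df), rows_per_file), start=1):
        path = out_dir / f"{prefix}_{part}.csv.gz"
        df.iloc[start:start + rows_per_file].to_csv(
            path, compression="gzip", index=False
        )
        paths.append(path)
    return paths


# Small stand-in for the real 4,942,096-row frame (assumed data, for demonstration only)
demo = pd.DataFrame({"series_id": [f"S{i}" for i in range(25)], "value": range(25)})
demo_dir = tempfile.mkdtemp()
written = split_to_csv(demo, demo_dir, rows_per_file=10)
```

With the real data this would be called as something like `split_to_csv(c, 'out', rows_per_file=10 ** 6)`, which should produce five files, the last one holding the remaining rows.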
