
Reading a big CSV file into a DataFrame

I have a large CSV file (13 GB) that I want to read into a DataFrame in Python, so I use:

txt = pd.read_csv(r'...file.csv', sep=';', encoding="UTF-8", iterator=True, chunksize=1000)

It works just fine, but the result is a pandas.io.parsers.TextFileReader object, and I want the data in a DataFrame so that I can manipulate it.

I managed to get a sample of the data as a DataFrame using:

txt.get_chunk(300)

But I would like to have all of the data in a DataFrame, so I tried:

for df1 in txt:
    df.append(df1)

I also tried:

df2 = pd.concat([chunk for chunk in txt])

That didn't work either. Can someone please help me?

Thanks in advance!

You can load part of the data into a variable by using the 'nrows' parameter while reading the file.

txt = pd.read_csv(r'...file.csv', sep=';', encoding="UTF-8", nrows=1000)

However, with data this large you may be better off with an instance that has more memory. You can also distribute the work across multiple workers by setting up Dask.
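
A rough sketch of the Dask route (assuming dask is installed; the path and separator mirror the question):

import dask.dataframe as dd

# Dask reads the CSV lazily in partitions, so the full 13 GB is never held in memory at once.
ddf = dd.read_csv(r'...file.csv', sep=';', encoding='UTF-8')

# Operations stay lazy until .compute(); inspect a small piece first.
sample = ddf.head(300)    # returns a regular pandas DataFrame
# df = ddf.compute()      # full conversion to pandas; only if the result fits in RAM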

Try looking at this answer; in particular, Dask's read_csv can do this.
