
Reading a big CSV file into a DataFrame

I have a large CSV file (13 GB) that I want to read into a DataFrame in Python, so I use:

txt = pd.read_csv(r'...file.csv', sep=';', encoding="UTF-8", iterator=True, chunksize=1000)

It works just fine, but the result is a pandas.io.parsers.TextFileReader object, and I want the data in a DataFrame so that I can manipulate it.

I managed to get a sample of the data as a DataFrame using:

txt.get_chunk(300)

But I would like to have all of the data in a DataFrame, so I tried:

for df1 in txt:
    df.append(df1)

I also tried:

df2 = pd.concat([chunk for chunk in txt])

That didn't work either. Can someone please help me?

Thanks in advance!

You can load part of the data into a variable by using the 'nrows' parameter while reading the file.

txt = pd.read_csv(r'...file.csv', sep=';', encoding="UTF-8", nrows=1000)

However, with data this large you may be better off with an instance that has more memory. You can also distribute the work across multiple workers by setting up Dask.
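
A rough sketch of the Dask route (assuming dask is installed; the path and separator mirror the question):

import dask.dataframe as dd

# Dask reads the CSV lazily in partitions, so the full 13 GB is never held in memory at once.
ddf = dd.read_csv(r'...file.csv', sep=';', encoding='UTF-8')

# Operations stay lazy until .compute(); inspect a small piece first.
sample = ddf.head(300)    # returns a regular pandas DataFrame
# df = ddf.compute()      # full conversion to pandas; only if the result fits in RAM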

Try looking at this answer; in particular, Dask's read_csv can do this.
