I have a large CSV file (13 GB) that I want to read into a DataFrame in Python, so I use:
txt = pd.read_csv(r'...file.csv', sep=';', encoding="UTF-8", iterator=True, chunksize=1000)
It works just fine, but the result is a pandas.io.parsers.TextFileReader rather than a DataFrame, and I want a DataFrame so I can manipulate the data. I can get a sample of the data as a DataFrame using:
txt.get_chunk(300)
But I would like to have all of the data in a single DataFrame. So, I tried:
for df1 in txt:
    df.append(df1)
I also tried:
df2 = pd.concat([chunk for chunk in txt])
That didn't work either. Can someone please help me?
Thanks in advance!
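For reference, both attempts above are close to working: DataFrame.append returns a new frame rather than modifying one in place, and pd.concat over the reader is the usual pattern. A minimal sketch (using a small in-memory CSV as a stand-in for the real file, whose path is elided in the question):

```python
import io

import pandas as pd

# A small in-memory stand-in for the 13 GB file (same ';' separator).
csv_data = "a;b\n" + "\n".join(f"{i};{i * 2}" for i in range(5000))

# iterator/chunksize gives a TextFileReader that yields DataFrames.
reader = pd.read_csv(io.StringIO(csv_data), sep=';', chunksize=1000)

# pd.concat accepts the reader directly and stitches the chunks back
# into one DataFrame; ignore_index renumbers the rows 0..n-1.
df = pd.concat(reader, ignore_index=True)
print(df.shape)  # (5000, 2)
```

Note that this still materialises the whole file in memory at the end, so for a 13 GB file the machine needs enough RAM to hold the full DataFrame.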
You can read part of the data into a variable using the 'nrows' parameter while reading the file.
txt = pd.read_csv(r'...file.csv', sep=';', encoding="UTF-8", nrows=1000)
However, for data this large you may prefer a bigger instance, or spread the work across multiple workers by setting up dask.
Try looking at this answer; in particular, dask's read_csv can do this.