I have 12 CSV files with a total size of 8.45 GB. I would like to read all of the CSV files into a pandas DataFrame with read_csv.
I tried this code:

# Example with 3 of the files
import pandas as pd

list = ['file-01.csv',
        'file-02.csv',
        'file-03.csv']

li = []
for filename in list:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

concat_df = pd.concat(li, axis=0, ignore_index=True)
Then it showed:
MemoryError: Unable to allocate 784. MiB for an array with shape (1, 102804250) and data type int64
How can I solve this issue?
Thanks,
This may not be possible at all if the combined data simply exceeds your RAM (the 784. MiB in the error is just one int64 array: 102,804,250 values × 8 bytes). However, you will get a much more memory-efficient process if you use a generator rather than appending to a list and concatenating: the list keeps every intermediate DataFrame alive while pd.concat builds the result, so it requires about twice as much memory as the generator approach.
Try this:
import pandas as pd

# Please don't use `list` as a variable name; it shadows the built-in.
file_list = [
    'file-01.csv',
    'file-02.csv',
    'file-03.csv',
]

def yield_dfs(file_list):
    """Generator function that yields one DataFrame at a time."""
    for file_name in file_list:
        df = pd.read_csv(file_name)
        # You may be able to reduce the memory requirements by doing some
        # pre-processing of the DataFrame here, e.g. converting string
        # columns to booleans or categories to save memory.
        yield df

df = pd.concat(yield_dfs(file_list))
I didn't run that code to check for syntax errors, and the specifics may vary a little depending on your paths. If you have enough system memory for the combined DataFrame, this is fairly likely to work. However, you are talking about a very big DataFrame, and how much memory it needs depends a lot on the datatypes you are working with.
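To illustrate that last point, here is a minimal sketch of the kind of pre-processing mentioned in the comment above. The column names ('count', 'price', 'label') are hypothetical; adjust them to your actual data:

import pandas as pd

def shrink(df):
    # Downcast the default int64/float64 columns to the smallest dtype that fits.
    df['count'] = pd.to_numeric(df['count'], downcast='integer')
    df['price'] = pd.to_numeric(df['price'], downcast='float')
    # Repetitive string columns are far smaller as categoricals.
    df['label'] = df['label'].astype('category')
    return df

# Compare memory before and after to see what the conversions save.
df = pd.read_csv('file-01.csv')
print(df.memory_usage(deep=True).sum())
print(shrink(df).memory_usage(deep=True).sum())

You could call something like shrink inside yield_dfs before yielding each frame. pandas can also apply many of these conversions at parse time via read_csv's dtype= mapping (e.g. dtype={'label': 'category'}), which avoids ever allocating the full-width int64 columns.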
import pandas as pd

files = ['file-01.csv',
         'file-02.csv',
         'file-03.csv']

# Start from the first file and concatenate each remaining file
# onto the running result one at a time.
df = pd.read_csv('file-01.csv', index_col=0)
for file in files[1:]:
    df_i = pd.read_csv(file, index_col=0)
    df = pd.concat((df, df_i), axis=0)

df.reset_index(drop=True, inplace=True)
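If the concatenated DataFrame just doesn't fit in RAM however you build it, a common pandas pattern is to stream each file in pieces with read_csv's chunksize parameter and keep only the rows and columns you actually need. A minimal sketch; keep_columns, the column names, and the filter are hypothetical:

import pandas as pd

files = ['file-01.csv', 'file-02.csv', 'file-03.csv']
keep_columns = ['id', 'value']  # hypothetical: only the columns you need

pieces = []
for filename in files:
    # chunksize makes read_csv yield DataFrames of at most 1_000_000 rows
    # instead of parsing the whole file into memory at once.
    for chunk in pd.read_csv(filename, usecols=keep_columns, chunksize=1_000_000):
        pieces.append(chunk[chunk['value'] > 0])  # hypothetical row filter

df = pd.concat(pieces, ignore_index=True)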