
Make processing faster in Pandas

I'm comparing one dataframe with 3 others on a single column, processing them one after another, and I would like to know whether this process can use more cores / be made faster. I tried concurrent.futures.ProcessPoolExecutor() but it was actually about 1 second slower... this is my code:

        # df_out is the main DataFrame; hikari_data_df, kokyaku_data_df, hikanshou_data_df are the DataFrames to compare against
        m1 = df_out[self.col_name_].isin(hikari_data_df['phone_num1'])
        m2 = df_out[self.col_name_].isin(hikari_data_df['phone_num2'])
        # Add new columns to df_out: keep the value where the mask matches, NaN otherwise
        df_out['new1'] = df_out[self.col_name_].where(m1)
        df_out['new2'] = df_out[self.col_name_].where(m2)

        m1 = df_out[self.col_name_].isin(kokyaku_data_df['phone_number1'])
        m2 = df_out[self.col_name_].isin(kokyaku_data_df['phone_number2'])
        df_out['new3'] = df_out[self.col_name_].where(m1)
        df_out['new4'] = df_out[self.col_name_].where(m2)

        m1 = df_out[self.col_name_].isin(hikanshou_data_df['phone_number'])
        df_out['new5'] = df_out[self.col_name_].where(m1)

        df_out.to_csv(sys.argv[1], index=False)

I would like to make this process faster!

First, if your data is not big, try replacing the 'isin'/'where' operations with a vectorized 'join'/'merge'. This costs more memory but is much faster.
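For example, here is a minimal sketch of the merge-based approach. It assumes the column names from the question and that self.col_name_ is available as a plain variable (called col_name below); the flag_matches helper is hypothetical, not part of the original code.

    import pandas as pd

    def flag_matches(df_out, lookup_df, lookup_col, key_col, new_col):
        """Left-merge the lookup column onto df_out: matched rows keep the
        phone number in new_col, unmatched rows get NaN. This gives the same
        result as isin/where, but as a single vectorized join."""
        lookup = (lookup_df[[lookup_col]]
                  .drop_duplicates()                      # avoid duplicating rows of df_out
                  .rename(columns={lookup_col: new_col}))
        return df_out.merge(lookup, how='left', left_on=key_col, right_on=new_col)

    # Hypothetical usage with the frames from the question:
    # df_out = flag_matches(df_out, hikari_data_df, 'phone_num1', col_name, 'new1')
    # df_out = flag_matches(df_out, hikari_data_df, 'phone_num2', col_name, 'new2')
    # ... and so on for kokyaku_data_df and hikanshou_data_df.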

Second, use Dask. But be careful: if your data is not large enough, Dask will be slower.
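A minimal Dask sketch, under the assumptions that df_out still fits in memory, that the lookup phone-number columns can be collected into plain Python sets, and that npartitions=4 roughly matches your core count (tune it for your machine):

    import dask.dataframe as dd

    def add_match_columns(pdf, col_name, hikari_nums1, hikari_nums2):
        # Same isin/where logic as the question, applied to one partition at a time
        pdf = pdf.copy()
        pdf['new1'] = pdf[col_name].where(pdf[col_name].isin(hikari_nums1))
        pdf['new2'] = pdf[col_name].where(pdf[col_name].isin(hikari_nums2))
        return pdf

    # df_out, hikari_data_df and col_name are assumed to come from the question's code
    ddf = dd.from_pandas(df_out, npartitions=4)
    ddf = ddf.map_partitions(
        add_match_columns,
        col_name,
        set(hikari_data_df['phone_num1']),
        set(hikari_data_df['phone_num2']),
    )
    df_out = ddf.compute()   # run the partitions in parallel and collect the result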
