I have two dataframes, ad_df and x_df. I created a function find_ad_ids that takes an ID ad_id and a date ad_date from ad_df.
The function filters x_df to rows with a matching ID whose Last_Date falls inside two windows around ad_date.
Then I append the resulting dataframe to a global dataframe res_df that keeps track of these rows.
I call the function with the line below:

    ad_df.apply(lambda x: find_ad_ids(x['SerialNo'], x['Audit Date']), axis=1)
Is there a faster way to do this? ad_df has about 1M rows, so I am hoping there is. The code for the function is shown below.
    from datetime import timedelta

    def find_ad_ids(ad_id, ad_date):
        id_specific_df = x_df.loc[x_df['ID'] == ad_id]
        beg_range_date = ad_date - timedelta(days=2)
        end_range_date = ad_date + timedelta(days=15)
        beg_df = id_specific_df[(id_specific_df['Last_Date'] > beg_range_date) & (id_specific_df['Last_Date'] < ad_date)]
        end_df = id_specific_df[(id_specific_df['Last_Date'] > ad_date) & (id_specific_df['Last_Date'] < end_range_date)]
        if len(beg_df) != 0 and len(end_df) != 0:
            if ('1' in beg_df['Geo_Label'].array) and ('1' in end_df['Geo_Label'].array):
                res_df.append(pd.concat([beg_df, end_df], ignore_index=True))
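One observation before anything else: every call to the function re-scans all of x_df for a single ID. Grouping x_df by ID once up front turns that full scan into a dictionary lookup. The sketch below is only an illustration under assumptions: the column names mirror the question, but the sample data, the `pieces` list, and the output argument are invented.

```python
from datetime import datetime, timedelta

import pandas as pd

# Toy stand-in for x_df, using the column names from the question.
x_df = pd.DataFrame({
    'ID': [1, 1, 2],
    'Last_Date': [datetime(2023, 1, 9), datetime(2023, 1, 12), datetime(2023, 1, 9)],
    'Geo_Label': ['1', '1', '0'],
})

# Group once; each call then does a dict lookup instead of scanning x_df.
groups = {ad_id: grp for ad_id, grp in x_df.groupby('ID')}

def find_ad_ids(ad_id, ad_date, out):
    id_specific_df = groups.get(ad_id)
    if id_specific_df is None:
        return
    beg_df = id_specific_df[(id_specific_df['Last_Date'] > ad_date - timedelta(days=2))
                            & (id_specific_df['Last_Date'] < ad_date)]
    end_df = id_specific_df[(id_specific_df['Last_Date'] > ad_date)
                            & (id_specific_df['Last_Date'] < ad_date + timedelta(days=15))]
    if len(beg_df) and len(end_df):
        if ('1' in beg_df['Geo_Label'].array) and ('1' in end_df['Geo_Label'].array):
            out.append(pd.concat([beg_df, end_df], ignore_index=True))

# Collect matches in a list instead of appending to a global DataFrame.
pieces = []
find_ad_ids(1, datetime(2023, 1, 10), pieces)
res_df = pd.concat(pieces, ignore_index=True) if pieces else pd.DataFrame()
print(res_df.shape)  # → (2, 3)
```

Passing the accumulator list in (or returning the matched frame) also removes the dependence on a mutable global, which makes the function easier to test in isolation.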
One of the fastest ways to append rows to a DataFrame is to build a list of dicts and construct the DataFrame once at the end:
    import time

    import numpy as np
    import pandas as pd

    numOfRows = 1000  # total number of rows to build

    startTime = time.perf_counter()
    row_list = []
    for i in range(0, 5):
        row_list.append(dict((a, np.random.randint(100)) for a in ['A', 'B', 'C', 'D', 'E']))
    for i in range(1, numOfRows - 4):
        dict1 = dict((a, np.random.randint(100)) for a in ['A', 'B', 'C', 'D', 'E'])
        row_list.append(dict1)
    df4 = pd.DataFrame(row_list, columns=['A', 'B', 'C', 'D', 'E'])
    print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))
    print(df4.shape)
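The same accumulate-then-build idea carries over to the question's res_df: rather than calling append on a global DataFrame for every match (each DataFrame.append copies the whole frame, so the total cost grows quadratically), collect the matched pieces in a plain Python list and concatenate once at the end. A minimal sketch, where the pieces are made-up stand-ins for the frames find_ad_ids would produce:

```python
import pandas as pd

# Hypothetical matched pieces; in the real code these would come from find_ad_ids.
pieces = [
    pd.DataFrame({'ID': [1], 'Geo_Label': ['1']}),
    pd.DataFrame({'ID': [2], 'Geo_Label': ['1']}),
]

# One concat at the end replaces every per-match res_df.append call.
res_df = pd.concat(pieces, ignore_index=True)
print(res_df.shape)  # → (2, 2)
```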