Faster way to apply custom function to each row in pandas dataframe?

I have two dataframes, ad_df and x_df. I created a function find_ad_ids that takes an ID ad_id and a date ad_date from ad_df.

The function filters x_df by the following criteria:

  • x_df['ID'] == ad_id
  • x_df['Last_Date'] is between 2 days before ad_date and 15 days after ad_date
  • At least one row in the beginning range of dates (the 2 days before ad_date) and at least one row in the end range of dates (the 15 days after ad_date) has x_df['Geo_Label'] equal to '1'

Then I append the resulting dataframe to a global dataframe res_df that keeps track of these rows.

I call the function by using the line below:

ad_df.apply(lambda x: find_ad_ids(x['SerialNo'], x['Audit Date']), axis=1)

Is there a faster way to do this? ad_df has about 1M rows. The code for the function is shown below.

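from datetime import timedelta

import pandas as pd

def find_ad_ids(ad_id, ad_date):
    global res_df  # res_df is reassigned below, so it must be declared global

    id_specific_df = x_df.loc[x_df['ID'] == ad_id]

    beg_range_date = ad_date - timedelta(days=2)
    end_range_date = ad_date + timedelta(days=15)

    # Rows strictly inside the window before / after ad_date
    beg_df = id_specific_df[(id_specific_df['Last_Date'] > beg_range_date) & (id_specific_df['Last_Date'] < ad_date)]
    end_df = id_specific_df[(id_specific_df['Last_Date'] > ad_date) & (id_specific_df['Last_Date'] < end_range_date)]

    # Keep the rows only if each window has at least one row labelled '1'
    # (.any() is False for an empty frame, so no separate emptiness check is needed)
    if (beg_df['Geo_Label'] == '1').any() and (end_df['Geo_Label'] == '1').any():
        # DataFrame.append returned a new frame (and was removed in pandas 2.0), so reassign via concat
        res_df = pd.concat([res_df, beg_df, end_df], ignore_index=True)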

One of the fastest ways to append data to a DataFrame is to collect the rows as dicts in a list and build the DataFrame once at the end:

import time

import numpy as np
import pandas as pd

numOfRows = 10000  # assumed value; not defined in the original snippet
cols = ['A', 'B', 'C', 'D', 'E']

startTime = time.perf_counter()
# Collect every row as a dict in a plain Python list
row_list = []
for i in range(numOfRows):
    row_list.append({a: np.random.randint(100) for a in cols})

# Build the DataFrame once, after all rows have been collected
df4 = pd.DataFrame(row_list, columns=cols)
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))
print(df4.shape)
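
If the bottleneck is the row-wise apply itself rather than the appending, another option may be to replace the per-row filtering with a single merge. Below is a minimal sketch, assuming the column names from the question ('SerialNo' and 'Audit Date' in ad_df; 'ID', 'Last_Date' and 'Geo_Label' in x_df) and that both date columns are datetime64. The function name find_ad_rows, the helper columns ad_key and ad_date, and the toy data are hypothetical and only serve the illustration.

```python
import pandas as pd


def find_ad_rows(ad_df, x_df, id_col="SerialNo", date_col="Audit Date"):
    """Vectorized sketch of the per-row filter (hypothetical helper)."""
    # One row per ad, with a key so matches can be traced back to the ad row.
    ads = ad_df[[id_col, date_col]].rename(
        columns={id_col: "ID", date_col: "ad_date"}
    ).reset_index(drop=True)
    ads["ad_key"] = ads.index

    # Pair every ad row with all x_df rows that share its ID.
    merged = ads.merge(x_df, on="ID", how="inner")

    # Same strict windows as in the question: (-2 days, 0) and (0, +15 days).
    delta = merged["Last_Date"] - merged["ad_date"]
    in_beg = (delta > pd.Timedelta(days=-2)) & (delta < pd.Timedelta(0))
    in_end = (delta > pd.Timedelta(0)) & (delta < pd.Timedelta(days=15))
    is_one = merged["Geo_Label"] == "1"

    # For each ad row: is there a '1' in the beginning window and in the end window?
    beg_hit = (in_beg & is_one).groupby(merged["ad_key"]).any()
    end_hit = (in_end & is_one).groupby(merged["ad_key"]).any()
    both = beg_hit & end_hit
    keep_keys = both[both].index

    # Keep all window rows belonging to the ads that passed both checks.
    mask = (in_beg | in_end) & merged["ad_key"].isin(keep_keys)
    return merged.loc[mask].reset_index(drop=True)


# Toy data (made up for illustration only).
ad_df = pd.DataFrame({
    "SerialNo": ["a", "b"],
    "Audit Date": pd.to_datetime(["2021-01-10", "2021-01-10"]),
})
x_df = pd.DataFrame({
    "ID": ["a", "a", "a", "b", "b"],
    "Last_Date": pd.to_datetime(
        ["2021-01-09", "2021-01-12", "2021-03-01", "2021-01-09", "2021-01-12"]
    ),
    "Geo_Label": ["1", "1", "1", "1", "0"],
})

res = find_ad_rows(ad_df, x_df)
print(res)
```

The merge materializes every (ad row, x_df row) pair that shares an ID, so memory use can grow quickly when IDs repeat often; processing ad_df in chunks is one way to bound it. Unlike res_df in the question, the result also carries the ad_key and ad_date columns, which can be dropped if only the x_df columns are wanted.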
