How to transform a for loop into a lambda function

I have written this function:

from datetime import timedelta

import pandas as pd


def time_to_unix(df, dateToday):
    '''Create the timestamp column for the dataframe. Takes today's date
    (e.g. 2022-8-8 0:0:0) and adds the seconds that were originally in the
    timestamp column.

    input: dataframe, dateToday (type: pandas.core.series.Series)
    output: list of times
    '''

    dateTime = dateToday[0]
    times = []

    for i in range(0, len(df['timestamp'])):
        dateAndTime = dateTime + timedelta(seconds=float(df['timestamp'][i]))
        unix = pd.to_datetime([dateAndTime]).astype(int) / 10**9
        times.append(unix[0])
    return times

So it takes a dataframe and today's date, reads each value of the timestamp column (which is in seconds, like 10, 20, ...), adds it to the date, and returns the times in Unix time.

However, because I have approximately 2 million rows in my dataframe, this code takes a long time to run.

How can I use a lambda function or something else to speed up my code?

Something along the lines of:

df['unix'] = df.apply(lambda row: <something in here>, axis=1)

What I think you'll find is that most of the time is spent creating and manipulating the datetime / timestamp objects in the dataframe (see here for more info). I also try to avoid using lambdas like this on large dataframes, because apply with axis=1 goes row by row. What I've done in the past when dealing with datetimes / timestamps / timezone changes is to build a dictionary of the possible datetime combinations and then use map to apply them. Something like this:

import datetime as dt
import pandas as pd


# Make a time key column out of your date and timestamp fields
# (both are cast to str so they can be joined with '@')
df['time_key'] = df['date'].astype(str) + '@' + df['timestamp'].astype(str)

# Build a dictionary from the unique time keys in the dataframe
time_dict = dict()
for time_key in df['time_key'].unique():
    date_part, seconds_part = time_key.split('@')
    # Create the Unix timestamp from the values in the key and store it
    # in the dictionary so it can be mapped later.
    # Timestamp.value is nanoseconds since the epoch, hence the division.
    moment = pd.to_datetime(date_part) + dt.timedelta(seconds=float(seconds_part))
    time_dict[time_key] = moment.value / 10**9

# Now map the time_key to the unix column in the dataframe from the dictionary
df['unix'] = df['time_key'].map(time_dict)

Note that if all the datetime combinations in the dataframe are unique, this likely won't help, because the dictionary would need one entry per row.
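One quick way to judge whether the dictionary approach is worth trying is to compare the number of distinct keys with the number of rows. The following is only a minimal sketch: the toy data is made up, the date and timestamp column names are the ones assumed in the snippet above, and the helper name unique_key_ratio is hypothetical.

```python
import pandas as pd


def unique_key_ratio(df: pd.DataFrame) -> float:
    """Return the fraction of rows whose (date, timestamp) pair is distinct.

    The column names 'date' and 'timestamp' are assumptions carried over
    from the answer above; adjust them to your own dataframe.
    """
    n_unique = len(df[["date", "timestamp"]].drop_duplicates())
    return n_unique / len(df)


# Toy data (made up): 6 rows but only 3 distinct (date, timestamp) pairs.
toy = pd.DataFrame(
    {
        "date": ["2022-08-08"] * 6,
        "timestamp": [10, 20, 30, 10, 20, 30],
    }
)

ratio = unique_key_ratio(toy)
```

A ratio close to 1.0 means almost every key is unique, so the dictionary does roughly as many conversions as the original loop. A ratio well below 1.0 means far fewer conversions are needed.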

I'm not exactly sure what type dateToday[0] has. But you could try a more vectorized approach:

import pandas as pd

# dateToday is the Series passed to time_to_unix in the question
df["unix"] = (
    (pd.Timestamp(dateToday[0]) + pd.to_timedelta(df["timestamp"], unit="seconds"))
    .astype("int64").div(10**9)
)

or

df["unix"] = (
    (dateToday[0] + pd.to_timedelta(df["timestamp"], unit="seconds"))
    .astype("int64").div(10**9)
)
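To check that a vectorized version gives the same numbers as the loop, it can help to run both on a tiny dataframe first. The sketch below uses made-up inputs (a one-element Series for today's date and three timestamp values). It also swaps the integer cast for subtracting the epoch and dividing by one second, on the assumption that this is less sensitive to the resolution pandas stores the datetimes in. Treat it as an illustration, not as the method from the answer.

```python
from datetime import timedelta

import pandas as pd

# Toy stand-ins for the question's inputs (assumed shapes and values).
date_today = pd.Series([pd.Timestamp("2022-08-08 00:00:00")])
df = pd.DataFrame({"timestamp": [10, 20, 30.5]})

epoch = pd.Timestamp("1970-01-01")
base = date_today[0]

# Loop version: one Python-level datetime operation per row.
loop_times = []
for seconds in df["timestamp"]:
    moment = base + timedelta(seconds=float(seconds))
    loop_times.append((moment - epoch).total_seconds())

# Vectorized version: convert the whole column to timedeltas at once,
# add the base date, then express the result as seconds since the epoch.
shifted = base + pd.to_timedelta(df["timestamp"], unit="s")
df["unix"] = (shifted - epoch) / pd.Timedelta(seconds=1)
```

On the real dataframe only the last two lines are needed. The loop is there purely as a reference to compare against.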
