Python Pandas - Get Location from 2nd dataframe using 1st data
I'm a very basic user of Pandas, but I've hit a brick wall here.
So I have one dataframe called dg that has a column called 'user_id', and two other columns which aren't needed at the moment. I also have two more dataframes (data_conv and data_retargeting) which include the same column name and a column called 'timestamp'; however, there are multiple timestamps for each 'user_id'.
What I need is to create new columns in dg for the minimum and maximum 'timestamp' found.
I am currently able to do this through a very long-winded method using iterrows; however, for a dataframe of ~16000 rows it took 45 minutes, and I would like to cut that down because I have larger dataframes to run this on.
    # Row-by-row version: every iteration rescans both lookup frames.
    for index, row in dg.iterrows():
        user_id = row['pdp_id']
        # Smallest and largest index label among this user's rows
        n_audft = data_retargeting[data_retargeting.pdp_id == user_id].index.min()
        n_audlt = data_retargeting[data_retargeting.pdp_id == user_id].index.max()
        n_convft = data_conv[data_conv.pdp_id == user_id].index.min()
        n_convlt = data_conv[data_conv.pdp_id == user_id].index.max()
        # Use .loc to assign a single cell of dg
        dg.loc[index, 'first_retargeting'] = data_retargeting.loc[n_audft, 'raw_time']
        dg.loc[index, 'last_retargeting'] = data_retargeting.loc[n_audlt, 'raw_time']
        dg.loc[index, 'first_conversion'] = data_conv.loc[n_convft, 'raw_time']
        dg.loc[index, 'last_conversion'] = data_conv.loc[n_convlt, 'raw_time']
Without going into specific code: is every user_id in dg found in data_conv and data_retargeting? If so, you can merge ( http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.merge.html ) them into a new dataframe first, then compute the max/min and extract the desired columns. I suspect that might run a little bit faster.
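A minimal sketch of that idea, not tested against the real data. It aggregates each lookup frame first and merges afterwards, which is a variation on "merge, then compute max/min" that keeps the intermediate frames small. The sample data and the helper name first_last are made up for illustration; the column names pdp_id and raw_time are taken from the loop in the question. It also assumes raw_time holds values that sort chronologically. The original loop picks raw_time at the smallest and largest index label, which gives the same result only if each frame is already in time order.

```python
import pandas as pd

# Hypothetical sample data; column names follow the loop in the question.
dg = pd.DataFrame({"pdp_id": [1, 2, 3]})
data_retargeting = pd.DataFrame({
    "pdp_id": [1, 1, 2],
    "raw_time": pd.to_datetime(["2015-01-03", "2015-01-01", "2015-01-05"]),
})
data_conv = pd.DataFrame({
    "pdp_id": [1, 2, 2],
    "raw_time": pd.to_datetime(["2015-02-01", "2015-02-04", "2015-02-02"]),
})


def first_last(events, prefix):
    """Return one row per pdp_id with the earliest and latest raw_time."""
    stats = events.groupby("pdp_id")["raw_time"].agg(["min", "max"])
    stats.columns = ["first_" + prefix, "last_" + prefix]
    return stats.reset_index()


# Optional check for the question asked above: which ids have no events?
missing = dg.loc[~dg["pdp_id"].isin(data_retargeting["pdp_id"]), "pdp_id"]

# Left merges keep every row of dg; ids without events get NaT.
result = (
    dg.merge(first_last(data_retargeting, "retargeting"), on="pdp_id", how="left")
      .merge(first_last(data_conv, "conversion"), on="pdp_id", how="left")
)
print(result)
```

Because the groupby runs once per lookup frame instead of filtering each frame once per row of dg, this should scale much better than the iterrows loop. If some ids in dg have no events, the left merge leaves those cells empty rather than raising.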