Python Pandas - Get Location from 2nd dataframe using 1st data

I'm a very basic user of Pandas, but I've hit a brick wall here.

I have one dataframe called dg with a column called 'user_id' and two other columns which aren't needed at the moment. I also have two more dataframes (data_conv and data_retargeting) which include the same column name and a column called 'timestamp'; however, there are multiple timestamps for each 'user_id'.

What I need is to create new columns in dg for the minimum and maximum 'timestamp' found.

I am currently able to do this through a very long-winded method using iterrows; however, for a dataframe of ~16000 rows it took 45 minutes, and I would like to cut that down because I have larger dataframes to run this on.

    for index, row in dg.iterrows():
        user_id = row['pdp_id']
        # Index labels of this user's first and last rows in each frame
        n_audft = data_retargeting[data_retargeting.pdp_id == user_id].index.min()
        n_audlt = data_retargeting[data_retargeting.pdp_id == user_id].index.max()
        n_convft = data_conv[data_conv.pdp_id == user_id].index.min()
        n_convlt = data_conv[data_conv.pdp_id == user_id].index.max()
        # Use .loc to write one cell; dg[index, 'col'] = ... would not set a single cell
        dg.loc[index, 'first_retargeting'] = data_retargeting.loc[n_audft, 'raw_time']
        dg.loc[index, 'last_retargeting'] = data_retargeting.loc[n_audlt, 'raw_time']
        dg.loc[index, 'first_conversion'] = data_conv.loc[n_convft, 'raw_time']
        dg.loc[index, 'last_conversion'] = data_conv.loc[n_convlt, 'raw_time']

Without going into specific code: is every user_id in dg found in data_conv and data_retargeting? If so, you can merge ( http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.merge.html ) them into a new dataframe first, then compute the max/min and extract the desired columns. I suspect that might run a little bit faster.
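
For what it's worth, here is a rough sketch of that idea, not tested against the real data. It aggregates each event frame per user first and then merges onto dg, which amounts to the same thing as merging and then taking the max/min. It assumes the column names from the question's code ('pdp_id' and 'raw_time') and that "first" and "last" mean the smallest and largest 'raw_time' per user. The toy frames and the helper name first_last are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for the real frames (values are made up for illustration).
dg = pd.DataFrame({"pdp_id": [1, 2, 3]})
data_retargeting = pd.DataFrame({"pdp_id": [1, 1, 2], "raw_time": [10, 30, 20]})
data_conv = pd.DataFrame({"pdp_id": [1, 2, 2], "raw_time": [15, 25, 40]})


def first_last(events, label):
    """Hypothetical helper: one row per pdp_id with min and max raw_time."""
    return (
        events.groupby("pdp_id")["raw_time"]
        .agg(["min", "max"])
        .rename(columns={"min": f"first_{label}", "max": f"last_{label}"})
        .reset_index()
    )


# Left merges keep every row of dg; users with no events get NaN.
result = (
    dg.merge(first_last(data_retargeting, "retargeting"), on="pdp_id", how="left")
      .merge(first_last(data_conv, "conversion"), on="pdp_id", how="left")
)
print(result)
```

The how="left" is there in case not every user in dg appears in both event frames; if they all do, the default inner merge gives the same rows. This replaces the per-row filtering in the loop with one groupby per event frame, so it should scale much better, though the actual speedup on your data is something you'd have to measure.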
