

How to iterate over two dataframe columns and add values from a list based on the values in those two columns

I have a dataframe with three columns like this:

Subject:  1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, ...
datetime: 6/4/16 3:04:30, 6/5/16 6:02:15, ...
markers:  (empty)

The dataframe is sorted by subject, then by datetime, and the markers column is empty.
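For reference, a dataframe with this layout can be built and sorted as below. This is a minimal sketch: the column names `Subject`, `datetime` and `markers` come from the question, but apart from the two datetimes quoted above the sample rows are made up, and the month/day/two-digit-year format is an assumption about the data.

```python
import pandas as pd

# Hypothetical sample rows (deliberately unsorted); only 6/4/16 3:04:30 and
# 6/5/16 6:02:15 appear in the question.
raw = pd.DataFrame({
    'Subject': [2, 1, 1, 2],
    'datetime': ['6/6/16 1:00:00', '6/5/16 6:02:15', '6/4/16 3:04:30', '6/4/16 9:15:00'],
})

# Assumes month/day/year; change the format string if the data is day-first.
raw['datetime'] = pd.to_datetime(raw['datetime'], format='%m/%d/%y %H:%M:%S')

# Sort by subject, then by datetime, and add an empty (all-NaT) markers column.
df = raw.sort_values(['Subject', 'datetime']).reset_index(drop=True)
df['markers'] = pd.NaT
print(df)
```

Parsing the strings into real datetimes up front is what makes comparing calendar dates (rather than raw text) possible later.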

I also have a dictionary that maps subject numbers to lists of datetimes. These datetimes are not exactly the same as the ones already in the dataframe. For comparison purposes, I want to add each of them to the markers column of the row with the matching subject and date. For example, if the dictionary maps key (subject) 1 to a list like [6/4/16 5:00:15, 6/5/16 6:10:30], the first value should go into row 1 because its subject and date match, and the second value into row 2 for the same reason.
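One way to express this matching rule without nested loops is to turn the dictionary into a long table with one row per (subject, datetime) pair and join it to the dataframe on subject plus calendar date. A minimal sketch, assuming: `subject_times` is a hypothetical name for the dictionary, both sides already hold parsed datetimes, each (subject, date) pair occurs at most once on each side (otherwise the merge duplicates rows), and the sample values beyond those in the question are made up. The empty markers column is left out here; if it exists, drop it before merging, or pandas will add `markers_x`/`markers_y` suffixes.

```python
import pandas as pd

# Dataframe sorted by Subject then datetime (empty markers column omitted).
df = pd.DataFrame({
    'Subject': [1, 1, 2],
    'datetime': pd.to_datetime(['2016-06-04 03:04:30', '2016-06-05 06:02:15',
                                '2016-06-04 09:15:00']),
})

# Hypothetical name for the question's dictionary: subject -> list of datetimes.
subject_times = {
    1: [pd.Timestamp('2016-06-04 05:00:15'), pd.Timestamp('2016-06-05 06:10:30')],
    2: [pd.Timestamp('2016-06-04 10:00:00')],
}

# Long format: one row per (Subject, marker) pair.
markers = (pd.Series(subject_times, name='markers')
             .explode()
             .rename_axis('Subject')
             .reset_index())
markers['markers'] = pd.to_datetime(markers['markers'])

# Join on subject plus calendar date (time of day stripped by normalize()).
df['date'] = df['datetime'].dt.normalize()
markers['date'] = markers['markers'].dt.normalize()
df = df.merge(markers, on=['Subject', 'date'], how='left').drop(columns='date')
print(df)
```

A left merge keeps every original row in its original order; rows with no matching marker get `NaT`.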

I thought of looping through each dictionary key and all its corresponding datetimes, but finding the matching row in the dataframe for each datetime inside nested loops would be very inefficient. It would look something like this:

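# subject_times is a hypothetical name for the dict; df['datetime'] is assumed parsed
for i, subject in df.iloc[:, 0].items():
    row_date = df.at[i, 'datetime'].date()
    # scan this subject's list for a datetime on the same calendar date
    for ts in subject_times.get(subject, []):
        if ts.date() == row_date:
            df.at[i, 'markers'] = ts
    # nested loops: roughly O(n^2) time overall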
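The nested loop can be flattened by indexing the dictionary once by (subject, date), so each row costs a single hash lookup instead of a scan of its subject's list. This is a sketch under stated assumptions, not the asker's code: `subject_times` and `lookup` are hypothetical names, the sample values are made up, and if two listed datetimes share a subject and date, the later one silently wins.

```python
import pandas as pd

df = pd.DataFrame({
    'Subject': [1, 1, 2],
    'datetime': pd.to_datetime(['2016-06-04 03:04:30', '2016-06-05 06:02:15',
                                '2016-06-04 09:15:00']),
})
# Hypothetical name for the question's dictionary: subject -> list of datetimes.
subject_times = {
    1: [pd.Timestamp('2016-06-04 05:00:15'), pd.Timestamp('2016-06-05 06:10:30')],
    2: [pd.Timestamp('2016-06-04 10:00:00')],
}

# Build the (subject, date) -> marker index in one pass over all listed datetimes.
lookup = {(subject, ts.date()): ts
          for subject, times in subject_times.items()
          for ts in times}

# One O(1) dictionary lookup per row; unmatched rows get None (shown as NaT).
keys = zip(df['Subject'], df['datetime'].dt.date)
df['markers'] = [lookup.get(key) for key in keys]
print(df)
```

Total work is linear in the number of rows plus the number of listed datetimes, which is the improvement over the nested loops.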

Is there a more efficient way to do this?

Thanks!

Try this. You will have to customize the answer somewhat to meet your specific needs, but the logic is basically the same.

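import pandas as pd

df = pd.DataFrame({'colA': [100, 200], 'colB': ['NaN', 'NaN']})
dict1 = {100: ['rat', 'cat', 'hat'], 200: ['hen', 'men', 'den']}
# map each key to its list, then expand each list into its own columns (0, 1, 2)
df = pd.concat([df['colA'], df['colA'].map(dict1).apply(pd.Series)], axis=1)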
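The same expansion can also be written by building the new columns from the list of lists directly, which avoids creating one `pd.Series` per row. A minimal sketch using the answer's own sample data (`lists`, `expanded` and `result` are illustrative names); the result has the columns `colA`, `0`, `1`, `2`. Note that, like the answer, this keys only on one column, so the per-date matching from the question still has to be added on top.

```python
import pandas as pd

df = pd.DataFrame({'colA': [100, 200], 'colB': ['NaN', 'NaN']})
dict1 = {100: ['rat', 'cat', 'hat'], 200: ['hen', 'men', 'den']}

# Look up each key's list, then spread the lists across integer-named columns.
lists = df['colA'].map(dict1)
expanded = pd.DataFrame(lists.tolist(), index=df.index)

# Put the key column back in front of the expanded columns.
result = pd.concat([df['colA'], expanded], axis=1)
print(result)
```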

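Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.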

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM