Assigning values to a dataframe from another dataframe based on two conditions

Question

I am trying to assign values from the column df2['values'] to the column df1['values']. However, values should only be assigned if:

df2['category'] is equal to df1['category'] (rows are part of the same category)
df1['date'] is in df2['date_range'] (date is in a certain range for a specific category)

So far I have this code, which works but is far from efficient: it takes two days to process the two dataframes (df1 has ca. 700k rows).

for i in df1.category.unique():
    for j in df2.category.unique():
        if i == j:  # matching categories
            for ia, ra in df1.loc[df1['category'] == i].iterrows():
                for ib, rb in df2.loc[df2['category'] == j].iterrows():
                    if df1['date'][ia] in df2['date_range'][ib]:
                        df1.loc[ia, 'values'] = rb['values']
                        break

I read that I should avoid for-loops when working with dataframes. List comprehensions are great, but since I do not have much experience yet, I struggle to formulate more complicated code.

How can I solve this problem more efficiently? What key aspects should I keep in mind when iterating over dataframes with conditions?

The code above tends to skip some rows or assign them wrongly, so I need to do a cleanup afterwards. And the biggest problem is that it is really slow.

Thank you.

Some df1 insight:

df1.head()

    date                          category
0  2015-01-07                       f2
1  2015-01-26                       f2
2  2015-01-26                       f2
3  2015-04-08                       f2
4  2015-04-10                       f2

Some df2 insight:

df2.date_range[0]

DatetimeIndex(['2011-11-02', '2011-11-03', '2011-11-04', '2011-11-05',
               '2011-11-06', '2011-11-07', '2011-11-08', '2011-11-09',
               '2011-11-10', '2011-11-11', '2011-11-12', '2011-11-13',
               '2011-11-14', '2011-11-15', '2011-11-16', '2011-11-17',
               '2011-11-18'],
              dtype='datetime64[ns]', freq='D')

The other two columns of df2:

df2[['values','category']].head()

            values             category
0            01                  f1
1            02                  f1
2           2.1                  f1
3           2.2                  f1
4            03                  f1

Answer 1

Edit: Corrected erroneous code and added OP input from a comment.

Alright, so if you want to join the dataframes on matching categories, you can merge them:

import pandas as pd

df3 = df1.merge(df2, on = "category")

Next, since date is a timestamp and the "date_range" is actually generated from two columns, per OP's comment, we instead use:

mask = (df3["startdate"] <= df3["date"]) & (df3["date"] <= df3["enddate"])

subset = df3.loc[mask]
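To make the two steps above concrete, here is a minimal sketch on made-up data. The column names startdate and enddate follow the answer; the rows themselves are hypothetical and only chosen so that each date falls in exactly one range.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the thread.
df1 = pd.DataFrame({
    "date": pd.to_datetime(["2015-01-07", "2015-01-26", "2015-04-08"]),
    "category": ["f2", "f2", "f1"],
})
df2 = pd.DataFrame({
    "category": ["f1", "f2", "f2"],
    "startdate": pd.to_datetime(["2015-04-01", "2015-01-01", "2015-01-20"]),
    "enddate": pd.to_datetime(["2015-04-30", "2015-01-15", "2015-01-31"]),
    "values": ["01", "02", "2.1"],
})

# Pair every df1 row with every df2 row of the same category.
df3 = df1.merge(df2, on="category")

# Keep only the pairs whose date lies inside the closed range.
mask = (df3["startdate"] <= df3["date"]) & (df3["date"] <= df3["enddate"])
subset = df3.loc[mask]

print(subset[["date", "category", "values"]])
```

Note that df3 holds one row per (df1 row, df2 row) pair within a category, so for large inputs it can be much bigger than df1 before the mask is applied.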

Now we get back to df1 and merge on the common dates while keeping all the values from df1. This will create NaN for the subset values where they didn't match with df1 in the earlier merge.

As such, we set df1["values"] where the entries in common are not NaN and we leave them be otherwise.

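import numpy as np

# Left merge keeps every row of df1; matching on category as well as date
# avoids pairing a date with a range from a different category.
common_dates = df1.merge(subset, on=["date", "category"], how="left")

df1["values"] = np.where(common_dates["values_y"].notna(),
                         common_dates["values_y"], df1["values"])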

NB: If a df1["date"] matches more than one date range, you'll have to drop some values, otherwise the duplicates break the assignment.
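One way to handle such duplicates, sketched here as a variation rather than as the answer's own code, is to carry the original df1 index through the merge, keep only the first matching range per df1 row, and assign by index. The data, the "_df2" suffix, and the choice of keeping the first match are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data: the two f2 ranges overlap, so 2015-01-07 matches both.
df1 = pd.DataFrame({
    "date": pd.to_datetime(["2015-01-07", "2015-01-26", "2015-06-01"]),
    "category": ["f2", "f2", "f2"],
    "values": [None, None, None],
})
df2 = pd.DataFrame({
    "category": ["f2", "f2"],
    "startdate": pd.to_datetime(["2015-01-01", "2015-01-05"]),
    "enddate": pd.to_datetime(["2015-01-15", "2015-01-31"]),
    "values": ["02", "2.1"],
})

# reset_index() keeps the original df1 row label in a column named "index".
pairs = df1.reset_index().merge(df2, on="category", suffixes=("", "_df2"))
mask = (pairs["startdate"] <= pairs["date"]) & (pairs["date"] <= pairs["enddate"])
subset = pairs.loc[mask]

# Assumption: when several ranges match, the first one wins.
first = subset.drop_duplicates(subset="index", keep="first")

# Assign by original row label; rows without any match stay untouched.
df1.loc[first["index"].to_numpy(), "values"] = first["values_df2"].to_numpy()

print(df1)
```

Which match should win is a question about the data, not about pandas; sorting subset before drop_duplicates is one way to make that choice explicit.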

Answer 2

You could accomplish the first point:

1. df2['category'] is equal to the df1['category']

with the use of a join.

You could then use a for loop to filter out the data points from df1[date] inside the merged dataframe that are not covered by df2[date_range]. Unfortunately I need more information about the content of df1[date] and df2[date_range] to write code here that would do exactly that.
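The filtering step does not strictly need a loop. As an alternative that neither answer proposes, pandas.merge_asof can match each date to the latest range start within the same category, after which a single comparison checks the range end. This sketch assumes that the ranges are available as startdate and enddate columns (as mentioned in the other answer) and that ranges within one category do not overlap; the data is made up.

```python
import pandas as pd

# Hypothetical toy data; the last df1 row falls outside every range.
df1 = pd.DataFrame({
    "date": pd.to_datetime(["2015-01-26", "2015-01-07", "2015-04-08", "2015-03-01"]),
    "category": ["f2", "f2", "f1", "f2"],
})
df2 = pd.DataFrame({
    "category": ["f1", "f2", "f2"],
    "startdate": pd.to_datetime(["2015-04-01", "2015-01-01", "2015-01-20"]),
    "enddate": pd.to_datetime(["2015-04-30", "2015-01-15", "2015-01-31"]),
    "values": ["01", "02", "2.1"],
})

# merge_asof requires both frames to be sorted by their "on" keys.
left = df1.reset_index().sort_values("date")
right = df2.sort_values("startdate")

# For each date, take the latest range of the same category that starts
# on or before it (the default direction is "backward").
matched = pd.merge_asof(left, right, left_on="date",
                        right_on="startdate", by="category")

# Discard matches whose range already ended before the date.
inside = matched["date"] <= matched["enddate"]
values = matched["values"].where(inside)

# Restore the original df1 row labels so the assignment aligns on index.
values.index = matched["index"]
df1["values"] = values

print(df1)
```

Unlike a full join on category, this never builds the table of all date and range pairs, which may matter at around 700k rows. With overlapping ranges it would only ever consider the most recently started one, so it is not a drop-in replacement in that case.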

Answer 1
0 2020-01-04 14:19:39

Answer 2
-1 2020-01-04 13:47:27