How to Extract Specific Values from Iterating Through Two Dataframes and Dynamically Append?
I am trying to use values in an existing dataframe to create subset dataframes (or series), compare those values against the existing dataframe, and ultimately create one new column in the original dataframe.
The step-by-step process is:
Some attempts I've made that raise errors are:
Comparing date values between dfs:
df1.loc[df1['actual_date']==df2[df2['min_date']]
#Produces unexpected EOF while parsing
df['actual_date']==df2['min_date']
#Produces ValueError: Can only compare identically-labeled Series objects
Iterating through conditions:
for each in range(len(df1):
    if df1[df1['actual_date']]==df2[df2['min_date']]:
        df1['exctract_value_new']=df2['extract_value']
#Produces: KeyError: "None of [DatetimeIndex....are in the [columns]"
I've tried searching for both the value and key errors, but I'm having trouble understanding the threads about indexing. Specifically, I am unsure how df1 and/or df2 need to be reformatted so that I can compare date values in this manner and then extract a separate column wherever the criteria match in both dataframes.
Here is the sample data I'm working with:
df1 (base)
actual_date | extract_value
---|---
2021-01-22 | 22
2021-01-23 | 24
2021-01-24 | 15
2021-02-22 | 16
2021-02-05 | 34
2021-02-04 | 18
df2
month | min_date
---|---
2021-01-01 | 2021-01-22
2021-02-01 | 2021-02-04
2021-03-01 | 2021-03-01
End goal for df1
actual_date | min_date | extract_value_new | extract_value_original
---|---|---|---
2021-01-22 | 2021-01-22 | 22 | 22
2021-01-23 | 2021-01-22 | 22 | 24
2021-02-04 | 2021-02-04 | 18 | 18
2021-02-05 | 2021-02-04 | 18 | 34
Appreciate any help!
You might want to check out pd.merge_asof; the solution here gives me the desired table!
merge_asof essentially lets us join on the "closest" values in two columns. By default it matches each left-hand row to the last right-hand row whose key is less than or equal to its own, and the optional tolerance caps how far apart the two keys may be.
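As a small illustration of what "closest" means here, the sketch below runs merge_asof on a toy pair of frames. The frames and the column names `t`, `key`, and `label` are made up for this example and are not part of the question's data; `direction` and `tolerance` are real merge_asof parameters.

```python
import pandas as pd

# Toy data (hypothetical, only for illustration)
left = pd.DataFrame({"t": pd.to_datetime(["2021-01-22", "2021-01-23", "2021-01-24"])})
right = pd.DataFrame({
    "key": pd.to_datetime(["2021-01-22", "2021-01-25"]),
    "label": ["a", "b"],
})

# Default direction="backward": take the last right row with key <= t
backward = pd.merge_asof(left, right, left_on="t", right_on="key")

# direction="nearest": take whichever right key is closest on either side
nearest = pd.merge_asof(left, right, left_on="t", right_on="key", direction="nearest")

# A tolerance leaves rows unmatched (NaN/NaT) when the gap is too large
limited = pd.merge_asof(
    left, right, left_on="t", right_on="key", tolerance=pd.Timedelta(days=1)
)

print(backward["label"].tolist())  # ['a', 'a', 'a']
print(nearest["label"].tolist())   # ['a', 'a', 'b']
print(limited["label"].isna().tolist())  # [False, False, True]
```

Both inputs must already be sorted on their join keys, which is why the real data is sorted before merging.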
import pandas as pd
from datetime import timedelta

df1 = pd.read_csv('df1.csv')
df2 = pd.read_csv('df2.csv')

datetime_format = '%Y-%m-%d'   # format of date (needs to be a datetime to merge in pandas)
tolerance = timedelta(days=1)  # largest allowed gap between actual_date and min_date

# convert our dates
df1['actual_date'] = pd.to_datetime(df1['actual_date'], format=datetime_format)
df2['min_date'] = pd.to_datetime(df2['min_date'], format=datetime_format)

# sort: merge_asof requires both frames to be sorted on their join keys
df1.sort_values(by='actual_date', inplace=True)
df2.sort_values(by='min_date', inplace=True)

# match each actual_date to the latest min_date on or before it, within the tolerance;
# rows without a match contain NaN and are dropped
out = pd.merge_asof(df1, df2, left_on='actual_date', right_on='min_date', tolerance=tolerance).dropna()
Gives me:
actual_date extract_value month min_date
0 2021-01-22 22 2021-01-01 2021-01-22
1 2021-01-23 24 2021-01-01 2021-01-22
3 2021-02-04 18 2021-02-01 2021-02-04
4 2021-02-05 34 2021-02-01 2021-02-04
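The table above still lacks the extract_value_new column from the question's end goal. One possible way to get it, sketched here under the assumption that extract_value_new means "the extract_value recorded on the matched min_date", is to look that value up in df1 first and carry it through the merge_asof. The sample data is built inline instead of read from CSV, and the names `lookup` and `df2_values` are introduced only for this sketch.

```python
import pandas as pd

# Sample data from the question, built inline so the example is self-contained
df1 = pd.DataFrame({
    "actual_date": pd.to_datetime([
        "2021-01-22", "2021-01-23", "2021-01-24",
        "2021-02-22", "2021-02-05", "2021-02-04",
    ]),
    "extract_value": [22, 24, 15, 16, 34, 18],
})
df2 = pd.DataFrame({
    "month": pd.to_datetime(["2021-01-01", "2021-02-01", "2021-03-01"]),
    "min_date": pd.to_datetime(["2021-01-22", "2021-02-04", "2021-03-01"]),
})

# Assumption: extract_value_new is df1's extract_value on the min_date itself
lookup = df1.rename(
    columns={"actual_date": "min_date", "extract_value": "extract_value_new"}
)
df2_values = df2.merge(lookup, on="min_date", how="left")

# Same merge_asof as above, with both frames sorted on their join keys
out = pd.merge_asof(
    df1.sort_values("actual_date"),
    df2_values.sort_values("min_date"),
    left_on="actual_date",
    right_on="min_date",
    tolerance=pd.Timedelta(days=1),
).dropna(subset=["min_date"])  # drop rows with no min_date within one day

out = out.rename(columns={"extract_value": "extract_value_original"})
out["extract_value_new"] = out["extract_value_new"].astype(int)
out = out[
    ["actual_date", "min_date", "extract_value_new", "extract_value_original"]
].reset_index(drop=True)
print(out)
```

On the sample data this leaves the four rows of the end-goal table, with extract_value_new equal to 22, 22, 18, 18. If a min_date never appears in df1, its extract_value_new stays NaN and the integer cast would need to be dropped or adjusted.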