
Assign values to a dataframe by considering values in 2 columns of different dataframe as range

The following code illustrates the scenario. I have a dataframe (df_ticker) with 3 columns:

import pandas as pd
df_ticker = pd.DataFrame({'Min_val': [22382.729, 36919.205, 46735.164, 62247.61],
                          'Max_val': [36901.758, 46716.06, 62045.06, 182727.05],
                          'Ticker': ['$', '$$', '$$$', '$$$$']})
df_ticker

My second dataframe (df_values) contains 2 columns:

df_values = pd.DataFrame({'Id': [1, 2, 3, 4, 5, 6],
                          'sal_val': [3098, 45639.987, 65487.4, 56784.8, 8, 736455]})
df_values


For every value in df_values['sal_val'], I want to check which range it falls into, as defined by df_ticker['Min_val'] and df_ticker['Max_val'], and assign the corresponding df_ticker['Ticker'].
In the sample output, sal_val=3098 is assigned ticker=$, the range with Min_val=22382.729 and Max_val=36901.758. Note that 3098 actually lies below that range's Min_val, so values below the lowest range are expected to take the first ticker.

I tried the following:

df_values['ticker'] = df_ticker.loc[
    (df_values['sal_val'] >= df_ticker['Min_val']) | (df_values['sal_val'] <= df_ticker['Max_val'])
]['Ticker']
df_values

It failed with the error "ValueError: Can only compare identically-labeled Series objects".
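For context, here is a minimal sketch of what appears to trigger this error. It assumes the cause is the element-wise comparison of two Series whose indexes differ (6 rows against 4 rows); the variable names sal_val, min_val, error_message and scalar_mask are illustrative only and not part of the original code.

```python
import pandas as pd

# Same values as the two columns in the question, as standalone Series.
sal_val = pd.Series([3098, 45639.987, 65487.4, 56784.8, 8, 736455])  # index 0..5
min_val = pd.Series([22382.729, 36919.205, 46735.164, 62247.61])     # index 0..3

try:
    # Series-vs-Series comparison requires identical index labels.
    sal_val >= min_val
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Comparing one scalar against a whole column is well defined.
scalar_mask = min_val <= 45639.987

print(error_message)
print(scalar_mask.tolist())
```

Comparing one sal_val at a time against the df_ticker columns avoids the label mismatch, which is why a per-value lookup is a natural fit here.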

Is there a solution for this issue?

One way is to define a custom mapping function and use pd.Series.apply.

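def mapper(x, t):
    # Values below the smallest Min_val map to the first ticker.
    if x < t['Min_val'].min():
        index = 0
    # Values at or above the largest Max_val map to the last ticker.
    elif x >= t['Max_val'].max():
        index = -1
    else:
        # Position of the first row with Min_val <= x < Max_val, or None if x falls in a gap.
        index = next((idx for idx, (i, j) in enumerate(zip(t['Min_val'], t['Max_val']))
                      if i <= x < j), None)

    return t['Ticker'].iloc[index] if index is not None else None

df_values['Ticker'] = df_values['sal_val'].apply(mapper, t=df_ticker)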

Result

   Id     sal_val Ticker
0   1    3098.000      $
1   2   45639.987     $$
2   3   65487.400   $$$$
3   4   56784.800    $$$
4   5       8.000      $
5   6  736455.000   $$$$

Explanation

  • pd.Series.apply accepts a custom mapping function as an input.
  • The mapping function takes each entry in sal_val and compares it to the values in df_ticker via an if / elif / else structure.
  • The first two branches (if and elif) handle the boundaries: values below the smallest Min_val map to the first ticker, and values at or above the largest Max_val map to the last ticker.
  • The final else branch uses a generator expression, which iterates over the rows of df_ticker and yields the position of the first row where Min_val <= input < Max_val. If no row matches, next returns its default, None.
  • Finally, the index is passed to df_ticker['Ticker'] via the .iloc integer accessor; if no range matched, the function returns None.
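The lookup in the else branch can be tried in isolation. The sketch below pulls it into a helper named find_range_index, a hypothetical name used only for illustration, and calls it on one value inside a range and on 36910.0, a made-up value chosen because it sits in the gap between the first and second ranges.

```python
import pandas as pd

df_ticker = pd.DataFrame({
    'Min_val': [22382.729, 36919.205, 46735.164, 62247.61],
    'Max_val': [36901.758, 46716.06, 62045.06, 182727.05],
    'Ticker': ['$', '$$', '$$$', '$$$$'],
})

def find_range_index(x, t):
    """Return the position of the first row with Min_val <= x < Max_val, else None."""
    pairs = zip(t['Min_val'], t['Max_val'])
    # next() takes the first match from the generator, or the default None.
    return next((idx for idx, (lo, hi) in enumerate(pairs) if lo <= x < hi), None)

inside = find_range_index(45639.987, df_ticker)  # within the second range
in_gap = find_range_index(36910.0, df_ticker)    # hypothetical value between two ranges

print(inside)
print(in_gap)
```

Because the ranges in df_ticker are not contiguous, a value that falls between two ranges matches no row, and the mapping function then returns None for it rather than a ticker.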


 