
Populate column in data frame based on a range found in another dataframe

I'm attempting to populate a column in a data frame based on whether the index value of each record falls within a range defined by two columns in another data frame.

df1 looks like:

    a
0   4
1   45
2   7
3   5
4   48
5   44
6   22
7   89
8   45
9   44
10  23

and df2 is:

  START STOP CLASS
0   2   3   1
1   5   7   2
2   8   8   3

What I want would look like:

    a   CLASS
0   4   nan
1   45  nan
2   7   1
3   5   1
4   48  nan
5   44  2
6   22  2
7   89  2
8   45  3
9   44  nan
10  23  nan

The START column in df2 is the minimum value of the range and the STOP column is the maximum; both ends are inclusive.

You can use IntervalIndex (requires pandas v0.20.0 or later).

First construct the index:

df2.index = pd.IntervalIndex.from_arrays(df2['START'], df2['STOP'], closed='both')

df2
Out: 
        START  STOP  CLASS
[2, 3]      2     3      1
[5, 7]      5     7      2
[8, 8]      8     8      3
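
To check which interval a given value lands in before using the index for lookups, one option is IntervalIndex.get_indexer. The following is a minimal sketch, assuming df2 holds the values from the question; the variable names intervals and positions are only illustrative.

```python
import pandas as pd

# df2 rebuilt from the question's data
df2 = pd.DataFrame({"START": [2, 5, 8], "STOP": [3, 7, 8], "CLASS": [1, 2, 3]})

# closed="both" makes START and STOP inclusive
intervals = pd.IntervalIndex.from_arrays(df2["START"], df2["STOP"], closed="both")

# Position of the interval containing each value, or -1 if no interval contains it
positions = intervals.get_indexer([6, 4, 8])
print(positions)
```

Here 6 falls in the second interval, 4 falls in none, and 8 falls in the third. This relies on the intervals not overlapping.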

Now if you index into the second DataFrame, it will look up the value in the intervals. For example,

df2.loc[6]
Out: 
START    5
STOP     7
CLASS    2
Name: [5, 7], dtype: int64

returns the second class. I don't know if it can be used with merge or with merge_asof, but as an alternative you can use map:

df1['CLASS'] = df1.index.to_series().map(df2['CLASS'])

Note that I first converted the index to a Series to be able to use the Series.map method. This results in

df1
Out: 
     a  CLASS
0    4    NaN
1   45    NaN
2    7    1.0
3    5    1.0
4   48    NaN
5   44    2.0
6   22    2.0
7   89    2.0
8   45    3.0
9   44    NaN
10  23    NaN
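
On the merge_asof question raised above: a plain merge_asof on START does not check the upper bound, but it can be combined with a filter on STOP. This is a sketch under the assumption that the ranges in df2 do not overlap; it is not part of the original answer, and the names left, merged and in_range are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [4, 45, 7, 5, 48, 44, 22, 89, 45, 44, 23]})
df2 = pd.DataFrame({"START": [2, 5, 8], "STOP": [3, 7, 8], "CLASS": [1, 2, 3]})

# merge_asof needs both keys sorted; expose df1's index as a column to join on
left = df1.reset_index()
merged = pd.merge_asof(
    left, df2.sort_values("START"), left_on="index", right_on="START"
)

# The default backward match only guarantees START <= index,
# so discard matches whose index lies beyond STOP
in_range = merged["index"] <= merged["STOP"]
df1["CLASS"] = merged["CLASS"].where(in_range).to_numpy()
print(df1)
```

Rows whose index precedes every START, or falls in a gap between ranges, end up as NaN, as in the map version.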

Alternative solution:


classdict = df2.set_index("CLASS").to_dict("index")

rangedict = {}
for key, value in classdict.items():
    # every integer in [START, STOP] maps to this row's CLASS (the key)
    for item in range(value["START"], value["STOP"] + 1):
        rangedict[item] = key

The resulting rangedict:

{2: 1, 3: 1, 5: 2, 6: 2, 7: 2, 8: 3}
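
The same lookup table can also be built in one expression. This is a sketch, assuming START and STOP are integers and df2 holds the question's data; if two ranges overlapped, the later row would silently win.

```python
import pandas as pd

df2 = pd.DataFrame({"START": [2, 5, 8], "STOP": [3, 7, 8], "CLASS": [1, 2, 3]})

# One dict entry per integer in each inclusive [START, STOP] range
rangedict = {
    i: row.CLASS
    for row in df2.itertuples(index=False)
    for i in range(row.START, row.STOP + 1)
}
print(rangedict)
```

Because every integer in every range gets its own entry, this approach suits short integer ranges; for wide or non-integer ranges an interval-based lookup is likely a better fit.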

Now map it onto the index and, optionally, format the result (note that DataFrame.applymap is deprecated since pandas 2.1 in favor of DataFrame.map):

df1['CLASS'] = df1.index.to_series().map(rangedict)
df1.applymap("{0:.0f}".format)

Output:

    a   CLASS
0   4   nan
1   45  nan
2   7   1
3   5   1
4   48  nan
5   44  2
6   22  2
7   89  2
8   45  3
9   44  nan
10  23  nan
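
If the string formatting is only there to get rid of the trailing .0, another option is pandas' nullable integer dtype, which keeps CLASS numeric. A minimal sketch, assuming pandas 0.24 or later and the rangedict shown earlier:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [4, 45, 7, 5, 48, 44, 22, 89, 45, 44, 23]})
rangedict = {2: 1, 3: 1, 5: 2, 6: 2, 7: 2, 8: 3}

# "Int64" (capital I) is the nullable integer dtype; missing values show as <NA>
df1["CLASS"] = df1.index.to_series().map(rangedict).astype("Int64")
print(df1)
```

Unlike the format-based approach, this leaves column a untouched and keeps CLASS usable in numeric operations.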
import pandas as pd
import numpy as np

# Here is your existing dataframe
df_existing = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))

# Create a new empty dataframe with specific column names and data types
df_new = pd.DataFrame(index=None)
columns = ['field01', 'field02', 'field03', 'field04']
dtypes = [str, int, int, int]
for c, d in zip(columns, dtypes):
    df_new[c] = pd.Series(dtype=d)

# Set the index on the new dataframe to same as existing
df_new['new_index'] = df_existing.index
df_new.set_index('new_index', inplace=True)

# Fill the new dataframe with specific fields from the existing dataframe
df_new[['field02', 'field03']] = df_existing[['B', 'C']]

print(df_new)
