Populate a column in a data frame based on a range found in another data frame

Question

I'm attempting to populate a column in a data frame based on whether the index value of each record falls within a range defined by two columns in another data frame.

df1 looks like:

    a
0   4
1   45
2   7
3   5
4   48
5   44
6   22
7   89
8   45
9   44
10  23

and df2 is:

  START STOP CLASS
0   2   3   1
1   5   7   2
2   8   8   3

What I want would look like:

    a   CLASS
0   4   nan
1   45  nan
2   7   1
3   5   1
4   48  nan
5   44  2
6   22  2
7   89  2
8   45  3
9   44  nan
10  23  nan

The START column in df2 is the minimum value of the range and the STOP column is the maximum. Both ends are inclusive.
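For anyone who wants to reproduce the problem, here is a minimal setup sketch. The values of column `a` are an assumption read off the desired output above; df2 is copied from the table in the question.

```python
import pandas as pd

# df1: values of column `a` taken from the desired output (index 0..10).
df1 = pd.DataFrame({"a": [4, 45, 7, 5, 48, 44, 22, 89, 45, 44, 23]})

# df2: one row per inclusive [START, STOP] range and the CLASS it maps to.
df2 = pd.DataFrame({
    "START": [2, 5, 8],
    "STOP": [3, 7, 8],
    "CLASS": [1, 2, 3],
})
```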

Answer 1

You can use IntervalIndex (requires pandas v0.20.0 or later).

First construct the index:

df2.index = pd.IntervalIndex.from_arrays(df2['START'], df2['STOP'], closed='both')

df2
Out: 
        START  STOP  CLASS
[2, 3]      2     3      1
[5, 7]      5     7      2
[8, 8]      8     8      3
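A note on closed='both': the ranges in the question include both endpoints, and the last one (8 to 8) is a single point. The small sketch below, using pd.Interval directly, illustrates why the default closed='right' would not match that row.

```python
import pandas as pd

both = pd.Interval(8, 8, closed="both")    # [8, 8]
right = pd.Interval(8, 8, closed="right")  # (8, 8], the default `closed` value

print(8 in both)   # True: both endpoints are included
print(8 in right)  # False: the left endpoint is excluded, so the interval is empty

# The same applies to the lower bound of a wider range.
print(2 in pd.Interval(2, 3, closed="both"))   # True
print(2 in pd.Interval(2, 3, closed="right"))  # False
```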

Now, if you index into the second DataFrame, it will look up the value in the intervals. For example,

df2.loc[6]
Out: 
START    5
STOP     7
CLASS    2
Name: [5, 7], dtype: int64

returns the second class. I don't know if it can be used with merge or with merge_asof, but as an alternative you can use map:

df1['CLASS'] = df1.index.to_series().map(df2['CLASS'])

Note that I first converted the index to a Series to be able to use the Series.map method. This results in

df1
Out: 
     a  CLASS
0    4    NaN
1   45    NaN
2    7    1.0
3    5    1.0
4   48    NaN
5   44    2.0
6   22    2.0
7   89    2.0
8   45    3.0
9   44    NaN
10  23    NaN
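On the merge_asof question: one possible route, sketched here and not part of the original answer, is to match each index value to the nearest START at or below it and then discard matches that lie beyond STOP. This assumes the ranges in df2 do not overlap; the names `left`, `merged` and `in_range` are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [4, 45, 7, 5, 48, 44, 22, 89, 45, 44, 23]})
df2 = pd.DataFrame({"START": [2, 5, 8], "STOP": [3, 7, 8], "CLASS": [1, 2, 3]})

# Expose df1's index as a column so it can serve as the merge key.
left = df1.reset_index()  # adds an "index" column

# For each index value, pick the last df2 row whose START <= index.
# merge_asof requires both key columns to be sorted.
merged = pd.merge_asof(
    left, df2.sort_values("START"), left_on="index", right_on="START"
)

# Keep CLASS only where the index does not exceed STOP of the matched row.
in_range = merged["index"] <= merged["STOP"]
df1["CLASS"] = merged["CLASS"].where(in_range).to_numpy()

print(df1)
```

Rows with no START at or below them, and rows that fall past the matched STOP, end up as NaN, which matches the output of the map approach above.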

Answer 2

Alternative solution:

# {CLASS: {"START": ..., "STOP": ...}} for every row of df2
classdict = df2.set_index("CLASS").to_dict("index")

rangedict = {}
for key, value in classdict.items():
    # assign the class (the key) to every integer in the inclusive range
    for item in range(value["START"], value["STOP"] + 1):
        rangedict[item] = key

The resulting rangedict:

{2: 1, 3: 1, 5: 2, 6: 2, 7: 2, 8: 3}

Now map and, optionally, format the result:

df1['CLASS'] = df1.index.to_series().map(rangedict)
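# Display only: returns a new frame of strings and leaves df1 unchanged.
# On pandas >= 2.1, DataFrame.map replaces the deprecated applymap.
df1.applymap("{0:.0f}".format)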

Output:

    a   CLASS
0   4   nan
1   45  nan
2   7   1
3   5   1
4   48  nan
5   44  2
6   22  2
7   89  2
8   45  3
9   44  nan
10  23  nan
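If the string formatting step is only there to avoid the 1.0, 2.0 float display, a hedged alternative is to cast the mapped column to the nullable integer dtype "Int64" (available since pandas 0.24). The sketch below rebuilds rangedict with a comprehension; it is a variation on this answer, not the original code.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [4, 45, 7, 5, 48, 44, 22, 89, 45, 44, 23]})
df2 = pd.DataFrame({"START": [2, 5, 8], "STOP": [3, 7, 8], "CLASS": [1, 2, 3]})

# Expand each inclusive [START, STOP] range into index -> CLASS pairs.
rangedict = {
    i: row.CLASS
    for row in df2.itertuples(index=False)
    for i in range(row.START, row.STOP + 1)
}

# Map index values to classes; unmatched rows become NaN (float64),
# then cast to nullable integers so matches display as 1, 2, 3 and gaps as <NA>.
df1["CLASS"] = df1.index.to_series().map(rangedict).astype("Int64")

print(df1)
```

Unlike the formatting call, this keeps CLASS numeric, so it can still be used for grouping or arithmetic.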

Answer 3

import pandas as pd
import numpy as np

# Here is your existing dataframe
df_existing = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))

# Create a new empty dataframe with specific column names and data types
df_new = pd.DataFrame(index=None)
columns = ['field01', 'field02', 'field03', 'field04']
dtypes = [str, int, int, int]
for c, d in zip(columns, dtypes):
    df_new[c] = pd.Series(dtype=d)

# Set the index on the new dataframe to same as existing
df_new['new_index'] = df_existing.index
df_new.set_index('new_index', inplace=True)

# Fill the new dataframe with specific fields from the existing dataframe
df_new[['field02', 'field03']] = df_existing[['B', 'C']]

print(df_new)

Answer 1
2, accepted, 2017-08-04 22:08:17

Answer 2
0, 2017-08-04 22:21:06

Answer 3
0, 2018-06-29 15:11:03
