Populate a column in a dataframe based on ranges found in another dataframe

Question

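I'm trying to populate a column in one dataframe based on whether each record's index value falls within a range defined by two columns in another dataframe.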

df1 looks like:

    a
0   4
1   45
2   7
3   5
4   48
5   44
6   22
7   89
8   45
9   44
10  23

df2 is:

  START STOP CLASS
0   2   3   1
1   5   7   2
2   8   8   3

What I want the result to look like:

    a   CLASS
0   4   nan
1   45  nan
2   7   1
3   5   1
4   48  nan
5   44  2
6   22  2
7   89  2
8   45  3
9   44  nan
10  23  nan

The START column in df2 is the minimum of the range and the STOP column is the maximum; both ends are inclusive.
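For reproducibility, here is a minimal sketch that builds both frames and fills CLASS with a plain loop over the rows of df2. It assumes df1 has the default integer index 0 to 10 with the `a` values shown in the desired output, and that START and STOP are both inclusive. One `.loc` assignment per range is fine for a handful of ranges.

```python
import numpy as np
import pandas as pd

# df1: `a` values taken from the desired output; default RangeIndex 0..10
df1 = pd.DataFrame({"a": [4, 45, 7, 5, 48, 44, 22, 89, 45, 44, 23]})
# df2: each row defines an inclusive [START, STOP] range of df1 index labels
df2 = pd.DataFrame({"START": [2, 5, 8], "STOP": [3, 7, 8], "CLASS": [1, 2, 3]})

df1["CLASS"] = np.nan
for start, stop, cls in df2[["START", "STOP", "CLASS"]].itertuples(index=False):
    # label-based slicing with .loc includes both endpoints
    df1.loc[start:stop, "CLASS"] = cls

print(df1)
```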

Answer 1

You can use an IntervalIndex (requires pandas v0.20.0 or later).

First, construct the index:

df2.index = pd.IntervalIndex.from_arrays(df2['START'], df2['STOP'], closed='both')

df2
Out: 
        START  STOP  CLASS
[2, 3]      2     3      1
[5, 7]      5     7      2
[8, 8]      8     8      3
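The `closed='both'` argument matters here: `IntervalIndex.from_arrays` defaults to `closed='right'`, which would exclude each START value and leave the one-point range 8 to 8 empty. The sketch below compares the two settings using df2's START/STOP values, and also shows `IntervalIndex.get_indexer`, which looks up many labels at once and returns -1 where no interval contains the label. It assumes the intervals do not overlap, as is the case in df2.

```python
import pandas as pd

starts, stops = [2, 5, 8], [3, 7, 8]  # START/STOP values from df2
both = pd.IntervalIndex.from_arrays(starts, stops, closed="both")
right = pd.IntervalIndex.from_arrays(starts, stops)  # closed="right" by default

# contains() tests one scalar against every interval
print(both.contains(2).tolist())   # [True, False, False]
print(right.contains(2).tolist())  # [False, False, False]
print(right.contains(8).tolist())  # [False, False, False]

# get_indexer() maps each label 0..10 to the position of its interval, or -1
print(both.get_indexer(pd.RangeIndex(11)).tolist())
# [-1, -1, 0, 0, -1, 1, 1, 1, 2, -1, -1]
```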

Now, if you index into the second DataFrame with a scalar, it looks up the interval that contains the value. For example,

df2.loc[6]
Out: 
START    5
STOP     7
CLASS    2
Name: [5, 7], dtype: int64

returns class 2. I don't know whether this works with merge or merge_asof, but you can use map as an alternative:

df1['CLASS'] = df1.index.to_series().map(df2['CLASS'])

Note that I first convert the index to a Series so that I can use the Series.map method. This results in

df1
Out: 
     a  CLASS
0    4    NaN
1   45    NaN
2    7    1.0
3    5    1.0
4   48    NaN
5   44    2.0
6   22    2.0
7   89    2.0
8   45    3.0
9   44    NaN
10  23    NaN
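On the merge_asof question: a sketch that does appear to work matches each df1 label to the nearest START at or below it, then discards matches whose STOP is smaller than the label. This is a swapped-in technique, not part of the answer above; it assumes the ranges in df2 do not overlap, and `key` is a hypothetical name for the column created from df1's index.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [4, 45, 7, 5, 48, 44, 22, 89, 45, 44, 23]})
df2 = pd.DataFrame({"START": [2, 5, 8], "STOP": [3, 7, 8], "CLASS": [1, 2, 3]})

# merge_asof needs both join keys sorted and of the same dtype
left = df1.rename_axis("key").reset_index()  # hypothetical column "key" holds df1's index
right = df2.sort_values("START")
merged = pd.merge_asof(left, right, left_on="key", right_on="START", direction="backward")

# the nearest START <= key may belong to a range that already ended before key
merged.loc[merged["key"] > merged["STOP"], "CLASS"] = np.nan
df1["CLASS"] = merged["CLASS"].to_numpy()  # merged keeps left's row order
print(df1)
```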

Answer 2

Alternative solution:

classdict = df2.set_index("CLASS").to_dict("index")

rangedict = {}
for key, value in classdict.items():
    # assign the class (the key) to every integer in the inclusive range
    for item in range(value["START"], value["STOP"] + 1):
        rangedict[item] = key

The resulting rangedict:

{2: 1, 3: 1, 5: 2, 6: 2, 7: 2, 8: 3}
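The same rangedict can be built in a single dict comprehension over the rows of df2. A sketch assuming integer START/STOP values, as in the question; like the loop, it stores one entry per covered integer, so it suits narrow integer ranges rather than wide or fractional ones.

```python
import pandas as pd

df2 = pd.DataFrame({"START": [2, 5, 8], "STOP": [3, 7, 8], "CLASS": [1, 2, 3]})

# one entry per integer in each inclusive [START, STOP] range;
# if ranges overlapped, the later row in df2 would win for the shared integers
rangedict = {
    i: cls
    for start, stop, cls in df2[["START", "STOP", "CLASS"]].itertuples(index=False)
    for i in range(start, stop + 1)
}
print(rangedict)  # {2: 1, 3: 1, 5: 2, 6: 2, 7: 2, 8: 3}
```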

Now map the index through rangedict, and optionally format the result to drop the trailing `.0`:

df1['CLASS'] = df1.index.to_series().map(rangedict)
df1.applymap("{0:.0f}".format)  # applymap is deprecated since pandas 2.1 in favor of DataFrame.map

Output:

    a   CLASS
0   4   nan
1   45  nan
2   7   1
3   5   1
4   48  nan
5   44  2
6   22  2
7   89  2
8   45  3
9   44  nan
10  23  nan
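Note that `applymap` returns a new DataFrame of strings and leaves df1 itself unchanged. If the goal is only to display CLASS without the trailing `.0`, a sketch using pandas' nullable integer dtype keeps the column numeric instead (assuming pandas 1.0 or later, where missing values show as `<NA>`):

```python
import pandas as pd

df1 = pd.DataFrame({"a": [4, 45, 7, 5, 48, 44, 22, 89, 45, 44, 23]})
rangedict = {2: 1, 3: 1, 5: 2, 6: 2, 7: 2, 8: 3}

# map() yields float64 because unmatched labels become NaN;
# "Int64" stores whole numbers and marks the gaps as <NA>
df1["CLASS"] = df1.index.to_series().map(rangedict).astype("Int64")
print(df1)
```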

Answer 3

import pandas as pd
import numpy as np

# Here is your existing dataframe
df_existing = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))

# Create a new empty dataframe with specific column names and data types
df_new = pd.DataFrame(index=None)
columns = ['field01', 'field02', 'field03', 'field04']
dtypes = [str, int, int, int]
for c, d in zip(columns, dtypes):
    df_new[c] = pd.Series(dtype=d)

# Set the index on the new dataframe to the same as the existing one
df_new['new_index'] = df_existing.index
df_new.set_index('new_index', inplace=True)

# Fill the new dataframe with specific fields from the existing dataframe
df_new[['field02', 'field03']] = df_existing[['B', 'C']]
print(df_new)

3 answers

Answer 1
score 2, accepted, 2017-08-04 22:08:17

Answer 2
score 0, 2017-08-04 22:21:06

Answer 3
score 0, 2018-06-29 15:11:03
