Populate column in data frame based on a range found in another dataframe

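I am trying to populate a column in a data frame based on whether each record's index value falls within a range defined by two columns in another data frame.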

df1 looks like:

    a
0   4
1   45
2   7
3   5
4   48
5   44
6   22
7   89
8   45
9   44
10  23

df2 is:

  START STOP CLASS
0   2   3   1
1   5   7   2
2   8   8   3

What I want:

    a   CLASS
0   4   nan
1   45  nan
2   7   1
3   5   1
4   48  nan
5   44  2
6   22  2
7   89  2
8   45  3
9   44  nan
10  23  nan

The START column in df2 is the minimum of the range and the STOP column is the maximum; both endpoints are inclusive.
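
To make the example reproducible, the two frames shown above can be built as follows. This is only a minimal sketch that copies the values from the printed tables; the default integer index is assumed for both frames.

```python
import pandas as pd

# Values copied from the tables in the question; default 0..n-1 integer index.
df1 = pd.DataFrame({"a": [4, 45, 7, 5, 48, 44, 22, 89, 45, 44, 23]})
df2 = pd.DataFrame({"START": [2, 5, 8], "STOP": [3, 7, 8], "CLASS": [1, 2, 3]})

print(df1)
print(df2)
```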

You can use an IntervalIndex (requires pandas v0.20.0 or later).

First, construct the index:

df2.index = pd.IntervalIndex.from_arrays(df2['START'], df2['STOP'], closed='both')

df2
Out: 
        START  STOP  CLASS
[2, 3]      2     3      1
[5, 7]      5     7      2
[8, 8]      8     8      3
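
The closed='both' argument matters here because START and STOP are both meant to be inclusive. A small sketch using pd.Interval directly (the variable names are only illustrative) shows the difference from the default closed='right':

```python
import pandas as pd

both = pd.Interval(2, 3, closed="both")    # contains 2 and 3
right = pd.Interval(2, 3, closed="right")  # contains 3 but not 2

print(2 in both)   # True
print(2 in right)  # False

# A single-point range such as START=8, STOP=8 only matches when both ends are closed.
point = pd.Interval(8, 8, closed="both")
print(8 in point)  # True
```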

Now, if you index into the second DataFrame, it looks the value up in the intervals. For example,

df2.loc[6]
Out: 
START    5
STOP     7
CLASS    2
Name: [5, 7], dtype: int64

returns the second class. I don't know whether this can be made to work with merge or merge_asof, but map can be used as an alternative:

df1['CLASS'] = df1.index.to_series().map(df2['CLASS'])

Note that I first converted the index to a Series in order to be able to use the Series.map method. This results in:

df1
Out: 
     a  CLASS
0    4    NaN
1   45    NaN
2    7    1.0
3    5    1.0
4   48    NaN
5   44    2.0
6   22    2.0
7   89    2.0
8   45    3.0
9   44    NaN
10  23    NaN
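
On the merge_asof question raised above: one possible sketch, not part of the original answer, is to match each index value to the nearest START at or below it and then discard matches that lie beyond STOP. It assumes the ranges do not overlap, and the column name idx and the variables left, merged and in_range are introduced here only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [4, 45, 7, 5, 48, 44, 22, 89, 45, 44, 23]})
df2 = pd.DataFrame({"START": [2, 5, 8], "STOP": [3, 7, 8], "CLASS": [1, 2, 3]})

# Expose the index as a regular column so it can serve as the merge key.
left = df1.reset_index().rename(columns={"index": "idx"})

# Default direction='backward': for each idx, take the last row with START <= idx.
# Both key columns must be sorted.
merged = pd.merge_asof(left, df2.sort_values("START"), left_on="idx", right_on="START")

# Keep the class only where idx is also <= STOP; everything else becomes NaN.
in_range = merged["idx"] <= merged["STOP"]
df1["CLASS"] = merged["CLASS"].where(in_range).to_numpy()

print(df1)
```

With overlapping ranges this would only ever consider the range with the closest START, so the IntervalIndex or dictionary approaches may be easier to reason about in that case.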

Alternative solution:

classdict = df2.set_index("CLASS").to_dict("index")

rangedict = {}

for key, value in classdict.items():
    # get all items in range and assign value (the key)
    for item in range(value["START"], value["STOP"] + 1):
        rangedict[item] = key

The resulting rangedict:

{2: 1, 3: 1, 5: 2, 6: 2, 7: 2, 8: 3}

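Now map it onto the index and, optionally, format the values for display (applymap returns a new frame of strings; df1 itself is unchanged):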

df1['CLASS'] = df1.index.to_series().map(rangedict)
df1.applymap("{0:.0f}".format)

Output:

    a   CLASS
0   4   nan
1   45  nan
2   7   1
3   5   1
4   48  nan
5   44  2
6   22  2
7   89  2
8   45  3
9   44  nan
10  23  nan
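
If the string formatting is only there to hide the trailing .0, a possible variation is to keep the column numeric and cast it to pandas' nullable Int64 dtype. The sketch below rebuilds rangedict with a dict comprehension; it is an illustration under that assumption, not the original answer's code.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [4, 45, 7, 5, 48, 44, 22, 89, 45, 44, 23]})
df2 = pd.DataFrame({"START": [2, 5, 8], "STOP": [3, 7, 8], "CLASS": [1, 2, 3]})

# Expand every inclusive [START, STOP] range into individual index -> class entries.
rangedict = {
    i: cls
    for start, stop, cls in zip(df2["START"], df2["STOP"], df2["CLASS"])
    for i in range(start, stop + 1)
}

# Unmatched index values become missing; Int64 keeps matched values as integers.
df1["CLASS"] = df1.index.to_series().map(rangedict).astype("Int64")

print(df1)
```

Missing entries are then displayed as <NA> instead of nan, and the matched classes stay integers.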

import pandas as pd
import numpy as np

# Here is your existing dataframe
df_existing = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))

# Create a new empty dataframe with specific column names and data types
df_new = pd.DataFrame(index=None)
columns = ['field01', 'field02', 'field03', 'field04']
dtypes = [str, int, int, int]
for c, d in zip(columns, dtypes):
    df_new[c] = pd.Series(dtype=d)

# Set the index on the new dataframe to same as existing
df_new['new_index'] = df_existing.index
df_new.set_index('new_index', inplace=True)

# Fill the new dataframe with specific fields from the existing dataframe
df_new[['field02', 'field03']] = df_existing[['B', 'C']]

print(df_new)
