Populate column in data frame based on a range found in another dataframe
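I am trying to populate a column in a data frame based on whether each record's index value falls within a range defined by two columns in another data frame.
df1 looks like: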
a
0 4
1 45
2 7
3 5
4 48
5 44
6 22
7 89
8 45
9 44
10 23
df2 is:
START STOP CLASS
0 2 3 1
1 5 7 2
2 8 8 3
The output I want:
a CLASS
0 4 nan
1 45 nan
2 7 1
3 5 1
4 48 nan
5 44 2
6 22 2
7 89 2
8 45 3
9 44 nan
10 23 nan
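The START column in df2 is the minimum of the range and the STOP column is the maximum. Both ends are inclusive.
You can use IntervalIndex (requires pandas v0.20.0 or later).
First construct the index: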
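For anyone who wants to reproduce the problem, the two frames can be rebuilt roughly as follows. This is only a sketch based on the printed tables above; it assumes both frames use a default integer index.

```python
import pandas as pd

# df1: a single column "a" with a default 0..10 integer index
df1 = pd.DataFrame({"a": [4, 45, 7, 5, 48, 44, 22, 89, 45, 44, 23]})

# df2: inclusive index ranges [START, STOP] and the CLASS to assign
df2 = pd.DataFrame({"START": [2, 5, 8], "STOP": [3, 7, 8], "CLASS": [1, 2, 3]})

print(df1)
print(df2)
```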
df2.index = pd.IntervalIndex.from_arrays(df2['START'], df2['STOP'], closed='both')
df2
Out:
START STOP CLASS
[2, 3] 2 3 1
[5, 7] 5 7 2
[8, 8] 8 8 3
Now, if you index into the second DataFrame, the value is looked up in the intervals. For example,
df2.loc[6]
Out:
START 5
STOP 7
CLASS 2
Name: [5, 7], dtype: int64
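A value that falls in none of the intervals raises a KeyError when looked up this way. The sketch below wraps the lookup in a small helper; `lookup_class` is a hypothetical name introduced only for illustration, not part of pandas.

```python
import pandas as pd

df2 = pd.DataFrame({"START": [2, 5, 8], "STOP": [3, 7, 8], "CLASS": [1, 2, 3]})
df2.index = pd.IntervalIndex.from_arrays(df2["START"], df2["STOP"], closed="both")


def lookup_class(value):
    """Hypothetical helper: CLASS of the interval containing value, else None."""
    try:
        return int(df2.loc[value, "CLASS"])
    except KeyError:
        # No interval contains this value
        return None


print(lookup_class(6))
print(lookup_class(4))
```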
This returns the second class. I don't know whether this can be made to work with merge or merge_asof, but map can be used as an alternative:
df1['CLASS'] = df1.index.to_series().map(df2['CLASS'])
Note that I first convert the index to a Series so that I can use the Series.map method. This results in
df1
Out:
a CLASS
0 4 NaN
1 45 NaN
2 7 1.0
3 5 1.0
4 48 NaN
5 44 2.0
6 22 2.0
7 89 2.0
8 45 3.0
9 44 NaN
10 23 NaN
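The same lookup can also be spelled out with IntervalIndex.get_indexer, which returns the position of the matching interval for each value and -1 where there is none. This is a sketch rather than the answer's own method, and it assumes the ranges in df2 do not overlap.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [4, 45, 7, 5, 48, 44, 22, 89, 45, 44, 23]})
df2 = pd.DataFrame({"START": [2, 5, 8], "STOP": [3, 7, 8], "CLASS": [1, 2, 3]})

# Assumption: the intervals are non-overlapping
intervals = pd.IntervalIndex.from_arrays(df2["START"], df2["STOP"], closed="both")

# Position of the containing interval for each index value, -1 if none
pos = intervals.get_indexer(df1.index)

classes = df2["CLASS"].to_numpy()
# Positions of -1 are masked out and become NaN
df1["CLASS"] = np.where(pos >= 0, classes[pos], np.nan)
print(df1)
```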
Alternative solution:
classdict = df2.set_index("CLASS").to_dict("index")
rangedict = {}
for key, value in classdict.items():
    # get all items in range and assign value (the key)
    for item in range(value["START"], value["STOP"] + 1):
        rangedict[item] = key
The resulting rangedict:
{2: 1, 3: 1, 5: 2, 6: 2, 7: 2, 8: 3}
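One caveat: to_dict("index") requires the new index to be unique, so the approach above fails if two ranges share the same CLASS. A possible variant, sketched here under the same sample data, iterates over the rows directly and does not depend on CLASS being unique.

```python
import pandas as pd

df2 = pd.DataFrame({"START": [2, 5, 8], "STOP": [3, 7, 8], "CLASS": [1, 2, 3]})

rangedict = {}
for row in df2.itertuples(index=False):
    # Every integer in the inclusive range [START, STOP] maps to this row's CLASS
    for item in range(row.START, row.STOP + 1):
        rangedict[item] = row.CLASS

print(rangedict)
```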
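Now map, and optionally use format to display the values without decimals (note that this returns a new frame of strings):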
df1['CLASS'] = df1.index.to_series().map(rangedict)
df1.applymap("{0:.0f}".format)
Output:
a CLASS
0 4 nan
1 45 nan
2 7 1
3 5 1
4 48 nan
5 44 2
6 22 2
7 89 2
8 45 3
9 44 nan
10 23 nan
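If the goal is only to avoid the 1.0 / 2.0 float display, an alternative to string formatting is pandas' nullable integer dtype, which keeps the column numeric and shows missing entries as <NA>. A minimal sketch, assuming a pandas version that supports the "Int64" dtype:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [4, 45, 7, 5, 48, 44, 22, 89, 45, 44, 23]})
rangedict = {2: 1, 3: 1, 5: 2, 6: 2, 7: 2, 8: 3}

# map() yields floats with NaN for missing keys; "Int64" stores them as nullable integers
df1["CLASS"] = df1.index.to_series().map(rangedict).astype("Int64")
print(df1)
```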
import pandas as pd
import numpy as np

# Here is your existing dataframe
df_existing = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))

# Create a new empty dataframe with specific column names and data types
df_new = pd.DataFrame(index=None)
columns = ['field01', 'field02', 'field03', 'field04']
dtypes = [str, int, int, int]
for c, d in zip(columns, dtypes):
    df_new[c] = pd.Series(dtype=d)

# Set the index on the new dataframe to same as existing
df_new['new_index'] = df_existing.index
df_new.set_index('new_index', inplace=True)

# Fill the new dataframe with specific fields from the existing dataframe
df_new[['field02', 'field03']] = df_existing[['B', 'C']]
print(df_new)