I'm attempting to populate a column in a data frame based on whether the index value of that record falls within a range defined by two columns in another data frame.
df1 looks like:
a
0 4
1 45
2 7
3 5
4 48
5 44
6 22
7 89
8 45
9 44
10 23
and df2 is:
START STOP CLASS
0 2 3 1
1 5 7 2
2 8 8 3
what I want would look like:
a CLASS
0 4 nan
1 45 nan
2 7 1
3 5 1
4 48 nan
5 44 2
6 22 2
7 89 2
8 45 3
9 44 nan
10 23 nan
The START column in df2 is the minimum value of the range and the STOP column is the max.
You can use IntervalIndex (requires v0.20.0).
First construct the index:
df2.index = pd.IntervalIndex.from_arrays(df2['START'], df2['STOP'], closed='both')
df2
Out:
START STOP CLASS
[2, 3] 2 3 1
[5, 7] 5 7 2
[8, 8] 8 8 3
Now if you index into the second DataFrame it will lookup the value in the intervals. For example,
df2.loc[6]
Out:
START 5
STOP 7
CLASS 2
Name: [5, 7], dtype: int64
returns the second class. I don't know if it can be used with merge or with merge_asof but as an alternative you can use map:
df1['CLASS'] = df1.index.to_series().map(df2['CLASS'])
Note that I first converted the index to a Series to be able to use the Series.map method. This results in
df1
Out:
a CLASS
0 4 NaN
1 45 NaN
2 7 1.0
3 5 1.0
4 48 NaN
5 44 2.0
6 22 2.0
7 89 2.0
8 45 3.0
9 44 NaN
10 23 NaN
Alternative solution:
classdict = df2.set_index("CLASS").to_dict("index")
rangedict = {}
for key,value in classdict.items():
# get all items in range and assign value (the key)
for item in list(range(value["START"],value["STOP"]+1)):
rangedict[item] = key
extract rangedict:
{2: 1, 3: 1, 5: 2, 6: 2, 7: 2, 8: 3}
now map and possibly format(?):
df1['CLASS'] = df1.index.to_series().map(rangedict)
df1.applymap("{0:.0f}".format)
outputs:
a CLASS
0 4 nan
1 45 nan
2 7 1
3 5 1
4 48 nan
5 44 2
6 22 2
7 89 2
8 45 3
9 44 nan
10 23 nan
import pandas as pd import numpy as np # Here is your existing dataframe df_existing = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD')) # Create a new empty dataframe with specific column names and data types df_new = pd.DataFrame(index=None) columns = ['field01','field02','field03','field04'] dtypes = [str,int,int,int] for c,d in zip(columns, dtypes): df_new[c] = pd.Series(dtype=d) # Set the index on the new dataframe to same as existing df_new['new_index'] = df_existing.index df_new.set_index('new_index', inplace=True) # Fill the new dataframe with specific fields from the existing dataframe df_new[['field02','field03']] = df_existing[['B','C']] print df_new
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.