Add/fill pandas column based on range in rows from another dataframe

Working with pandas, I have df1 indexed by time samples:

import pandas as pd
from io import StringIO

data = '''\
time       flags    input
8228835.0  53153.0  32768.0
8228837.0  53153.0  32768.0
8228839.0  53153.0  32768.0
8228841.0  53153.0  32768.0
8228843.0  61345.0  32768.0'''

# io.StringIO stands in for pd.compat.StringIO, which recent pandas versions no longer provide
df1 = pd.read_csv(StringIO(data), sep=r'\s+', index_col='time')

df2 holds time ranges, each with a start and an end, over which the state of 'check' is True:

data = '''\
        check     start       end
20536   True   8228837   8228993
20576   True   8232747   8232869
20554   True   8230621   8230761
20520   True   8227351   8227507
20480   True   8223549   8223669
20471   True   8221391   8221553'''

from io import StringIO

# io.StringIO stands in for pd.compat.StringIO, which recent pandas versions no longer provide
df2 = pd.read_csv(StringIO(data), sep=r'\s+')

What I need to do is add a 'check' column to df1 and set it to True for every time sample that falls inside one of the ranges defined in df2. All other rows should be False. An example result would be:

             flags    input    check
time                       
8228835.0  53153.0  32768.0    False
8228837.0  53153.0  32768.0    True
8228839.0  53153.0  32768.0    True
8228841.0  53153.0  32768.0    True
8228843.0  61345.0  32768.0    True
....
8228994.0  12424.0  32768.0    False

You can make a list of ranges, and then use pd.Index.isin with itertools.chain:

from itertools import chain

df2 = df2[df2['check']]

ranges = map(range, df2['start'], df2['end'])

df1['check'] = df1.index.isin(chain.from_iterable(ranges))

print(df1)

             flags    input  check
time                              
8228835.0  53153.0  32768.0  False
8228837.0  53153.0  32768.0   True
8228839.0  53153.0  32768.0   True
8228841.0  53153.0  32768.0   True
8228843.0  61345.0  32768.0   True
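
One detail worth checking with this approach: range(start, end) stops just before end, so a sample at exactly 8228993 would come out False. The question does not say whether the end of a range counts as inside. If it should, adding 1 to end is one way to get that. The sketch below assumes integer start/end values and uses small made-up frames (one range, three samples) rather than the full data:

```python
from itertools import chain

import pandas as pd

# Illustrative frames, not the question's full data: one range, three samples.
df1 = pd.DataFrame(
    {'flags': [53153.0, 53153.0, 53153.0]},
    index=pd.Index([8228835.0, 8228837.0, 8228993.0], name='time'),
)
df2 = pd.DataFrame({'check': [True], 'start': [8228837], 'end': [8228993]})

# range(start, end) excludes end; range(start, end + 1) includes it.
exclusive = map(range, df2['start'], df2['end'])
inclusive = map(range, df2['start'], df2['end'] + 1)

df1['check_exclusive'] = df1.index.isin(list(chain.from_iterable(exclusive)))
df1['check_inclusive'] = df1.index.isin(list(chain.from_iterable(inclusive)))
print(df1)
```

The two columns differ only in the last row, where the sample sits exactly on the end of the range.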

I think you can use IntervalIndex with loc (shown here on a small example frame df rather than the question's df1):

df2.index = pd.IntervalIndex.from_arrays(df2.start, df2.end, closed='both')
df2.loc[df.index]
Out[174]: 
        check  start  end
[1, 2]   True      1    2
[4, 5]   True      4    5
[7, 8]   True      7    8
df['newcol']=df2.loc[df.index].check.values.tolist()
df
Out[176]: 
       flags    input  newcol
flags                        
2          2  32768.0    True
4          4  32768.0    True
7          7  32768.0    True
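
A caveat for the question's data: in recent pandas versions, df2.loc[...] with a list of labels raises a KeyError as soon as one label falls outside every interval, which 8228835 does. One possible way around that is IntervalIndex.get_indexer, which returns -1 for values matching no interval. This is a sketch, not the answerer's code. It assumes the intervals do not overlap (get_indexer rejects overlapping intervals) and that every row of df2 has check set to True, so rows where it is False would need filtering out first. The frames are a cut-down copy of the question's data:

```python
import pandas as pd

# Cut-down copy of the question's data, for illustration only.
df1 = pd.DataFrame(
    {'flags': [53153.0, 53153.0, 61345.0]},
    index=pd.Index([8228835.0, 8228837.0, 8228843.0], name='time'),
)
df2 = pd.DataFrame({
    'check': [True, True],
    'start': [8228837, 8232747],
    'end': [8228993, 8232869],
})

# Closed on both sides, so start and end themselves count as inside.
# The bounds are cast to float to share a dtype with df1's float index.
intervals = pd.IntervalIndex.from_arrays(
    df2['start'].astype(float), df2['end'].astype(float), closed='both'
)

# Position of the matching interval for each sample, or -1 when none matches.
positions = intervals.get_indexer(df1.index)
df1['check'] = positions != -1
print(df1)
```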

A list comprehension using any(). No clue about the actual performance though; it would be nice if you could run the timings (e.g. with %timeit) for us!

df1['check'] = [any(start <= i <= end for start,end in 
                    zip(df2['start'], df2['end'])) for i in df1.index]

print(df1)

Returns:

             flags    input  check
time                              
8228835.0  53153.0  32768.0  False
8228837.0  53153.0  32768.0   True
8228839.0  53153.0  32768.0   True
8228841.0  53153.0  32768.0   True
8228843.0  61345.0  32768.0   True
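
On the performance question: the list comprehension performs one Python-level comparison per sample/range pair. The same inclusive test can be written with NumPy broadcasting, sketched below. This variant is not from the original answers and no timings are claimed for it. It builds a boolean array of shape (len(df1), len(df2)), so memory use grows with both sizes.

```python
import pandas as pd

# The question's df1 and df2, rebuilt from literals so the block runs on its own.
df1 = pd.DataFrame(
    {'flags': [53153.0, 53153.0, 53153.0, 53153.0, 61345.0],
     'input': [32768.0] * 5},
    index=pd.Index([8228835.0, 8228837.0, 8228839.0, 8228841.0, 8228843.0],
                   name='time'),
)
df2 = pd.DataFrame({
    'check': [True] * 6,
    'start': [8228837, 8232747, 8230621, 8227351, 8223549, 8221391],
    'end': [8228993, 8232869, 8230761, 8227507, 8223669, 8221553],
})

times = df1.index.to_numpy()[:, None]   # shape (n_samples, 1)
starts = df2['start'].to_numpy()        # shape (n_ranges,)
ends = df2['end'].to_numpy()            # shape (n_ranges,)

# Broadcasting compares every sample against every range at once.
inside = (times >= starts) & (times <= ends)   # shape (n_samples, n_ranges)
df1['check'] = inside.any(axis=1)
print(df1)
```

Like the list comprehension, this treats both start and end as inside the range.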
