Pandas DataFrame: check whether a column value appears in another column

Question

I have test_df with columns 'MonthAbbr' and 'PromoInterval'.

Example output:

1017174           Jun  Mar,Jun,Sept,Dec
1017175           Mar  Mar,Jun,Sept,Dec
1017176           Feb  Mar,Jun,Sept,Dec
1017177           Feb  Feb,May,Aug,Nov
1017178           Jan  Feb,May,Aug,Nov
1017179           Jan  Mar,Jun,Sept,Dec
1017180           Jan  Mar,Jun,Sept,Dec

I want to add an indicator column that says whether the month falls in the promo interval: it should be 1 if MonthAbbr is in PromoInterval for the current row, and 0 otherwise.

Is there a more efficient way than the loop below?

for ind in test_df.index:
    test_df.set_value(ind, 'IsPromoInThisMonth',
        test_df.MonthAbbr.astype(str)[ind] in test_df.PromoInterval.astype(str)[ind])

Answer 1

This is a bit faster:

%%timeit
test_df['IsPromoInThisMonth'] = [x in y for x, y in zip(test_df['MonthAbbr'], 
                                                        test_df['PromoInterval'])]

1000 loops, best of 3: 317 µs per loop
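For readers who want to try the list comprehension outside a notebook, here is a minimal self-contained sketch. The frame is rebuilt by hand from the seven rows shown in the question, and the int() cast is an addition of this sketch so that the column holds 1/0 as the question asks rather than True/False.

```python
import pandas as pd

# Sample rows copied from the example output in the question.
test_df = pd.DataFrame(
    {
        "MonthAbbr": ["Jun", "Mar", "Feb", "Feb", "Jan", "Jan", "Jan"],
        "PromoInterval": [
            "Mar,Jun,Sept,Dec",
            "Mar,Jun,Sept,Dec",
            "Mar,Jun,Sept,Dec",
            "Feb,May,Aug,Nov",
            "Feb,May,Aug,Nov",
            "Mar,Jun,Sept,Dec",
            "Mar,Jun,Sept,Dec",
        ],
    },
    index=range(1017174, 1017181),
)

# Row-wise substring test; int() turns True/False into 1/0.
test_df["IsPromoInThisMonth"] = [
    int(month in interval)
    for month, interval in zip(test_df["MonthAbbr"], test_df["PromoInterval"])
]

print(test_df)
```

Note that `in` on two strings is a substring test, so it matches any part of the comma-separated text and not only whole month names.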

Your approach, for comparison:

%%timeit
for ind in test_df.index:
    test_df.set_value(ind ,'IsPromoInThisMonth',
    test_df.MonthAbbr.astype(str)[ind] in (test_df.PromoInterval.astype(str)[ind]))

1000 loops, best of 3: 1.44 ms per loop
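The loop above depends on DataFrame.set_value, which was deprecated in pandas 0.21 and removed in pandas 1.0. On a current pandas the same row-by-row logic can be written with the .at accessor. This is only a sketch on three sample rows taken from the question, and it makes no claim about how its speed compares with the timings quoted here.

```python
import pandas as pd

# Three rows taken from the example output in the question.
test_df = pd.DataFrame(
    {
        "MonthAbbr": ["Jun", "Feb", "Jan"],
        "PromoInterval": ["Mar,Jun,Sept,Dec", "Feb,May,Aug,Nov", "Feb,May,Aug,Nov"],
    },
    index=[1017174, 1017177, 1017178],
)

# Create the column first so every .at assignment writes into a bool column.
test_df["IsPromoInThisMonth"] = False

for ind in test_df.index:
    # .at reads and writes a single cell by row label and column name.
    test_df.at[ind, "IsPromoInThisMonth"] = (
        str(test_df.at[ind, "MonthAbbr"]) in str(test_df.at[ind, "PromoInterval"])
    )

print(test_df)
```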

UPDATE

Using a function with apply is slower than the list comprehension:

%%timeit
test_df['IsPromoInThisMonth'] = test_df.apply(lambda x: x['MonthAbbr'] in x['PromoInterval'], axis=1)

1000 loops, best of 3: 804 µs per loop
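%%timeit is an IPython cell magic, so the measurements above only run in IPython or Jupyter. In a plain script the comparison can be approximated with the standard-library timeit module. The sketch below assumes a small hand-built frame; the helper names with_zip and with_apply are made up for illustration, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Small sample frame; the original test_df is not available here.
test_df = pd.DataFrame(
    {
        "MonthAbbr": ["Jun", "Mar", "Feb", "Feb", "Jan"],
        "PromoInterval": [
            "Mar,Jun,Sept,Dec",
            "Mar,Jun,Sept,Dec",
            "Mar,Jun,Sept,Dec",
            "Feb,May,Aug,Nov",
            "Feb,May,Aug,Nov",
        ],
    }
)


def with_zip():
    # List comprehension over the two columns.
    return [x in y for x, y in zip(test_df["MonthAbbr"], test_df["PromoInterval"])]


def with_apply():
    # Row-wise apply, addressing the columns by name.
    return test_df.apply(
        lambda row: row["MonthAbbr"] in row["PromoInterval"], axis=1
    ).tolist()


# Both variants should agree before their timings are compared.
assert with_zip() == with_apply()

runs = 200
t_zip = timeit.timeit(with_zip, number=runs)
t_apply = timeit.timeit(with_apply, number=runs)
print(f"zip:   {t_zip / runs * 1e6:.1f} µs per call")
print(f"apply: {t_apply / runs * 1e6:.1f} µs per call")
```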

Solution 1
0 · accepted · 2015-10-31 11:29:17
