
Feature Engineering Salary Data using Categorical Column as a condition

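I need to convert the salary amount to an annualized salary, depending on the value of a categorical column:

  • 'M' - Monthly
  • 'Y' - Yearly
  • 'W' - Weekly
  • 'B' - Biweekly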
import pandas as pd

df = pd.DataFrame({'Name':['A','B','C','D','E'],
                  'sal_amt':[4500,50000,2000,3000,5000],
                  'sal_md':['M','Y','W','B','M']})
df.head()

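# I defined a function for my problem:

def func(row):
    # Scale sal_amt to a yearly figure based on the pay-period code in sal_md.
    if row['sal_md'] == 'M':
        return (row['sal_amt']*12)    # monthly: 12 months per year
    elif row['sal_md'] =='Y':
        return row['sal_amt']         # yearly: already annual
    elif row['sal_md'] == 'H':
        # 8760 = 24 * 365, i.e. every hour of the year
        return (row['sal_amt']*8760)
    elif row['sal_md'] == 'W':
        return (row['sal_amt']*52)    # weekly: 52 weeks per year
    elif row['sal_md'] == 'B':
        return (row['sal_amt']*26)    # biweekly: 26 pay periods per year
    elif row['sal_md'] == 'S':
        return row['sal_amt']         # passed through unchanged
    elif row['sal_md'] == 'A':
        return row['sal_amt']         # passed through unchanged
    # any other code falls through and returns None


df['sal_annual'] = df.apply(func,axis=1)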

https://i.stack.imgur.com/INXva.png

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'Name':['A','B','C','D','E'],
                      'sal_amt':[4500,50000,2000,3000,5000],
                      'sal_md':['M','Y','W','B','M']})

In [3]: multiplier_dict = {'M':12, 'Y':1, 'W':52, 'B':26}

In [4]: df['sal_multiplier'] = df.sal_md.map(multiplier_dict)

In [5]: df['sal_annual'] = df.sal_amt*df.sal_multiplier

In [6]: df.head()
Out[6]:
  Name  sal_amt sal_md  sal_multiplier  sal_annual
0    A     4500      M              12       54000
1    B    50000      Y               1       50000
2    C     2000      W              52      104000
3    D     3000      B              26       78000
4    E     5000      M              12       60000

Not exactly what you asked, but this solves your problem completely in a simple and Pythonic way.
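One thing to watch with the dictionary approach: `Series.map` returns NaN for any code that has no entry in the dictionary, so an unexpected value in `sal_md` would silently produce a NaN annual salary. Below is a minimal sketch of one way to guard against that. The names `ANNUAL_MULTIPLIER` and `annualize` are hypothetical, and the dictionary only covers the four codes from the example data; the right factor for any other code is an assumption you would have to make yourself.

```python
import pandas as pd

# Multipliers for the four codes in the example data only (assumption:
# other codes need a factor chosen by whoever owns the data).
ANNUAL_MULTIPLIER = {'M': 12, 'Y': 1, 'W': 52, 'B': 26}


def annualize(df, amount_col='sal_amt', mode_col='sal_md'):
    """Return the annual salary as a Series; raise on unmapped codes."""
    # map() yields NaN wherever the code is missing from the dictionary
    multiplier = df[mode_col].map(ANNUAL_MULTIPLIER)
    unknown = df.loc[multiplier.isna(), mode_col].unique().tolist()
    if unknown:
        raise ValueError(f"no multiplier defined for codes: {unknown}")
    # every row is mapped at this point, so the cast to int is safe
    return df[amount_col] * multiplier.astype('int64')


df = pd.DataFrame({'Name': ['A', 'B', 'C', 'D', 'E'],
                   'sal_amt': [4500, 50000, 2000, 3000, 5000],
                   'sal_md': ['M', 'Y', 'W', 'B', 'M']})
df['sal_annual'] = annualize(df)
```

This also skips the intermediate `sal_multiplier` column; keep that column if you want to inspect the factor applied to each row.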

