
Extracting features from dataframe

I have a pandas DataFrame like this:

   ID       Phone   ex
0   1  5333371000  533
1   2  5354321938  535
2   3     3840812  384
3   4     5451215  545
4   5  2125121278  212

For example, if "ex" starts with 533, 535, or 545, the new variable should be "personal", otherwise "notpersonal":

Sample output:

   ID       Phone   ex       iswhat
0   1  5333371000  533     personal
1   2  5354321938  535     personal
2   3     3840812  384  notpersonal
3   4     5451215  545     personal
4   5  2125121278  212  notpersonal

How can I do that?

We can use np.where along with str.contains:

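import numpy as np
# .astype(str) is needed if "ex" holds integers; the .str accessor requires strings
df["iswhat"] = np.where(df["ex"].astype(str).str.contains(r'^(?:533|535|545)$'),
                        'personal', 'notpersonal')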
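A self-contained sketch of the str.contains approach, using the sample data from the question and assuming "ex" holds integers (hence the cast to str). Because of the ^...$ anchors, the regex must match the whole value, so the check below assumes it behaves the same as Series.str.fullmatch (available in pandas 1.1 and later) with the plain alternation.

```python
import numpy as np
import pandas as pd

# Sample data from the question; "ex" is assumed to be an integer column
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Phone": [5333371000, 5354321938, 3840812, 5451215, 2125121278],
    "ex": [533, 535, 384, 545, 212],
})

# Cast to str so the .str accessor can be used
ex_str = df["ex"].astype(str)

# ^...$ anchors turn str.contains into a whole-string match,
# which gives the same result as str.fullmatch without anchors
anchored = ex_str.str.contains(r"^(?:533|535|545)$")
full = ex_str.str.fullmatch(r"533|535|545")
assert anchored.equals(full)

df["iswhat"] = np.where(anchored, "personal", "notpersonal")
print(df)
```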

You can use np.where together with isin:

import numpy as np
# isin compares exact values, so this assumes "ex" holds integers
df['iswhat'] = np.where(df['ex'].isin([533, 535, 545]), 'personal', 'not personal')
print(df)

# Output
   ID       Phone   ex        iswhat
0   1  5333371000  533      personal
1   2  5354321938  535      personal
2   3     3840812  384  not personal
3   4     5451215  545      personal
4   5  2125121278  212  not personal
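A variant of the isin answer that stays entirely in pandas: the boolean Series returned by isin is mapped to labels with a dict via Series.map instead of calling np.where. This is a sketch assuming "ex" is an integer column, as in the sample; the labels follow this answer's "not personal" spelling.

```python
import pandas as pd

# Sample data from the question; "ex" is assumed to be an integer column
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Phone": [5333371000, 5354321938, 3840812, 5451215, 2125121278],
    "ex": [533, 535, 384, 545, 212],
})

# isin yields True/False per row; map translates each boolean to a label
personal_codes = [533, 535, 545]
df["iswhat"] = df["ex"].isin(personal_codes).map(
    {True: "personal", False: "not personal"}
)
print(df)
```

Both forms produce the same column; map avoids the NumPy import, while np.where is the more common idiom.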

Update

You can also use the Phone column directly:

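import numpy as np
# str.match anchors at the start of the string, so this tests the leading digits
df['iswhat'] = np.where(df['Phone'].astype(str).str.match('533|535|545'),
                        'personal', 'not personal')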

Note: If the Phone column already contains strings, you can safely remove .astype(str).
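If only the Phone column is available, the prefix itself can be extracted as the "ex" feature with .str[:3] and then tested with isin. A minimal sketch, assuming every code is exactly three digits (true for the sample data) and that Phone holds integers; since the extracted prefixes are strings, the codes are compared as strings too.

```python
import numpy as np
import pandas as pd

# Sample Phone values from the question (no "ex" column yet)
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Phone": [5333371000, 5354321938, 3840812, 5451215, 2125121278],
})

# Extract the first three digits as the "ex" feature (strings, not ints)
df["ex"] = df["Phone"].astype(str).str[:3]

# Compare against string codes, since "ex" now holds strings
df["iswhat"] = np.where(df["ex"].isin(["533", "535", "545"]),
                        "personal", "not personal")
print(df)
```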
