Extracting a feature from a dataframe

Question

I have a pandas dataframe like this:

   ID       Phone   ex
0   1  5333371000  533
1   2  5354321938  535
2   3     3840812  384
3   4     5451215  545
4   5  2125121278  212

For example, if "ex" starts with 533, 535, or 545, the new variable should be "personal"; otherwise it should be "notpersonal".

Sample output:

   ID       Phone   ex       iswhat
0   1  5333371000  533     personal
1   2  5354321938  535     personal
2   3     3840812  384  notpersonal
3   4     5451215  545     personal
4   5  2125121278  212  notpersonal

How can I do that?

Answer 1

We can use np.where along with str.contains:

# .astype(str) guards against "ex" being stored as integers, where .str would fail
df["iswhat"] = np.where(df["ex"].astype(str).str.contains(r'^(?:533|535|545)$'),
                        'personal', 'notpersonal')
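
A self-contained version of this approach might look like the sketch below. It assumes the sample data from the question and that "ex" is stored as integers, which is why it casts with .astype(str) before using .str; the name is_personal is made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Phone": [5333371000, 5354321938, 3840812, 5451215, 2125121278],
    "ex": [533, 535, 384, 545, 212],
})

# .str methods need strings, so cast first; harmless if "ex" is already text.
# ^ and $ make the pattern match the whole value, not just part of it.
is_personal = df["ex"].astype(str).str.contains(r"^(?:533|535|545)$")

df["iswhat"] = np.where(is_personal, "personal", "notpersonal")
print(df)
```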

Answer 2

You can use np.where:

df['iswhat'] = np.where(df['ex'].isin([533, 535, 545]), 'personal', 'not personal')
print(df)

# Output
   ID       Phone   ex        iswhat
0   1  5333371000  533      personal
1   2  5354321938  535      personal
2   3     3840812  384  not personal
3   4     5451215  545      personal
4   5  2125121278  212  not personal
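
One thing to watch with isin is the column dtype: the list above holds integers, so it only matches if "ex" holds integers too. The sketch below illustrates this with two small hypothetical Series (ex_as_int and ex_as_str are illustrative names, not part of the question).

```python
import pandas as pd

codes = [533, 535, 545]

# The same two values, once as integers and once as strings.
ex_as_int = pd.Series([533, 384])
ex_as_str = pd.Series(["533", "384"])

# Integers compared with integers: the first value is found.
print(ex_as_int.isin(codes).tolist())

# Strings compared with integers: nothing is found, because "533" != 533.
print(ex_as_str.isin(codes).tolist())

# Casting the column to int first makes the lookup work again.
print(ex_as_str.astype(int).isin(codes).tolist())
```

If "ex" turns out to be text in your data, either cast it as shown or pass the codes as strings instead.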

Update

You can also use your Phone column directly:

df['iswhat'] = np.where(df['Phone'].astype(str).str.match('533|535|545'), 
                        'personal', 'not personal')

Note: If the Phone column already contains strings, you can safely remove .astype(str).
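
The reason str.match works here without a leading ^ is that it only matches at the start of each string, whereas str.contains searches anywhere. The sketch below shows the difference on a few hypothetical phone numbers (the second one is invented so that it contains 533 without starting with it).

```python
import numpy as np
import pandas as pd

# Hypothetical numbers: the second contains "533" but does not start with it.
phones = pd.Series([5333371000, 2125335000, 5451215])
as_text = phones.astype(str)

# str.match is anchored at the start of each string.
starts_with_code = as_text.str.match(r"533|535|545")

# str.contains finds the pattern anywhere in the string.
contains_code = as_text.str.contains(r"533|535|545")

labels = np.where(starts_with_code, "personal", "not personal")
print(starts_with_code.tolist())
print(contains_code.tolist())
print(labels.tolist())
```

With str.contains you would need to write the pattern as ^(?:533|535|545) to get the same prefix-only behaviour.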
