How can I use Python pandas to do some analysis that identifies valid mobile numbers?

Question

I receive a daily MIS report with the fields "Name", "Number" and "Location". Each day I get 100 rows, and I first have to check whether each number has 10 digits. If the Number field has 1 to 9 digits, I have to remove that entry from my MIS.

Only a 10-digit number, optionally preceded by +91, is valid. At the moment I remove the invalid numbers manually in Excel every day.

Next, I have to send the valid numbers to 2 branches: 50% of the valid numbers to the 1st branch and 50% to the 2nd branch.

The 1st branch has two people, so I have to split its valid numbers equally between them. For example, if 60 of the 100 rows are valid, the 1st branch gets 30 valid numbers and each of its two people gets 15.

The 2nd branch has three people, so it also gets 30 valid numbers and each of the three people gets 10.

Any help would be appreciated.

Here is my code.

import pandas as pd
import numpy as np
df = pd.read_csv('/home/desktop/Desktop/MIS.csv')
df
      Name        Number Location
0   Jayesh        980000     Pune
1     Ajay    9890989090   Mumbai
2   Manish    9999999999     Pune
3   Vikram  919000000000     Pune
4  Prakash  919999999999   Mumbai
5   Rakesh  919999999998   Mumbai
6   Naresh          9000     Pune


df['Number']=df['Number'].astype(str).apply(lambda x: np.where((len(x)<=10)))

Answer 1

Use:

df['Number'].astype(str).str.match(r'^(\+?91)?\d{10}$')

Output

0    False
1     True
2     True
3     True
4     True
5     True
6    False
Name: Number, dtype: bool

Update

Use this boolean Series to filter:

df_filtered = df[df['Number'].astype(str).str.match(r'^(\+?91)?\d{10}$')]


      Name        Number Location
1     Ajay    9890989090   Mumbai
2   Manish    9999999999     Pune
3   Vikram  919000000000     Pune
4  Prakash  919999999999   Mumbai
5   Rakesh  919999999998   Mumbai
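
If the file can contain a literal +91 prefix, the column probably needs to be read as text, because an integer column would drop the plus sign. The following is a minimal sketch under that assumption; the sample rows and the name `VALID` are made up for illustration and are not from the original post.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for MIS.csv (not from the original post).
csv_text = """Name,Number,Location
Jayesh,980000,Pune
Ajay,9890989090,Mumbai
Vikram,+919000000000,Pune
Rakesh,919999999998,Mumbai
Naresh,98909890901,Pune
"""

# dtype=str keeps the leading "+" that an integer column would lose.
df = pd.read_csv(io.StringIO(csv_text), dtype={"Number": str})

# Optional "+91" or "91" prefix, then exactly 10 digits, anchored at both ends.
VALID = r"^(?:\+?91)?\d{10}$"

mask = df["Number"].str.match(VALID)
df_filtered = df[mask].reset_index(drop=True)
print(df_filtered)
```

Anchoring the pattern with `^` and `$` is what rejects the 11-digit entry; without the trailing `$`, any string that merely starts with 10 digits would pass.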

Answer 2

It's tempting to convert your numbers to strings and then perform your comparisons. However, this isn't necessary and will typically be inefficient. You can use regular Boolean comparisons on the integers directly; the two masks below select the 12-digit numbers that start with 91:

m1 = (np.log10(df['Number']).astype(int) + 1) == 12
m2 = (df['Number'] // 10**10) == 91

df_filtered = df[m1 & m2]

print(df_filtered)

      Name        Number Location
3   Vikram  919000000000     Pune
4  Prakash  919999999999   Mumbai
5   Rakesh  919999999998   Mumbai
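
The masks above keep only the 91-prefixed numbers, while the question also treats a bare 10-digit number as valid. A possible variation, sketched here with plain integer range checks instead of `log10`, covers both cases. The frame is rebuilt from the question's sample data, and the mask names `ten_digits` and `with_prefix` are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["Jayesh", "Ajay", "Manish", "Vikram", "Prakash", "Rakesh", "Naresh"],
    "Number": [980000, 9890989090, 9999999999, 919000000000,
               919999999999, 919999999998, 9000],
    "Location": ["Pune", "Mumbai", "Pune", "Pune", "Mumbai", "Mumbai", "Pune"],
})

# Exactly 10 digits: 1_000_000_000 through 9_999_999_999.
ten_digits = df["Number"].between(10**9, 10**10 - 1)

# 12 digits starting with 91: 910_000_000_000 through 919_999_999_999.
with_prefix = df["Number"].between(91 * 10**10, 92 * 10**10 - 1)

df_filtered = df[ten_digits | with_prefix]
print(df_filtered)
```

Staying in integer arithmetic avoids any floating-point rounding in the digit count, at the cost of not being able to represent a leading "+".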

Answer 3

To assign NaN to numbers that do not start with 91 and are not exactly 10 characters long:

df['Number'] = df['Number'].astype(str)
df.loc[~df['Number'].str.startswith('91') & (df['Number'].str.len() != 10), 'Number'] = np.nan

Answer 4

If your data looks like the example, the following should meet your requirement.

DataFrame:

>>> df
      Name        Number Location
0   Jayesh        980000     Pune
1     Ajay    9890989090   Mumbai
2   Manish    9999999999     Pune
3   Vikram  919000000000     Pune
4  Prakash  919999999999   Mumbai
5   Rakesh  919999999998   Mumbai
6   Naresh          9000     Pune

Result:

Using str.match:

>>> df[df.Number.astype(str).str.match(r'^(\d{10}|\d{12})$')]
      Name        Number Location
1     Ajay    9890989090   Mumbai
2   Manish    9999999999     Pune
3   Vikram  919000000000     Pune
4  Prakash  919999999999   Mumbai
5   Rakesh  919999999998   Mumbai

Or, more loosely (this pattern also accepts 11-digit numbers):

>>> df[df.Number.astype(str).str.match(r'^[0-9]{10,12}$')]
      Name        Number Location
1     Ajay    9890989090   Mumbai
2   Manish    9999999999     Pune
3   Vikram  919000000000     Pune
4  Prakash  919999999999   Mumbai
5   Rakesh  919999999998   Mumbai

Answer 5

I suggest using the following regex pattern:

^\+91\d{10}$|^91\d{10}$|^\d{10}$

This assumes there are no spaces or brackets in your Number column. The pattern makes sure the digit part is always exactly 10 digits long and allows it to be preceded by either +91 or 91.

To build a filtered DataFrame you would then use:

dff = df[df['Number'].astype(str).str.match(r'^\+91\d{10}$|^91\d{10}$|^\d{10}$')]
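
The second step in the question, dividing the valid rows between two branches and then between the people in each branch, could be sketched with `numpy.array_split`. The helper `split_evenly` and the variable names are hypothetical, and the 60-row frame is made-up data chosen to match the example in the question.

```python
import numpy as np
import pandas as pd


def split_evenly(frame, parts):
    """Split frame into `parts` consecutive chunks whose sizes differ by at most one."""
    positions = np.array_split(np.arange(len(frame)), parts)
    return [frame.iloc[idx] for idx in positions]


# Stand-in for the filtered frame: 60 hypothetical valid numbers.
valid = pd.DataFrame({"Number": [str(9000000000 + i) for i in range(60)]})

# 50% of the valid rows go to each branch.
branch1, branch2 = split_evenly(valid, 2)

# Branch 1 has two people, branch 2 has three.
person_a, person_b = split_evenly(branch1, 2)
person_c, person_d, person_e = split_evenly(branch2, 3)

print(len(person_a), len(person_b))                 # 15 15
print(len(person_c), len(person_d), len(person_e))  # 10 10 10
```

When the row count does not divide evenly, `array_split` gives the earlier chunks one extra row each, so nobody's share differs from another's by more than one.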
