Extracting the project number with Python regex

Question

Good day everyone. I want to extract the trailing number after the last slash in the project_name column. I'm working on it but have run into an issue:

How can I extract the digits after the last slash without the result being wrapped in square brackets? My code almost works, but every result comes back inside square brackets.

My code:

def project_name(name):
    return re.findall(r'\d{3}$',name)

data['project_name'] = data['project_name'].apply(project_name)

The data:

project_name    
 ----------
   ASAHI,PT-PRO/PTN/06-2012/192          
   CIMB NIAGA-PRO/PTN/06-2012/174        
   FRAMAS INDONESIA-PRO/PTN/06-2012/210    
   DM STOCK 2015   
   PERBAIKAN OH TM 366 PLANT DAWUAN 
   Ruko-PRO/PTN/03-2012/47

Expected output:

 project_name
 ----------     
   192            
   174            
   210            
   NaN
   NaN            
   NaN            
    47

All advice and input are appreciated. Thanks, everyone.

Answer 1

Use Series.str.extract and add / to the regex so that only digits directly after a slash match:

data['project_name'] = data['project_name'].str.extract(r'/(\d{3}$)')
print(data)
  project_name
0          192
1          174
2          210
3          NaN
4          NaN
5          NaN
6          NaN
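Note that \d{3} requires exactly three digits, so a row ending in /47 gives NaN, while the question's expected output lists 47. A minimal sketch of one way around that, assuming the number can be any length: \d+ replaces \d{3}, and expand=False is passed so that a Series is returned instead of a one-column DataFrame. The rows are copied from the question; the variable name project_number is only illustrative.

```python
import pandas as pd

# Sample rows copied from the question's data.
data = pd.DataFrame({
    "project_name": [
        "ASAHI,PT-PRO/PTN/06-2012/192",
        "CIMB NIAGA-PRO/PTN/06-2012/174",
        "FRAMAS INDONESIA-PRO/PTN/06-2012/210",
        "DM STOCK 2015",
        "PERBAIKAN OH TM 366 PLANT DAWUAN",
        "Ruko-PRO/PTN/03-2012/47",
    ]
})

# \d+ accepts any number of digits between the last slash and the end of the string.
# expand=False makes str.extract return a Series for the single capture group.
project_number = data["project_name"].str.extract(r"/(\d+)$", expand=False)
print(project_number)
```

Rows without a slash followed by trailing digits come back as missing values. If the real data has trailing whitespace, calling .str.strip() before .str.extract may be needed, since $ only matches at the very end of the string.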

Solution with findall:

data['project_name'] = data['project_name'].str.findall(r'/(\d{3}$)').str[0]

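Your own function can also be fixed: use next and iter to return the first match, with np.nan as the default value when there is no match:

import re
import numpy as np

def project_name(name):
    # First match after the final slash, or NaN when nothing matches
    return next(iter(re.findall(r'/(\d{3})$', name)), np.nan)

data['project_name'] = data['project_name'].apply(project_name)
print(data)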
  project_name
0          192
1          174
2          210
3          NaN
4          NaN
5          NaN
6          NaN
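The next(iter(...), default) idiom works because re.findall always returns a list: next takes the first item from the list's iterator and falls back to the default when the list is empty. A small standalone sketch of just that step, using math.nan in place of np.nan to avoid the NumPy dependency; the helper name first_match is hypothetical:

```python
import math
import re

def first_match(pattern, text, default=math.nan):
    # re.findall returns a list of captured groups (possibly empty).
    # next() yields the first item, or `default` if the iterator is exhausted.
    return next(iter(re.findall(pattern, text)), default)

print(first_match(r"/(\d{3})$", "ASAHI,PT-PRO/PTN/06-2012/192"))  # 192
print(first_match(r"/(\d{3})$", "DM STOCK 2015"))                 # nan
```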

Answer 2

Instead of

def project_name(name):
    return re.findall(r'\d{3}$',name)

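use the version below, which returns the first element of the list. Note that indexing with [0] raises IndexError for any row where the regex finds no match: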

def project_name(name):
    return re.findall(r'\d{3}$',name)[0]

Answer 3

Since the list holds only one value, we can return the element at index 0:

def project_name(name):
    return re.findall(r'\d{3}$',name)[0]

data['project_name'] = data['project_name'].apply(project_name)
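Indexing with [0] fails with IndexError on rows where findall returns an empty list, and without the leading / the pattern \d{3}$ also matches the 015 at the end of "DM STOCK 2015". A hedged sketch of a guarded variant, assuming None is an acceptable placeholder for rows with no project number; the function name project_number and the choice of \d+ are illustrative, not taken from the answers:

```python
import re

def project_number(name):
    """Return the digits after the final slash, or None when there are none."""
    # strip() guards against trailing spaces, which would stop $ from matching.
    match = re.search(r"/(\d+)$", name.strip())
    return match.group(1) if match else None

print(project_number("Ruko-PRO/PTN/03-2012/47"))  # 47
print(project_number("DM STOCK 2015"))            # None
```

It can be applied the same way as the original function, with data['project_name'].apply(project_number).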

3 answers

solution1
1 ACCEPTED 2019-12-16 06:40:16

solution2
0 2019-12-16 06:32:57

solution3
0 2019-12-16 07:09:22
