Good day everyone. I want to extract the trailing digits after the last slash in the project_name
column. I'm working on it but have run into some issues, as follows:
My code:
import re

def project_name(name):
    return re.findall(r'\d{3}$', name)

data['project_name'] = data['project_name'].apply(project_name)
The data:
project_name
----------
ASAHI,PT-PRO/PTN/06-2012/192
CIMB NIAGA-PRO/PTN/06-2012/174
FRAMAS INDONESIA-PRO/PTN/06-2012/210
DM STOCK 2015
PERBAIKAN OH TM 366 PLANT DAWUAN
Ruko-PRO/PTN/03-2012/47
Expected output:
project_name
----------
192
174
210
NaN
NaN
NaN
47
All advice and input is appreciated. Thanks, everyone.
Use Series.str.extract and add / to the regex:
data['project_name'] = data['project_name'].str.extract(r'/(\d{3}$)')
print (data)
project_name
0 192
1 174
2 210
3 NaN
4 NaN
5 NaN
6 NaN
Solution with findall:
data['project_name'] = data['project_name'].str.findall(r'/(\d{3}$)').str[0]
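To see why the trailing .str[0] is needed, here is a minimal sketch on two of the sample values from the question. The names `names`, `matches` and `first` are made up for illustration. It shows that str.findall yields one list per row, and that .str[0] turns an empty list into NaN instead of raising an error.

```python
import pandas as pd

# Two sample values from the question: one with a /NNN suffix, one without.
names = pd.Series(
    ["ASAHI,PT-PRO/PTN/06-2012/192", "DM STOCK 2015"],
    dtype=object,
)

# findall returns one list per row: the captured digits, or [] when nothing matches.
matches = names.str.findall(r"/(\d{3}$)")

# .str[0] takes the first element of each list; empty lists become NaN.
first = matches.str[0]

print(matches.tolist())
print(first.tolist())
```

Because the pattern is anchored with $, each list holds at most one element, so taking index 0 loses nothing.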
Your own solution should be changed to use next and iter, so that it returns the default value np.nan if there is no match:
import re
import numpy as np

def project_name(name):
    return next(iter(re.findall(r'/(\d{3})$', name)), np.nan)

data['project_name'] = data['project_name'].apply(project_name)
print (data)
project_name
0 192
1 174
2 210
3 NaN
4 NaN
5 NaN
6 NaN
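Note that \d{3} requires exactly three digits, so a value such as Ruko-PRO/PTN/03-2012/47 produces NaN, whereas the question expects 47. A minimal sketch, assuming the goal is any run of digits after the final slash, replaces \d{3} with \d+. The names `sample` and `codes` are illustrative, and expand=False is used only so that str.extract returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

# A few of the sample values from the question.
sample = pd.DataFrame({"project_name": [
    "ASAHI,PT-PRO/PTN/06-2012/192",
    "DM STOCK 2015",
    "Ruko-PRO/PTN/03-2012/47",
]})

# \d+ matches one or more digits, so two-digit suffixes such as /47 are captured too.
# Rows with no slash before the trailing digits still yield NaN.
codes = sample["project_name"].str.extract(r"/(\d+)$", expand=False)

print(codes.tolist())
```

The extracted values are strings. If numbers are needed, pd.to_numeric(codes) would convert them while keeping NaN for the unmatched rows.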
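Instead of

def project_name(name):
    return re.findall(r'\d{3}$', name)

use

def project_name(name):
    matches = re.findall(r'\d{3}$', name)
    return matches[0] if matches else np.nan

data['project_name'] = data['project_name'].apply(project_name)

findall returns a list that holds at most one value here, so we can return the element at index 0. The check for an empty list (with numpy imported as np) is needed because indexing [0] on a row without a match raises IndexError. Note that without the / in the pattern, 'DM STOCK 2015' also matches and yields 015.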