I have an Excel file that I am trying to extract information from, specifically the 5th column ('Summary'). Each element of 'Summary' is a string, and I am trying to find the age of the person mentioned. The age appears either as "John Smith, 23," or as "John Smith, a 23-year-old". The first row of data isn't valid, so I skipped it. I am searching for 2 digits followed by either a comma or a dash and then trying to return the 2 digits, but I can't figure out what I am doing wrong. Thanks for the help.
import re
import pandas as pd
mf = pd.ExcelFile(myFile)
m = mf.parse('myDataFile')
age = []
s = m['Summary']
for a in s[1:]:
    x = re.search('[0-9]{2}', a)
    y = x + 1
    age.append(a[x, y])
I didn't realize that re.search returns a "match" object (or None when nothing matches), not an integer index. I was able to get it to work using this:
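s = m['Summary']
age = []
# Raw string so that \d is passed to the regex engine unchanged
n = re.compile(r'\d\d(,|-)')
for t in s:
    x = n.search(t)
    if x:
        # group(0) is the whole match, e.g. "23,"; keep only the two digits
        b = x.group(0)
        age.append(b[:2])
    else:
        age.append("NA")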
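For anyone who wants to try the pattern without the Excel file, here is a minimal sketch of the same idea using a capture group, so the digits come straight from group(1) instead of being sliced out of the match. The names extract_age and AGE_PATTERN and the sample strings are made up for illustration and are not from the original data. It also assumes that empty cells may show up as NaN (a float), which re.search would otherwise reject with a TypeError.

```python
import re

# Two digits followed by a comma or a hyphen; the group captures only the digits.
AGE_PATTERN = re.compile(r"(\d{2})[,-]")

def extract_age(text):
    """Return the first two-digit age found in text as a string, or "NA"."""
    # Assumption: empty Excel cells may arrive as NaN (a float), not a string.
    if not isinstance(text, str):
        return "NA"
    match = AGE_PATTERN.search(text)  # a match object, or None if nothing matched
    return match.group(1) if match else "NA"

# Hypothetical sample rows standing in for the 'Summary' column.
samples = [
    "John Smith, 23, was seen downtown.",
    "Jane Doe, a 45-year-old teacher, spoke.",
    "No age is mentioned here.",
    float("nan"),
]
ages = [extract_age(s) for s in samples]
print(ages)
```

With the real data this could probably be applied as m['Summary'].map(extract_age). Note that the pattern would also pick up the last two digits of a longer number such as "123,", so it may need tightening if the summaries contain other numbers.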