Python regular expression applied to a list

Question

I have an Excel file that I am trying to extract information from, specifically from the 5th column ('Summary'). Each element of 'Summary' is a string; I am trying to find the age of the person mentioned. The age info will be either "John Smith, 23," or "John Smith, a 23-year-old". The first row of data isn't valid, so I skipped it. I can't figure out what I am doing wrong. I am searching for 2 digits followed by either a comma or a dash, then trying to return the 2 digits. Thanks for the help.

import pandas as pd

mf=pd.ExcelFile(myFile)

m=mf.parse('myDataFile')

age = []
s = m['Summary']

for a in s[1:]:
    x = re.search('[0-9]{2}',a)
    y=x+1
    age.append(a[x,y])

Answer 1

I didn't realize that re.search returns a "match" object, not an integer. I was able to get it to work using this:

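import re

s = m['Summary']
age = []
# Two digits followed by a comma or a hyphen
n = re.compile(r'\d\d(,|-)')

for t in s:
    x = n.search(t)  # match object, or None if nothing matched
    if x:
        b = x.group(0)  # matched text, e.g. '23,'
        age.append(b[:2])  # keep only the two digits
    else:
        age.append("NA")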
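
For anyone who wants to try this without the Excel file, here is a minimal standalone sketch of the same idea. The helper name extract_age, the AGE_PATTERN constant and the sample sentences are made up for illustration; they are not from the original data. It uses a capture group plus a lookahead so the two digits can be read directly from the match object instead of slicing the matched text.

```python
import re

# Two digits (captured) that are immediately followed by a comma or a hyphen.
AGE_PATTERN = re.compile(r"(\d{2})(?=[,-])")


def extract_age(text, default="NA"):
    """Return the first two-digit age found in text, or default if none."""
    if not isinstance(text, str):  # e.g. an empty cell that was read as NaN
        return default
    match = AGE_PATTERN.search(text)
    # search() returns a match object (or None), so the digits come from
    # group(1); the match object itself cannot be used as an index.
    return match.group(1) if match else default


# Hypothetical sample rows standing in for the 'Summary' column.
summaries = [
    "John Smith, 23, was seen downtown.",
    "John Smith, a 23-year-old, was seen downtown.",
    "No age is given here.",
]
ages = [extract_age(s) for s in summaries]
print(ages)  # ['23', '23', 'NA']
```

Like the pattern above, this will also pick up the last two digits of something like "2018,", so it assumes the summaries contain no other numbers before the age. Putting \b in front of the group is one way to guard against that.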
