How to extract hours and minutes from a series of strings in pandas

I have been stuck on this seemingly simple problem for hours. I would like to convert the following strings to minutes (or to hours and minutes, if possible).

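import numpy as np
import pandas as pd

foo = pd.DataFrame()
foo['stringtime'] = pd.Series(['1 hour and 59 minutes','2 hours', np.nan, '38 minutes', '4 hours and 31 minutes'])

#What I've tried (regex=True is needed for the pattern to be treated as a regex in recent pandas):
foo['stringtime'] = foo['stringtime'].str.replace(r'hours?','', regex=True).str.replace(' minutes','').str.split(' and ')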

However, this turns '2 hours' and '38 minutes' into ['2'] and ['38'], so the hours and the minutes can no longer be told apart.

#What I would like to happen:
foo.head()
output:
119
120
NaN (or 0)
38
271

Is there an elegant, Pythonic way to do this?

Try using regex.

Example:

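import re

import numpy as np
import pandas as pd

def p_time(val):
    """Return the total minutes described by val, or 0 if val is not a string (e.g. NaN)."""
    if not isinstance(val, str):
        return 0
    t = 0
    # Hours, if present, contribute 60 minutes each
    h = re.search(r"(\d+) hours?", val)
    if h:
        t += int(h.group(1)) * 60
    # Minutes, if present, are added as-is
    m = re.search(r"(\d+) minutes?", val)
    if m:
        t += int(m.group(1))
    return t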

s = pd.Series(['1 hour and 59 minutes','2 hours', np.nan, '38 minutes', '4 hours and 31 minute'])
print(s.apply(p_time).astype(int))

Output:

0    119
1    120
2      0
3     38
4    271
dtype: int32
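
The same regex idea can also be expressed without a Python-level function, using pandas string methods. The following is only a sketch of that variant, not part of the answer above; the variable names hours, minutes and total_minutes are chosen here for illustration, and missing values are assumed to count as 0.

import numpy as np
import pandas as pd

s = pd.Series(['1 hour and 59 minutes', '2 hours', np.nan, '38 minutes', '4 hours and 31 minutes'])

# Each pattern has a single capture group, so expand=False returns a Series.
# Rows without a match (including NaN) come back as NaN; to_numeric keeps
# them as NaN, and fillna(0) then treats them as zero.
hours = pd.to_numeric(s.str.extract(r'(\d+) hours?', expand=False)).fillna(0).astype(int)
minutes = pd.to_numeric(s.str.extract(r'(\d+) minutes?', expand=False)).fillna(0).astype(int)

total_minutes = hours * 60 + minutes
print(total_minutes.tolist())  # [119, 120, 0, 38, 271]

If NaN should be preserved instead of becoming 0, the fillna(0).astype(int) calls could be dropped, at the cost of a float result.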

Another way might be to use numexpr to evaluate each string as an arithmetic expression:

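import numexpr
import numpy as np
import pandas as pd

foo = pd.Series(['1 hour and 59 minutes','2 hours', np.nan, '38 minutes', '4 hours and 31 minutes'])

# Rewrite each string as an expression such as '1*60+59', then evaluate it
(foo.str.replace(r' hours?','*60', regex=True).str.replace(' minutes','').str.replace(' and ', '+')
    .fillna('0').apply(numexpr.evaluate))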

Output:

0    119
1    120
2      0
3     38
4    271
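
The question also asks about getting hours and minutes rather than total minutes. Once the totals are available, one possible way to split them is integer division and remainder; this is a sketch that assumes the totals are already an integer Series, and the column names 'hours' and 'minutes' are chosen here for illustration.

import pandas as pd

# Total minutes as produced by the approaches above
total_minutes = pd.Series([119, 120, 0, 38, 271])

# Floor division gives whole hours, the remainder gives leftover minutes
result = pd.DataFrame({
    'hours': total_minutes // 60,
    'minutes': total_minutes % 60,
})
print(result)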
