
Using re.findall to extract data from a line

I am trying (and so far failing) to extract a timestamp and two measurements from a line of text read from a file.

The lines have the following format:

"2013-08-07-21-25   26.0   1015.81"

I tried (among other things):

>>> re.findall(r"([0-9,-]+)|(\d+.\d+)", "2013-08-07-21-25   26.0   1015.81")
[('2013-08-07-21-25', ''), ('26', ''), ('0', ''), ('1015', ''), ('81', '')]

But I only got entertaining (and not the desired) results.

I would like to find a solution like this:

date, temp, press = re.findall(r"The_right_stuff", "2013-08-07-21-25   26.0   1015.81")
print date + '\n' + temp + '\n' + press + '\n'
2013-08-07-21-25
26.0
1015.81

Even better would be if the assignment could be wrapped in a test that checks whether the number of matches is correct:

if len(date, temp, press = re.findall(r"The_right_stuff", "2013-08-07-21-25   26.0   1015.81")) == 3:
    print 'Got good data.'
    print date + '\n' + temp + '\n' + press + '\n'
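
A minimal sketch of such a check, written for Python 3 and assuming every valid line is exactly a YYYY-MM-DD-HH-MM timestamp followed by two decimal numbers. The names LINE_RE and parse_line are hypothetical. re.fullmatch returns None when the line does not fit the pattern, so the number of matches never has to be counted separately.

```python
import re

# Assumed layout: timestamp, temperature, pressure, separated by whitespace.
LINE_RE = re.compile(r"(\d{4}(?:-\d{2}){4})\s+(\d+\.\d+)\s+(\d+\.\d+)")


def parse_line(line):
    """Return (date, temp, press) as strings, or None if the line is malformed."""
    match = LINE_RE.fullmatch(line.strip())
    return match.groups() if match else None


result = parse_line("2013-08-07-21-25   26.0   1015.81")
if result is not None:
    date, temp, press = result
    print("Got good data.")
    print(date, temp, press, sep="\n")
```

Because the whole line must match, a stray character anywhere in a field makes parse_line return None instead of yielding a partial result.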

The lines have been transmitted via a serial connection and might have bad (i.e. unexpected) characters interspersed, so separating the fields by string index does not work.

See also: Prevent datetime.strptime from exit in case of format mismatch.


Edit @hjpotter92

I mentioned that there were corrupted lines from the serial transmission. The examples below broke the split solution.

2013-08-1q-07-15   23.8   1014.92
2013-08-11-07-20   23.8   101$96
6113-p8-11-0-25   23.8   1015*04

Converting the list of measurements into a numpy array failed.

>>> p_arr= np.asfarray(p_list, dtype='float')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/numpy/lib/type_check.py", line 105, in asfarray
    return asarray(a, dtype=dtype)
  File "/usr/lib/python2.7/dist-packages/numpy/core/numeric.py", line 460, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: invalid literal for float(): 101$96
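
One way to keep such lines out of the array is to validate each field by converting it, and to drop any line where a conversion fails. The sketch below is written for Python 3. The name parse_record and the %Y-%m-%d-%H-%M format string are assumptions based on the sample lines, not part of the original code.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d-%H-%M"  # assumed from the sample lines


def parse_record(line):
    """Return (timestamp, temp, press), or None if any field is corrupted."""
    fields = line.split()
    if len(fields) != 3:
        return None
    try:
        stamp = datetime.strptime(fields[0], TIMESTAMP_FORMAT)
        return stamp, float(fields[1]), float(fields[2])
    except ValueError:
        # Raised by strptime or float() when stray characters are present.
        return None


lines = [
    "2013-08-07-21-25   26.0   1015.81",
    "2013-08-1q-07-15   23.8   1014.92",
    "2013-08-11-07-20   23.8   101$96",
    "6113-p8-11-0-25   23.8   1015*04",
]
records = [r for r in map(parse_record, lines) if r is not None]
p_list = [press for _, _, press in records]
print(p_list)
```

This only catches corruption that makes a field unparseable. A digit that was flipped into another digit during transmission still passes.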

I put the set of data here.

Use re.split, since the fields are separated by whitespace:

date, temp, press = re.split(r'\s+', "2013-08-07-21-25   26.0   1015.81")

>>> import re
>>> date, temp, press = re.split(r'\s+', "2013-08-07-21-25   26.0   1015.81")
>>> print date
2013-08-07-21-25
>>> print temp
26.0
>>> print press
1015.81

print [i+j for i,j in re.findall(r"\b(\d+(?!\.)(?:[,-]\d+)*)\b|\b(\d+\.\d+)\b", "2013-08-07-21-25   26.0   1015.81")]

You have to prevent the first group from consuming anything that is meant for the second group.

Output: ['2013-08-07-21-25', '26.0', '1015.81']
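
The same ordering effect can be seen in a smaller sketch (Python 3, not taken from the original answer). Alternation tries its branches left to right at each position, so putting the decimal branch first lets it claim 26.0 before the looser branch can split it. The character class [\d-] is a simplification that drops the comma from the original pattern.

```python
import re

line = "2013-08-07-21-25   26.0   1015.81"

# Looser branch first: it grabs "26" and stops at the dot.
loose_first = re.findall(r"[\d-]+|\d+\.\d+", line)

# Decimal branch first: it is tried before the looser branch at each position.
decimal_first = re.findall(r"\d+\.\d+|[\d-]+", line)

print(loose_first)
print(decimal_first)
```

Ordering alone does not reject corrupted lines. It only decides how well-formed tokens are split.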
