Regex to split line data into year / temperature readings

Question

I'm writing a Python script to parse some data files I have into GeoJSON data.

Right now, I have a number of lines that each start with a year followed by 12 temperature readings (one for each month), for example:

1983   5.2  -0.4   5.7   9.8  13.7  18.1  22.1  19.8  15.1  10.2   4.8   1.1 
1984   1.9   0.5   2.8   8.9  13.7  15.0  16.9  19.2  13.5  11.3   4.6   0.7 
1985  -5.0  -2.8   4.0   8.8  15.6  15.2  19.0  18.4  14.3   9.9   2.0   4.4 
1986   0.4  -6.4   3.8   7.4  15.9  17.4  19.4  18.2  12.3  10.3   7.1   2.5

And so on. Ideally, I'd like a regex that puts the year into the first capture group and then either puts all the temperatures into the next group or puts each one into its own group. In the first case, I'll split on spaces and parse the values individually. In the second, I'll parse each capture group one by one.

Here's what I've tried so far, and it's not working (scaled-down example to demonstrate):

import re
reYear = re.compile("([0-9][0-9][0-9][0-9])([\s]*[\-]*[0-9]+[\s]*)*")
line = "1983   5.2  -0.4   5.7   9.8  13.7  18.1  22.1  19.8  15.1  10.2   4.8   1.1"
data = reYear.search(line)
print("GROUP 0: %s" % data.group(0))
print("GROUP 1: %s" % data.group(1))

This is the output I get:

GROUP 0: 1983   5
GROUP 1: 1983

I thought this might work because the first () group says capture 4 digits, and the second says capture some instances of an optional minus sign, some digits, and then some whitespace. However, I don't really know what I'm doing. I'd appreciate any help.

Thank you!

Answer 1

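Your pattern stops after "5" because [0-9]+ cannot match the decimal point, so the repeated group ends there. I suggest using .* to match the remainder of the line. Also, \d{4} is the simplest way to match four digits: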

import re

# Regex: (four digits) whitespace (the rest of the line)
reYear = re.compile(r"(\d{4})\s+(.*)")
line = "1983   5.2  -0.4   5.7   9.8  13.7  18.1  22.1  19.8  15.1  10.2   4.8   1.1"
data = reYear.search(line)

# Group 0 is everything
print("GROUP 0: %s" % data.group(0))

print("GROUP 1: %s" % data.group(1))
print("GROUP 2: %s" % data.group(2))

This outputs:

GROUP 0: 1983   5.2  -0.4   5.7   9.8  13.7  18.1  22.1  19.8  15.1  10.2   4.8   1.1
GROUP 1: 1983
GROUP 2: 5.2  -0.4   5.7   9.8  13.7  18.1  22.1  19.8  15.1  10.2   4.8   1.1
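
From there, group 2 can be split on whitespace and each piece converted to a float. Here is a minimal sketch of that step; the variable names year and temps are only illustrative, and it assumes every line has the same layout as the sample:

```python
import re

line = "1983   5.2  -0.4   5.7   9.8  13.7  18.1  22.1  19.8  15.1  10.2   4.8   1.1"

# Group 1 is the year, group 2 is everything after the first run of whitespace.
match = re.search(r"(\d{4})\s+(.*)", line)

year = int(match.group(1))
# str.split() with no argument splits on any run of whitespace.
temps = [float(t) for t in match.group(2).split()]

print(year)
print(temps)
```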

Having said all that, you could just split the whole line on whitespace and take the first element as the year, and not use a regex at all.
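
That regex-free approach might look something like this. It is only a sketch: parse_line is a made-up helper name, and it assumes every line is a year followed by numeric readings, with no missing values or header rows:

```python
def parse_line(line):
    """Split one data line into (year, list of monthly temperatures)."""
    fields = line.split()  # splits on any run of whitespace
    year = int(fields[0])
    temps = [float(value) for value in fields[1:]]
    return year, temps


year, temps = parse_line(
    "1986   0.4  -6.4   3.8   7.4  15.9  17.4  19.4  18.2  12.3  10.3   7.1   2.5"
)
print(year)
print(temps)
```

Because split() ignores leading and trailing whitespace, trailing spaces at the end of a line do not produce empty fields.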

solution1
2 ACCEPTED 2016-06-19 17:50:45