I have a text/CSV file that contains, amongst others, rows that look like this:
05:21:20PM Driving 46 84.0 Some Road; Some Ext 1; in SomePLace; Long 38 12 40.6 E Lat 29 2 47.2 S
There are other rows containing data that I am not after.
I am only looking to extract the timestamp and then the Lat/Long.
The only constant things in the rows I am interested in are the timestamp at the beginning, which is always 8 characters long followed by AM or PM, and the Lat/Long part, which starts with the word "Long" and ends in an "S".
Is there any way that I can run through this file, strip out only these two pieces of text and concatenate them into a new row, while ignoring all other rows that do not have the timestamp as the first entry AND the Lat/Long part at the end? (Some rows have a timestamp at the beginning but no Lat/Long.)
Use the csv module to parse out the rows, then split the last column on ; to get the lat/long coordinates:
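    import csv

    # newline='' lets the csv module handle line endings itself (Python 3)
    with open(inputfilename, newline='') as inputfh:
        reader = csv.reader(inputfh, delimiter='\t')
        for row in reader:
            timestamp = row[0]
            # the coordinates are the last ';'-separated value in the 3rd column
            lat_long = row[2].rpartition(';')[-1].strip()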
This assumes that the file is tab-separated and that the latitude/longitude entry is always the last semicolon-separated value in the 3rd column.
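To also skip the rows that are not of interest, the same idea can be wrapped in a small filter. This is only a sketch under the same assumptions (tab-separated input, coordinates in the 3rd column); the function name `extract_csv`, the `TIMESTAMP` pattern and the sample rows are made up for illustration.

```python
import csv
import io
import re

# hh:mm:ss immediately followed by AM or PM, e.g. 05:21:20PM
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2}[AP]M$")

def extract_csv(lines, column=2):
    """Yield (timestamp, lat_long) for rows that carry both pieces.

    Assumes tab-separated input with the coordinates in `column`.
    """
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) <= column or not TIMESTAMP.match(row[0]):
            continue  # no timestamp in the first field, or too few fields
        lat_long = row[column].rpartition(";")[-1].strip()
        if lat_long.startswith("Long") and "Lat" in lat_long:
            yield row[0], lat_long

# Hypothetical sample data; a real file object can be passed instead.
sample = io.StringIO(
    "05:21:20PM\tDriving 46 84.0\tSome Road; Some Ext 1; in SomePLace; "
    "Long 38 12 40.6 E Lat 29 2 47.2 S\n"
    "05:22:00PM\tStopped\tSome Road; no position\n"
    "Report header\n"
)
results = list(extract_csv(sample))
print(results)
```

Only the first sample row is kept: the second has a timestamp but no coordinates, and the third has neither.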
I would not recommend regular expressions if your data really is in CSV format: they are the wrong tool for CSV and the result will not be pretty. However, your data does not look like true CSV, so parsing it with a regular expression might be an option. This code works for the sample you provided:
    import re

    with open('inputfilename', 'r') as f:
        for line in f:
            # timestamp parts first, then the "Long ... E/W" and "Lat ... N/S" fields
            mat = re.match(r"(\d+):(\d+):(\d+)([AP]M).*Long\s+([^EW]+[EW]).*Lat\s+([^NS]+[NS])", line)
            if mat is not None:
                print(mat.groups())
result:
('05', '21', '20', 'PM', '38 12 40.6 E', '29 2 47.2 S')
Further processing of this result is left as an exercise, but it could look like this:
hour, minute, second, am_pm, long, lat = mat.groups()
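If the goal is the concatenated row the question asks for, a variation of the pattern can capture the timestamp and the two coordinate fields as whole strings. The following is a minimal sketch rather than a tested solution for the real file; `ROW` and `to_output_row` are hypothetical names, and it assumes the Lat field is the last thing on the line.

```python
import re

# Timestamp at the start, then "Long ... E/W" followed by "Lat ... N/S" at the end.
ROW = re.compile(
    r"(\d{2}:\d{2}:\d{2}[AP]M)\b.*?"
    r"(Long\s+[^EW]+[EW])\s+(Lat\s+[^NS]+[NS])\s*$"
)

def to_output_row(line):
    """Return 'timestamp Long ... Lat ...', or None if the line lacks either part."""
    mat = ROW.match(line)
    if mat is None:
        return None
    return " ".join(mat.groups())

line = ("05:21:20PM Driving 46 84.0 Some Road; Some Ext 1; in SomePLace; "
        "Long 38 12 40.6 E Lat 29 2 47.2 S\n")
print(to_output_row(line))                    # 05:21:20PM Long 38 12 40.6 E Lat 29 2 47.2 S
print(to_output_row("05:22:00PM Stopped\n"))  # None
```

Lines for which the function returns None can simply be skipped when writing the new file.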
>>> s = "05:21:20PM Driving 46 84.0 Some Road; Some Ext 1; in SomePLace; Long 38 12 40.6 E Lat 29 2 47.2 S"
>>> date = s.split(" ")[0]
>>> date
'05:21:20PM'
>>> long_start = "Long"
>>> lat_start = "Lat"
>>> longitude = s[s.find(long_start) + len(long_start): s.find(lat_start)]
>>> longitude
' 38 12 40.6 E '
>>> latitude = s[s.find(lat_start) + len(lat_start):]
>>>
>>> latitude
' 29 2 47.2 S'
>>> latitude = s[s.find(lat_start) + len(lat_start):].strip()
>>> latitude
'29 2 47.2 S'
>>>
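One caveat with this approach is that str.find returns -1 when the marker is missing, so a row without coordinates would silently produce wrong slices. A possible way to guard against that, sketched here with a hypothetical helper named `split_coordinates`:

```python
def split_coordinates(line, long_start="Long", lat_start="Lat"):
    """Return (timestamp, longitude, latitude), or None if a marker is missing."""
    long_pos = line.find(long_start)
    # Look for "Lat" only after the "Long" marker.
    lat_pos = line.find(lat_start, long_pos + 1)
    if long_pos == -1 or lat_pos == -1:
        return None  # str.find signals "not found" with -1
    timestamp = line.split(" ", 1)[0]
    longitude = line[long_pos + len(long_start):lat_pos].strip()
    latitude = line[lat_pos + len(lat_start):].strip()
    return timestamp, longitude, latitude

s = ("05:21:20PM Driving 46 84.0 Some Road; Some Ext 1; in SomePLace; "
     "Long 38 12 40.6 E Lat 29 2 47.2 S")
print(split_coordinates(s))                     # ('05:21:20PM', '38 12 40.6 E', '29 2 47.2 S')
print(split_coordinates("05:22:00PM Stopped"))  # None
```

This does not check that the first field really is a timestamp; that check would have to be added separately if rows without one can contain "Long" and "Lat".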