
Regex Substitution in Python

I have a CSV file with several entries, and each entry contains two dates stored as Unix timestamps.

I have a method called convert() , which takes in the timestamp and converts it to YYYYMMDD .

Now, since I have 2 timestamps in each line, how would I replace each one with the new value?

EDIT: Just to clarify, I would like to convert each occurrence of the timestamp into the YYYYMMDD format. This is what is bugging me, as re.findall() returns a list.

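If you know the replacement:

import re

# Replace each comma-delimited 8-digit field with a known value
# (someval and csvstring are placeholders for your own data).
p = re.compile(r',\d{8},')
csvstring = p.sub(',' + someval + ',', csvstring)

If it's a format change:

# Capture year, month and day, then reorder them as DD-MM-YYYY.
p = re.compile(r',(\d{4})(\d\d)(\d\d),')
csvstring = p.sub(r',\3-\2-\1,', csvstring)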

EDIT: Sorry, I just realised you said Python; I've modified the code above.
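
One caveat with the patterns above: each match consumes both the leading and the trailing comma, so when two date fields sit next to each other the second one is skipped, and a field at the very start or end of a line never matches. A minimal sketch of one possible workaround, assuming the fields are 8-digit YYYYMMDD values, uses lookarounds so the delimiters are checked but not consumed. The names DATE_FIELD and reorder_dates are illustrative and not part of the original answer.

```python
import re

# An 8-digit field bounded by a comma, a line break, or the string edge.
# The lookarounds test the neighbours without consuming them.
DATE_FIELD = re.compile(r'(?<![^,\r\n])(\d{4})(\d\d)(\d\d)(?![^,\r\n])')

def reorder_dates(csv_text):
    """Rewrite every YYYYMMDD field as DD-MM-YYYY."""
    return DATE_FIELD.sub(r'\3-\2-\1', csv_text)
```

Because no comma is part of the match, the replacement string no longer needs to put the commas back.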

I assume that by "unix timestamp formatted date" you mean a number of seconds since the epoch. The code below treats every number in the file as a UNIX timestamp. If that isn't the case, you'll need to adjust the regex:

import re
import sys

# your convert function goes here

regex = re.compile(r'(\d+)')
for line in sys.stdin:
    sys.stdout.write(regex.sub(lambda m: convert(int(m.group(1))), line))

This reads from stdin and calls convert on each number found.

The "trick" here is that re.sub can take a function that transforms from a match object into a string. I'm assuming your convert function expects an int and returns a string, so I've used a lambda as an adapter function to grab the first group of the match, convert it to an int, and then pass that resulting int to convert.
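
To make the callback mechanism concrete, here is a small sketch that uses a named function instead of a lambda. The convert shown is only a stand-in, since the question does not include the real one; it assumes UTC and returns a string, which is what re.sub expects back from a replacement function. The helper names _to_date and replace_numbers are hypothetical.

```python
import re
import time

def convert(unixtime):
    # Stand-in for the asker's convert(); assumes UTC.
    return time.strftime("%Y%m%d", time.gmtime(unixtime))

def _to_date(match):
    # Called once per match; the returned string replaces the matched text.
    return convert(int(match.group(0)))

def replace_numbers(line):
    # Every run of digits on the line is passed through _to_date.
    return re.sub(r'\d+', _to_date, line)
```

Note that with the pattern \d+ every number on the line is converted, including short ones that are probably not timestamps.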

I can't comment on your question, but have you looked at Python's csv module? http://docs.python.org/library/csv.html#module-csv
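
As a rough sketch of what the csv approach could look like: if the timestamps are known to sit in fixed columns, no regex is needed at all. The column indexes 1 and 3 below are an assumption made for illustration, and convert is again a stand-in for the asker's own function (UTC assumed). The name convert_rows is hypothetical.

```python
import csv
import time

def convert(unixtime):
    # Stand-in for the asker's convert(); assumes UTC.
    return time.strftime("%Y%m%d", time.gmtime(unixtime))

def convert_rows(lines, columns=(1, 3)):
    """Yield parsed CSV rows with the given column indexes converted."""
    for row in csv.reader(lines):
        for i in columns:
            # Leave rows that are too short (e.g. blank lines) untouched.
            if i < len(row):
                row[i] = convert(int(row[i]))
        yield row
```

The rows can then be written back out with csv.writer, for example csv.writer(sys.stdout).writerows(convert_rows(f)) for an open file f.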

I'd use something along these lines. It is a lot like Laurence's response, but it includes the timestamp conversion you requested and takes the filename as a parameter. This code assumes you are working with recent dates (after 9/9/2001), which is when Unix timestamps reached 10 digits. If you need earlier dates, lower the 10 in the regex to 9 or less.

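import re
import sys
import time

# Match runs of 10 or more digits (timestamps from 2001-09-09 onward).
regex = re.compile(r'(\d{10,})')

def convert(unixtime):
    # Format the timestamp as YYYYMMDD in UTC.
    return time.strftime("%Y%m%d", time.gmtime(unixtime))

with open(sys.argv[1]) as f:
    for line in f:
        sys.stdout.write(regex.sub(lambda m: convert(int(m.group(0))), line))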

EDIT: Cleaned up the code.

Sample Input

foo,1234567890,bar,1243310263
cat,1243310263,pants,1234567890
baz,987654321,raz,1

Output

foo,20090213,bar,20090526
cat,20090526,pants,20090213
baz,987654321,raz,1 # not converted (too short to be a recent timestamp)
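
The 9/9/2001 cutoff comes from 1000000000 being the smallest 10-digit timestamp. A quick check, assuming UTC as in the code above, shows where the 10-digit and 9-digit ranges begin. The variable names are illustrative only.

```python
import time

def convert(unixtime):
    # Same conversion as in the answer: YYYYMMDD in UTC.
    return time.strftime("%Y%m%d", time.gmtime(unixtime))

first_ten_digit = convert(1000000000)   # smallest 10-digit timestamp
last_nine_digit = convert(999999999)    # one second earlier, same day
first_nine_digit = convert(100000000)   # smallest 9-digit timestamp

print(first_ten_digit, last_nine_digit, first_nine_digit)
```

So lowering the quantifier to 9 extends coverage back to early March 1973, at the cost of also matching any other 9-digit number in the file.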
