
Can anyone help me DRY this REGEX?

This is my first question (although I have found many perfect solutions to questions on Stack Overflow in the past - it is my first source for help).

I have text strings that contain a month and a series of dates. Sometimes there are two months in the string.

date1 = "January 9, 10, 15, 16, 17, 18, 22, 23, 24"
date2 = "September 19, 20, 25, 26, 27, 28, October 2, 3, 4, 10, 11"

I wrote a very WET piece of code that pulls the month from the string and adds each date, plus the year. However, there are several issues I just can't figure out.

  1. ITERATING THROUGH THE DATES: I know I should use the each method to iterate through the dates. I tried but couldn't get it to work, so I am doing it the hard way, concatenating the month with each date element one at a time. The obvious problem is that I don't know how many dates there will be, so I have to build for the longest possible string and use an if statement to check whether I've reached the end. I should be using something like the array's length together with each, but I can't get it to work.

  2. CONCATENATING MONTH DAY YEAR: My very bad WET code does pull the month, day, and year together, but how do I get rid of the brackets and quotes in the output?

  3. MULTIPLE MONTHS: How do I choose the second month in the string, and concatenate ONLY the individual dates that follow the month name to get MONTH/DD/YY?

Here is a sample of my very bad code.

require 'rubygems'
require 'nokogiri'
require 'open-uri'

date1 = "January 9, 10, 15, 16, 17, 18, 22, 23, 24"
date2 = "September 19, 20, 25, 26, 27, 28, October 2, 3, 4, 10, 11"
datetext = date1.scan(/([\w\-]+)/)     #=> every word and number, each wrapped in its own array
datetext2 = date1.scan(/(\w*)\s?/)[0]  #=> ["January"], the first match (the month)
datenumbers = date1.scan(/(\d+)/)      #=> [["9"], ["10"], ["15"], ...]
firstdate = datenumbers[0]             #=> ["9"], the first date following the first month
seconddate = datenumbers[1]            #=> ["10"]
year = "2014"

mdy1 = "#{datetext2} #{firstdate} #{year}"
mdy2 = "#{datetext2} #{seconddate} #{year}"

puts date1
puts " "
puts datetext2 #=> January (datetext2 uses [0] to take the first match, i.e. the 1st month)
puts firstdate
puts " "
puts mdy1
puts mdy2
puts " "

I suggest you do the following.

Code

def extract_dates_by_month(str)
  str.scan(/[A-Z][a-z]+|\d+/).each_with_object([]) { |e,b|
    e[0][/[A-Z]/] ? b << [e,[]] : b.last.last << e }
end

Example

str = "September 19, 20, 25, 26, October 2, 3, 4, 10, November 3, 12, 17"
extract_dates_by_month(str)
  #=> [["September", ["19", "20", "25", "26"]],
  #    ["October", ["2", "3", "4", "10"]],
  #    ["November", ["3", "12", "17"]]]
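The grouped result can then be turned into one string per date, which is what the question's mdy1 and mdy2 were building by hand. The following is only a sketch: month_day_year_strings is a hypothetical helper name, and the "Month day year" layout and the year "2014" are taken from the question's own code.

```ruby
# Same method as in the answer, repeated so this snippet runs on its own.
def extract_dates_by_month(str)
  str.scan(/[A-Z][a-z]+|\d+/).each_with_object([]) { |e, b|
    e[0][/[A-Z]/] ? b << [e, []] : b.last.last << e }
end

# Hypothetical helper: builds one "Month day year" string per date,
# pairing each day only with the month name that precedes it.
def month_day_year_strings(str, year)
  extract_dates_by_month(str).flat_map do |month, days|
    days.map { |day| "#{month} #{day} #{year}" }
  end
end

date2 = "September 19, 20, 25, 26, 27, 28, October 2, 3, 4, 10, 11"
mdy = month_day_year_strings(date2, "2014")
puts mdy
```

Because the days are iterated with map, the code does not need to know in advance how many dates a string contains.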

Explanation

The first step is extracting the month names and days:

a = str.scan(/[A-Z][a-z]+|\d+/)
  #=> ["September", "19", "20", "25", "26", "October", "2", "3", "4", "10",
  #    "November", "3", "12", "17"]
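Note that this regex has no capture groups, so scan returns a flat array of strings. The brackets and quotes in the question's output most likely come from the parentheses in its patterns: with a capture group, scan returns one inner array per match, and interpolating an array prints its inspect form. A small illustration (the variable names are made up for this example):

```ruby
date1 = "January 9, 10, 15"

with_group    = date1.scan(/(\d+)/)  # capture group: one inner array per match
without_group = date1.scan(/\d+/)    # no group: flat array of strings

p with_group     #=> [["9"], ["10"], ["15"]]
p without_group  #=> ["9", "10", "15"]

# Interpolating a one-element array keeps its brackets and quotes.
nested = "January #{with_group[0]} 2014"
flat   = "January #{without_group[0]} 2014"
puts nested  # January ["9"] 2014
puts flat    # January 9 2014
```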

We then divide this array up into months:

a.each_with_object([]) { |e,b| e[0][/[A-Z]/] ? b << [e,[]] : b.last.last << e }
  #=> [["September", ["19", "20", "25", "26"]],
  #    ["October", ["2", "3", "4", "10"]],
  #    ["November", ["3", "12", "17"]]]

Enumerable#each_with_object passes the initially empty array given as its argument to the block as the block variable b, and returns that array once every element has been processed. Each element of a is passed into the block as the block variable e. The following operations are performed:

b = []
e = "September"
e[0][/[A-Z]/] #=> "S"
b << [e,[]]   #=> [["September", []]]

e = "19"
e[0][/[A-Z]/] #=> nil
b.last.last << e
b             #=> [["September", ["19"]]]

e = "20"
e[0][/[A-Z]/] #=> nil
b.last.last << e
b             #=> [["September", ["19", "20"]]]

e = "25"
e[0][/[A-Z]/] #=> nil
b.last.last << e
b             #=> [["September", ["19", "20", "25"]]]

e = "26"
e[0][/[A-Z]/] #=> nil
b.last.last << e
b             #=> [["September", ["19", "20", "25", "26"]]]

e = "October"
e[0][/[A-Z]/] #=> "O"
b << [e,[]]   #=> [["September", ["19", "20", "25", "26"]], ["October", []]]

and so on.
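The same grouping can also be written with Enumerable#slice_before, which starts a new chunk at every element for which the block is truthy. This is an alternative sketch, not the method used above; the variable names tokens and grouped are chosen for this example only.

```ruby
str = "September 19, 20, 25, 26, October 2, 3, 4, 10, November 3, 12, 17"
tokens = str.scan(/[A-Z][a-z]+|\d+/)

# Start a new chunk whenever a token begins with a capital letter (a month name),
# then split each chunk into the month and the days that follow it.
grouped = tokens.slice_before { |e| e =~ /\A[A-Z]/ }
                .map { |month, *days| [month, days] }

p grouped
```

The result has the same shape as the array returned by extract_dates_by_month.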

If you want the days to be integers, change:

b.last.last << e

to:

b.last.last << e.to_i
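With integer days, the result can be converted to Date objects and formatted as MM/DD/YY, which seems to be what the question is ultimately after. This is a sketch under a few assumptions: to_dates is a hypothetical helper, the month names are full English names (so they can be looked up in Date::MONTHNAMES), and the year is supplied by the caller because the strings do not contain one.

```ruby
require 'date'

# The answer's method with the to_i change applied.
def extract_dates_by_month(str)
  str.scan(/[A-Z][a-z]+|\d+/).each_with_object([]) { |e, b|
    e[0][/[A-Z]/] ? b << [e, []] : b.last.last << e.to_i }
end

# Hypothetical helper: one Date per day, using the month that precedes it.
def to_dates(str, year)
  extract_dates_by_month(str).flat_map do |month_name, days|
    # Date::MONTHNAMES is [nil, "January", ..., "December"], so the
    # index of a full English month name is its month number.
    month = Date::MONTHNAMES.index(month_name)
    days.map { |day| Date.new(year, month, day) }
  end
end

date2 = "September 19, 20, 25, 26, 27, 28, October 2, 3, 4, 10, 11"
dates = to_dates(date2, 2014)
formatted = dates.map { |d| d.strftime("%m/%d/%y") }
puts formatted
```

A misspelled or abbreviated month name would make the lookup return nil and Date.new raise an error, so real input may need validation first.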


 