
How to match a list of dates to a pattern?

I have a Python list of tuples with three objects each: a string (e.g. a title), a date, and another string (e.g. a name).

Example:

scientific_works = [
    ('SW 1', datetime.date(2000, 10, 15), 'auth 1'),
    ('SW 2', datetime.date(2000, 11, 3), 'auth 1'),
    ('SW 3', datetime.date(2000, 11, 4), 'auth 1'),
    ('SW 4', datetime.date(2000, 12, 1), 'auth 1'),
]

Then I have a pattern:

from date until date, (at least) int items from list per int days/weeks/months/years

Example:

from  datetime.date(2000, 11, 1)
until datetime.date(2000, 11, 30)
1 item per day

What I would like the algorithm to do:

  • Given that list and that pattern, do the filtered items match the rules?

In the example above, the date range matches 2 items (SW 2 and SW 3). However, since there isn't an item for every day block in the range, the rule "1 item per day" is not satisfied and the algorithm would return false.

Another example:

  • Is there (at least) Int_1 amount of items (works) per Int_2 (day/week/month)?
    • 1 work per day would mean at least 1 item in each 1-day block of the given date range. 2 works per week would mean at least 2 works in each week (or 7-day block) of the date range.

I can iterate over the list and find out which items match from and until pattern, of course.

However, I am really confused about how to check them against the rest of the rules to see whether it's a positive or a negative match.

My question:

  • How can I construct an algorithm, provide it with a list and a pattern of rules (x items per y day OR week OR month OR year), and see if it matches or not?

I am working on a small component for an application that decides, given some data (a list) and rules (a pattern), whether an author unlocks a reward.

I have completed several of Udacity's Python classes, including most of the algorithms material, but I really cannot find my way around this.

So far I thought of this:

  1. Filter list items with the given date range.
  2. Calculate the number of blocks within the range, e.g. 1 day from d1 until d2 = 5 days; 1 week from d1 until d2 = 3 weeks.
  3. Create a loop in range of int calculated above.
  4. Convert weeks, months, years to days in each step of the loop.
  5. Add the amount to the start date and see if items match the date range.
  6. Add the amount to next start of date range and repeat.

However, this doesn't work and I don't think converting blocks to days is efficient at all.

Thank you.

Can you post a better example of the rules that the match has to follow? Are you looking for a certain number of items per author per time period? Or are you looking for certain entries over a time period and then finding who they belong to? That will affect the sort.

I think you will end up having to use a sort algorithm on this data, which is not horrible if you go about it the right way.

From the bottom part of your question I think that if you are searching for x items per n time-periods (day/week/month) and then determining the authors it might be a bit messy. If you have a finite number of authors it might be easier to flip that around and create an array for each author and store the item and date in there. Then you just run a testing loop over each author that checks all their entries to see if they fit the requirements.
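As a rough illustration of the "flip it around" idea, the works can be grouped per author before any rule is tested. This is only a sketch: the helper name `group_by_author` and the second author in the sample data are made up for the example and do not come from the question.

```python
from collections import defaultdict
from datetime import date

# Sample data in the same (title, date, author) shape as the question;
# the second author is invented for illustration.
works = [
    ('SW 1', date(2000, 10, 15), 'auth 1'),
    ('SW 2', date(2000, 11, 3), 'auth 1'),
    ('SW 3', date(2000, 11, 4), 'auth 2'),
]

def group_by_author(works):
    """Map each author to a date-sorted list of (date, title) pairs."""
    by_author = defaultdict(list)
    for title, published, author in works:
        by_author[author].append((published, title))
    for entries in by_author.values():
        entries.sort()  # tuples sort by date first, then title
    return dict(by_author)

grouped = group_by_author(works)
```

Each author's list can then be passed to whatever rule check is chosen, one author at a time.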

For Python classes, MIT OpenCourseWare's 6.00 Introduction to Computer Science and Programming is very good. It can be found at http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-00-introduction-to-computer-science-and-programming-fall-2008/

I would use the following design: a main generator function that iterates over the sequence of works and yields the "good" ones, and a set of pluggable filters that implement particular rules, such as date range, N items per day, per week, per month, etc.

The following is a small example to illustrate the idea:

from datetime import date
from pprint import pprint

scientific_works = [
    ('SW 1', date(2000, 10, 15), 'auth 1'),
    ('SW 2', date(2000, 11, 3), 'auth 1'),
    ('SW 3', date(2000, 11, 4), 'auth 1'),
    ('SW 4', date(2000, 11, 5), 'auth 1'),
    ('SW 5', date(2000, 12, 1), 'auth 1'),
    ('SW 6', date(2000, 12, 15), 'auth 1'),
    ('SW 7', date(2000, 12, 18), 'auth 1'),
    ('SW 8', date(2000, 12, 22), 'auth 1'),
]

def filter_works(works, *filters):
    for work in works:
        good = True
        for fil in filters:
            good = good and fil(work)
        if good:
            yield work

class RangeFilter(object):
    def __init__(self, from_date, to_date):
        self.from_date = from_date
        self.to_date = to_date

    def __call__(self, work):
        return self.from_date <= work[1] <= self.to_date


class WorksPerMonthFilter(object):
    def __init__(self, limit):
        self.limit = limit
        self._current_month = date.min
        self._current_number = 0

    def __call__(self, work):
        month = date(work[1].year, work[1].month, 1)
        if month == self._current_month:
            self._current_number += 1
        else:
            self._current_month = month
            self._current_number = 1
        return self._current_number <= self.limit


if __name__ == '__main__':
    pprint(list(filter_works(scientific_works, RangeFilter(date(2000, 10, 1), date(2000, 11, 30)), WorksPerMonthFilter(2))))
    pprint(list(filter_works(scientific_works, RangeFilter(date(2000, 10, 1), date(2000, 12, 31)), WorksPerMonthFilter(2))))
    pprint(list(filter_works(scientific_works, RangeFilter(date(2000, 10, 1), date(2000, 12, 31)), WorksPerMonthFilter(3))))
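Note that `WorksPerMonthFilter` caps the number of works per month, whereas the question asks whether every period contains at least N works. A minimal sketch of that yes/no check for calendar months could look like the following. The names `months_between` and `has_min_per_month` are hypothetical, and it assumes that any month touched by the range counts as a full block.

```python
from collections import Counter
from datetime import date

scientific_works = [
    ('SW 1', date(2000, 10, 15), 'auth 1'),
    ('SW 2', date(2000, 11, 3), 'auth 1'),
    ('SW 3', date(2000, 11, 4), 'auth 1'),
    ('SW 4', date(2000, 11, 5), 'auth 1'),
    ('SW 5', date(2000, 12, 1), 'auth 1'),
    ('SW 6', date(2000, 12, 15), 'auth 1'),
    ('SW 7', date(2000, 12, 18), 'auth 1'),
    ('SW 8', date(2000, 12, 22), 'auth 1'),
]

def months_between(start, end):
    """Yield (year, month) for every calendar month touched by [start, end]."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield (year, month)
        month += 1
        if month > 12:
            year, month = year + 1, 1

def has_min_per_month(works, start, end, minimum):
    """True if every month in the range holds at least `minimum` works."""
    counts = Counter(
        (d.year, d.month) for _title, d, _author in works if start <= d <= end
    )
    # Counter returns 0 for months without any work, so gaps fail the check.
    return all(counts[m] >= minimum for m in months_between(start, end))

print(has_min_per_month(scientific_works, date(2000, 10, 1), date(2000, 12, 31), 1))
print(has_min_per_month(scientific_works, date(2000, 10, 1), date(2000, 12, 31), 2))
```

With the sample data, October holds one work, November three and December four, so the first call passes and the second fails on October.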

If the pattern is:

from  start_date
until end_date
X items per period

then to find out whether scientific_works matches the pattern, an analog of numpy.histogram() function could be used:

import datetime
import numpy as np

ts = datetime.date.toordinal  # or any monotonic numeric `date` function
# Bin edges are the period starts; np.histogram returns (counts, edges).
hist = np.histogram([ts(d) for title, d, name in scientific_works],
                    bins=[ts(d) for d in daterange(start_date, end_date, period)])[0]
does_it_match = all(x >= X for x in hist)

where:

def daterange(start_date, end_date, period):
    d = start_date
    while d < end_date:
        yield d
        d += period

Example:

>>> from datetime import date, timedelta
>>> list(daterange(date(2000, 1, 1), date(2000, 2, 1), timedelta(days=7)))
[datetime.date(2000, 1, 1), datetime.date(2000, 1, 8),
 datetime.date(2000, 1, 15), datetime.date(2000, 1, 22),
 datetime.date(2000, 1, 29)]
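For fixed-length periods (days or weeks) the same block-counting idea can also be sketched without NumPy. This is a hedged illustration, not a definitive implementation: the function name `matches_pattern` is made up, `end_date` is assumed to be inclusive, and a shorter final block is assumed to need the minimum as well.

```python
from datetime import date, timedelta

scientific_works = [
    ('SW 1', date(2000, 10, 15), 'auth 1'),
    ('SW 2', date(2000, 11, 3), 'auth 1'),
    ('SW 3', date(2000, 11, 4), 'auth 1'),
    ('SW 4', date(2000, 12, 1), 'auth 1'),
]

def matches_pattern(works, start_date, end_date, minimum, period):
    """True if every `period`-long block in [start_date, end_date]
    contains at least `minimum` works (end_date inclusive)."""
    dates = [d for _title, d, _author in works if start_date <= d <= end_date]
    stop = end_date + timedelta(days=1)  # exclusive upper bound of the range
    block_start = start_date
    while block_start < stop:
        block_end = min(block_start + period, stop)
        count = sum(1 for d in dates if block_start <= d < block_end)
        if count < minimum:
            return False  # one under-filled block is enough to fail
        block_start = block_end
    return True

# The example from the question: 1 item per day during November 2000.
print(matches_pattern(scientific_works, date(2000, 11, 1), date(2000, 11, 30),
                      1, timedelta(days=1)))
```

Calendar months and years have variable lengths, so they cannot be expressed as a single `timedelta` and would need separate handling.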


 