
Filter dates in datetime objects by month

I have a dictionary whose keys are datetime.datetime objects and whose values are lists of tweets. It looks like this:

{datetime.datetime(2017, 9, 30, 19, 55, 20) : ['this is some tweet text'],
 datetime.datetime(2017, 9, 30, 19, 55, 20) : ['this is another tweet']...

I'm trying to get the number of tweets sent out each month of the year. So far I have...

from datetime import datetime

startDate = 10
endDate = 11
start = True
while start:

    for k, v in tweetDict.items():
        endDate -= 1
        startDate -= 1

        datetimeStart = datetime(2017, startDate, 1)
        datetimeEnd = datetime(2017, endDate, 1)

        print(datetimeStart, datetimeEnd)

        if datetimeStart < k < datetimeEnd:
            print(v)
        if endDate == 2:
            start = False
            break

which only prints (I'm aware of the print statement)...

2017-08-01 00:00:00 2017-09-01 00:00:00
2017-07-01 00:00:00 2017-08-01 00:00:00
2017-06-01 00:00:00 2017-07-01 00:00:00
2017-05-01 00:00:00 2017-06-01 00:00:00
2017-04-01 00:00:00 2017-05-01 00:00:00
2017-03-01 00:00:00 2017-04-01 00:00:00
2017-02-01 00:00:00 2017-03-01 00:00:00
2017-01-01 00:00:00 2017-02-01 00:00:00

and not the actual tweets themselves. I was expecting something like ...

2017-08-01 00:00:00 2017-09-01 00:00:00
['heres a tweet']
['theres a tweet']
2017-07-01 00:00:00 2017-08-01 00:00:00
['there only 1 tweet for this month']....

I'm kinda stuck. How can I achieve this?

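You can just group by the month instead of trying to subtract/compare different months. Note that itertools.groupby only merges consecutive items with the same key, so sort the items first; otherwise tweets from the same month that aren't adjacent in the dict end up in separate groups. (The key function is written as lambda kv: kv[0].month because Python 3 no longer accepts tuple parameters like lambda (k, v): ...)

>>> import datetime
>>> from itertools import groupby
>>> d = {datetime.datetime(2017, 9, 30, 19, 55, 20): ['this is some tweet text'],
...      datetime.datetime(2017, 9, 30, 20, 55, 20): ['this is another tweet'],
...      datetime.datetime(2017, 10, 30, 19, 55, 20): ['this is an october tweet'],}
>>> for month, group in groupby(sorted(d.items()), lambda kv: kv[0].month):
...     print(month)
...     for dt, tweet in group:
...         print(dt, tweet)
...
9
2017-09-30 19:55:20 ['this is some tweet text']
2017-09-30 20:55:20 ['this is another tweet']
10
2017-10-30 19:55:20 ['this is an october tweet']
>>>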
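If you'd rather not depend on sorting at all, a dict of lists does the same grouping in a single pass. This is only a sketch: group_by_month is a hypothetical helper name, and keying on (year, month) instead of just the month is an assumption, so that e.g. September 2017 and September 2018 stay separate. The October tweet is deliberately placed between the two September ones to show that non-adjacent entries still land in the same bucket.

```python
import datetime
from collections import defaultdict


def group_by_month(tweet_dict):
    """Return a dict mapping (year, month) to the (timestamp, tweets) pairs from that month."""
    groups = defaultdict(list)
    # One pass, no sorting: each entry is appended to its month's bucket wherever it appears
    for dt, tweets in tweet_dict.items():
        groups[(dt.year, dt.month)].append((dt, tweets))
    return dict(groups)


d = {datetime.datetime(2017, 9, 30, 19, 55, 20): ['this is some tweet text'],
     datetime.datetime(2017, 10, 30, 19, 55, 20): ['this is an october tweet'],
     datetime.datetime(2017, 9, 30, 20, 55, 20): ['this is another tweet']}

# Print the months in chronological order (the (year, month) keys sort naturally)
for (year, month), entries in sorted(group_by_month(d).items()):
    print("%d-%02d" % (year, month))
    for dt, tweets in entries:
        print(dt, tweets)
```

Within a month, entries keep the dict's iteration order (insertion order on Python 3.7+).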

And of course, you can print it in a nicer format and so on (the inner ','.join(...) is needed because each value is a list of tweets):

>>> for month, group in groupby(sorted(d.items()), lambda kv: kv[0].month):
...     tweets = list(group)
...     print("%d tweet(s) in month %d" % (len(tweets), month))
...     print('\n'.join(','.join(tweet) for (dt, tweet) in tweets))
...
2 tweet(s) in month 9
this is some tweet text
this is another tweet
1 tweet(s) in month 10
this is an october tweet
>>>
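Since the question is really after the number of tweets per month, a collections.Counter can tally them directly. A minimal sketch, assuming (as in the question) that each value is a list of tweet strings, so len(v) is added per key rather than 1; count_tweets_per_month is a hypothetical name.

```python
import datetime
from collections import Counter


def count_tweets_per_month(tweet_dict):
    """Return a Counter mapping (year, month) -> number of tweets sent in that month."""
    counts = Counter()
    for dt, tweets in tweet_dict.items():
        # Each value is a list, so add its length instead of counting the key once
        counts[(dt.year, dt.month)] += len(tweets)
    return counts


d = {datetime.datetime(2017, 9, 30, 19, 55, 20): ['this is some tweet text'],
     datetime.datetime(2017, 9, 30, 20, 55, 20): ['this is another tweet'],
     datetime.datetime(2017, 10, 30, 19, 55, 20): ['this is an october tweet']}

for (year, month), n in sorted(count_tweets_per_month(d).items()):
    print("%d tweet(s) in %d-%02d" % (n, year, month))
```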

First thing: you're putting two items in your dict with the exact same key, so the second one overwrites the first. For the rest of this, I'm going to assume that the second item in your example is slightly different (seconds=21).
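A quick way to see the overwrite for yourself, using the two entries from the question: when a dict literal repeats a key, Python keeps a single entry holding the last value given for it.

```python
import datetime

tweetDict = {
    datetime.datetime(2017, 9, 30, 19, 55, 20): ['this is some tweet text'],
    datetime.datetime(2017, 9, 30, 19, 55, 20): ['this is another tweet'],
}

# Both entries share one key, so only one item survives, holding the later value
print(len(tweetDict))  # prints 1
print(tweetDict[datetime.datetime(2017, 9, 30, 19, 55, 20)])  # prints ['this is another tweet']
```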

Your code isn't working as you expect because you're decrementing endDate and startDate inside your for loop, so the month window moves on after every single item. As a result, each month is checked against only one item in the dict: if that item happens to land in that month, it gets printed; if not, nothing is. To illustrate, here's what you get if you change your print to print(datetimeStart, datetimeEnd, k, v):

2017-09-01 00:00:00 2017-10-01 00:00:00 2017-09-30 19:55:20 ['this is some tweet text']
['this is some tweet text']
2017-08-01 00:00:00 2017-09-01 00:00:00 2017-09-30 19:55:21 ['this is another tweet']
2017-07-01 00:00:00 2017-08-01 00:00:00 2017-09-30 19:55:20 ['this is some tweet text']
2017-06-01 00:00:00 2017-07-01 00:00:00 2017-09-30 19:55:21 ['this is another tweet']
2017-05-01 00:00:00 2017-06-01 00:00:00 2017-09-30 19:55:20 ['this is some tweet text']
2017-04-01 00:00:00 2017-05-01 00:00:00 2017-09-30 19:55:21 ['this is another tweet']
2017-03-01 00:00:00 2017-04-01 00:00:00 2017-09-30 19:55:20 ['this is some tweet text']
2017-02-01 00:00:00 2017-03-01 00:00:00 2017-09-30 19:55:21 ['this is another tweet']
2017-01-01 00:00:00 2017-02-01 00:00:00 2017-09-30 19:55:20 ['this is some tweet text']
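To make that pairing explicit, here is a sketch that replays the question's loop structure but, instead of printing, records which month window was compared with which key. The function name replay_buggy_loop and its return format are hypothetical; the counter logic mirrors the original code, and the alternating pattern relies on dicts iterating in insertion order (Python 3.7+).

```python
import datetime


def replay_buggy_loop(tweet_dict):
    """Replay the question's loop, returning (window_start_month, key) for every comparison."""
    startDate, endDate = 10, 11
    pairs = []
    start = True
    while start:
        for k in tweet_dict:
            # The bug: the window steps back one month per *item*, not once per month
            endDate -= 1
            startDate -= 1
            pairs.append((startDate, k))
            if endDate == 2:
                start = False
                break
    return pairs


tweetDict = {
    datetime.datetime(2017, 9, 30, 19, 55, 20): ['this is some tweet text'],
    datetime.datetime(2017, 9, 30, 19, 55, 21): ['this is another tweet'],
}

# Each month number appears exactly once, paired with a single key
for month, k in replay_buggy_loop(tweetDict):
    print(month, k)
```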

The fix with the least change to your existing code is to move the decrements in front of the for loop and dedent the if endDate... block one level, so that both run once per pass of the while loop rather than once per tweet:

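from datetime import datetime

startDate = 10
endDate = 11
start = True
while start:
    # Step back one month *before* scanning the tweets, once per month
    endDate -= 1
    startDate -= 1
    datetimeStart = datetime(2017, startDate, 1)
    datetimeEnd = datetime(2017, endDate, 1)
    print(datetimeStart, datetimeEnd)
    # tweetDict is your dict of datetime -> list of tweets
    for k, v in tweetDict.items():
        # <= on the lower bound so a tweet at exactly midnight on the 1st is included
        if datetimeStart <= k < datetimeEnd:
            print(v)
    # Checked once per month, outside the for loop
    if endDate == 2:
        start = False
        break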

Of course, at that point you might as well get rid of the if endDate... block (and the start flag) and loop with while endDate > 2: instead.
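With that simplification, the loop might look like the sketch below. It is wrapped in a hypothetical function, tweets_by_month_range, that collects results instead of printing them; the year 2017 and the 10/11 starting values come from the question, the inclusive lower bound (<=) is an assumption so midnight-on-the-1st tweets aren't dropped, and the July tweet is made-up sample data.

```python
from datetime import datetime


def tweets_by_month_range(tweetDict, year=2017, startDate=10, endDate=11):
    """Return [(window_start, window_end, [tweet lists]), ...], newest month first."""
    results = []
    while endDate > 2:
        # Step back one month before scanning, exactly once per month
        endDate -= 1
        startDate -= 1
        datetimeStart = datetime(year, startDate, 1)
        datetimeEnd = datetime(year, endDate, 1)
        # Every tweet is compared against this single month window
        matches = [v for k, v in tweetDict.items() if datetimeStart <= k < datetimeEnd]
        results.append((datetimeStart, datetimeEnd, matches))
    return results


tweetDict = {
    datetime(2017, 9, 30, 19, 55, 20): ['this is some tweet text'],
    datetime(2017, 9, 30, 19, 55, 21): ['this is another tweet'],
    datetime(2017, 7, 4, 12, 0, 0): ['a july tweet'],  # hypothetical sample entry
}

for datetimeStart, datetimeEnd, matches in tweets_by_month_range(tweetDict):
    print(datetimeStart, datetimeEnd)
    for v in matches:
        print(v)
```

This prints each month window followed by the tweets inside it, which is the output format the question was expecting.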

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 