Sorting a list of strings by substring with Python

Question

I have a list of strings, each of which is an email formatted in almost exactly the same way. There is a lot of information in each email, but the most important pieces are the name of a facility and an incident date.

I'd like to take that list of emails and create a new list in which the emails are grouped by the "location_substring" and then sorted by the "incident_date_substring", so that all of the emails from one location sit together in the list, in chronological order.

The facility substring can usually be found in the subject line of each email. The incident date is on a line in the email that starts with "Date of Incident:".

Any ideas as to how I'd go about doing this?

Answer 1

Write a function that returns the two pieces of information you care about from each email:

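def email_sort_key(email):
    """Find two pieces of info in the email, and return them as a tuple."""
    # ...search, search: pull the facility name and the incident date out of
    # the email text. Return the date as a datetime (not the raw string) so
    # that it sorts chronologically. The values below are placeholders.
    return "location", "incident_date"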

Then use that function as the sort key:

emails.sort(key=email_sort_key)

The key function is applied to every value, and the values are reordered according to what the key function returns. In this case, the key function returns a tuple. Tuples are compared lexicographically: the first elements are compared, and only if they are equal are the second elements compared, and so on. The emails therefore end up sorted by location first and, within each location, by incident date.
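A filled-in email_sort_key might look like the sketch below. The regular expressions are assumptions, modeled on the patterns in Answer 3: the facility name follows the word "for" on the Subject: line, and the date is written as MM/DD/YYYY. The sample emails are invented, so adjust both to the real format. Parsing the date with datetime.strptime matters here: compared as raw strings, "12/01/2012" would sort after "02/15/2013".

```python
import re
from datetime import datetime

# Assumed formats (hypothetical; adapt to the real emails):
#   "Subject: ... for <facility>"   and   "Date of Incident: MM/DD/YYYY"
SUBJECT_RE = re.compile(r"^Subject:.*?\bfor\s+(.+?)\s*$", re.MULTILINE)
DATE_RE = re.compile(r"^Date of Incident:\s*(\S+)", re.MULTILINE)


def email_sort_key(email):
    """Return (facility, incident_date) so emails sort by location, then date."""
    location = SUBJECT_RE.search(email).group(1)
    date_text = DATE_RE.search(email).group(1)
    incident_date = datetime.strptime(date_text, "%m/%d/%Y")
    return location, incident_date


emails = [
    "Subject: Incident report for Plant B\nDate of Incident: 03/02/2012\n...",
    "Subject: Incident report for Plant A\nDate of Incident: 02/15/2013\n...",
    "Subject: Incident report for Plant A\nDate of Incident: 12/01/2012\n...",
]
emails.sort(key=email_sort_key)
for email in emails:
    print(email_sort_key(email))
```

After the sort, the two Plant A emails come first (12/01/2012, then 02/15/2013), followed by Plant B. Because list.sort is stable, emails with the same location and date keep their original relative order.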

Answer 2

Your solution might look something like this:

def getLocation(mail):
    pass  # magic happens here: return the facility name

def getDate(mail):
    pass  # here be dragons: return the incident date as a datetime so it sorts chronologically

emails = [...]  # original list

# Group mails by location
d = {}
for mail in emails:
    loc = getLocation(mail)
    d.setdefault(loc, []).append(mail)

# Sort mails inside each group by date
for mails in d.values():
    mails.sort(key=getDate)
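This answer stops with a dict of groups, while the question asks for a single list. The sketch below fills in the two helpers and then flattens the groups into one list. It assumes, like Answer 3, that the facility follows "for" on the Subject: line and that dates are MM/DD/YYYY. The group_and_sort name and the sample emails are hypothetical.

```python
import re
from datetime import datetime


def getLocation(mail):
    # Assumption: the facility name follows "for" on the Subject: line.
    return re.search(r"^Subject:.*?\bfor\s+(.+?)\s*$", mail, re.MULTILINE).group(1)


def getDate(mail):
    # Assumption: the date is written as MM/DD/YYYY after "Date of Incident:".
    text = re.search(r"^Date of Incident:\s*(\S+)", mail, re.MULTILINE).group(1)
    return datetime.strptime(text, "%m/%d/%Y")


def group_and_sort(emails):
    """Group emails by location, sort each group by date, return one flat list."""
    groups = {}
    for mail in emails:
        groups.setdefault(getLocation(mail), []).append(mail)
    for mails in groups.values():
        mails.sort(key=getDate)
    # Concatenate the groups, taking locations in alphabetical order.
    return [mail for loc in sorted(groups) for mail in groups[loc]]


emails = [
    "Subject: Spill report for Warehouse 2\nDate of Incident: 07/04/2012\n",
    "Subject: Spill report for Dock\nDate of Incident: 11/30/2012\n",
    "Subject: Spill report for Warehouse 2\nDate of Incident: 01/09/2012\n",
]
for mail in group_and_sort(emails):
    print(mail.splitlines()[0], "/", getDate(mail).date())
```

To keep locations in order of first appearance instead, iterate over groups.values() without sorted(); dicts preserve insertion order in Python 3.7 and later.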

Answer 3

This is something you could do:

from collections import defaultdict
from datetime import datetime
import re

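mails = []  # the original list of email strings goes here

# Group the emails by facility: whatever follows "for " on the Subject: line.
mails2 = defaultdict(list)

for mail in mails:
    loc = re.search(r'Subject:.*?for\s(.+?)\n', mail).group(1)
    mails2[loc].append(mail)

def incident_date(mail):
    """Parse the "Date of Incident:" line (MM/DD/YYYY) into a datetime."""
    date_str = re.search(r'Date of Incident:\s(.+?)\n', mail).group(1)
    return datetime.strptime(date_str, '%m/%d/%Y')

# Sort each location's emails chronologically, in place.
for m in mails2.values():
    m.sort(key=incident_date)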

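Please note that this has absolutely no error handling for cases where the regexes don't match: re.search then returns None, and calling .group(1) on it raises AttributeError. A date in any format other than MM/DD/YYYY makes strptime raise ValueError.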
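One way to add that error handling is to check each re.search result before calling .group(), catch the ValueError from strptime, and set aside emails that cannot be parsed instead of crashing. The sketch below keeps Answer 3's patterns, relaxed so that a date on the last line (with no trailing newline) still matches, and its MM/DD/YYYY format. It also flattens the groups into a single list, as the question asks. The names parse_email, group_and_sort_safe and unmatched, and the sample emails, are illustrative only.

```python
import re
from collections import defaultdict
from datetime import datetime

# Answer 3's patterns, but ending at the end of the line instead of requiring
# a trailing "\n" ("." never matches a newline, so ".+" stops there).
LOCATION_RE = re.compile(r"Subject:.*?for\s(.+)")
DATE_RE = re.compile(r"Date of Incident:\s(.+)")


def parse_email(mail):
    """Return (location, datetime), or None if a field is missing or malformed."""
    loc_match = LOCATION_RE.search(mail)
    date_match = DATE_RE.search(mail)
    if loc_match is None or date_match is None:
        return None
    try:
        date = datetime.strptime(date_match.group(1).strip(), "%m/%d/%Y")
    except ValueError:  # the date line exists but is not MM/DD/YYYY
        return None
    return loc_match.group(1).strip(), date


def group_and_sort_safe(mails):
    """Return (ordered, unmatched): one list grouped by location and sorted
    by date, plus the emails that could not be parsed."""
    groups = defaultdict(list)
    unmatched = []
    for mail in mails:
        parsed = parse_email(mail)
        if parsed is None:
            unmatched.append(mail)
            continue
        loc, date = parsed
        groups[loc].append((date, mail))
    ordered = []
    for loc in sorted(groups):  # locations in alphabetical order
        for date, mail in sorted(groups[loc], key=lambda pair: pair[0]):
            ordered.append(mail)
    return ordered, unmatched


mails = [
    "Subject: Incident for North Site\nDate of Incident: 05/20/2012",
    "Subject: Incident for North Site\nDate of Incident: 01/03/2012\n",
    "Subject: Weekly digest\nNo incident here\n",
]
ordered, unmatched = group_and_sort_safe(mails)
print(len(ordered), "sorted,", len(unmatched), "skipped")
```

Sorting the (date, mail) pairs with key=lambda pair: pair[0] compares only the dates, so two emails with the same date keep their original order rather than being compared as strings.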


Answer votes and timestamps

Answer 1: 4 votes, accepted, 2012-12-08 18:27:34
Answer 2: 0 votes, 2012-12-08 18:27:09
Answer 3: 0 votes, 2012-12-08 20:19:03
