
How to compare two lists by filtering and sorting "repeated" values

For an email campaign, I have the following act2.txt file:

2021-04-02//email@example.com//Enhance your presentation skills in 15 minutes//Open
2021-04-11//email@example.com//Enroll in the presentations skills - FREE WEBINAR//Open
2021-04-11//email@example.com//Enroll in the presentations skills - FREE WEBINAR//Delivered
2021-04-11//email@example.com//Enroll in the presentations skills - FREE WEBINAR//Delivered
2021-04-11//email@example.com//Enroll in the presentations skills - FREE WEBINAR//Delivered
2021-04-16//email@example.com//YOU ARE INVITED TO THIS PROGRAMMING EVENT//Delivered
2021-04-01//email@example.com//Enhance your presentation skills in 15 minutes//Delivered
2021-04-09//email@example.com//we are here to help you improve your skills//Delivered
2021-04-12//email@example.com//(1st meeting) here is our recorded presentation skills webinar//Delivered
2021-04-13//email@example.com//YOU ARE INVITED TO THIS PROGRAMMING EVENT//Delivered

I want to track a customer's email activity: I count the delivered emails, the opened emails, and then the open rate.

I generated two lists, one for the delivered emails and one for the opened emails:

from pprint import pprint

# Read the file; each line has the form date//customer_email//email_title//action
afile = "act2.txt"
with open(afile, "r") as afile_read:
    lines = afile_read.readlines()

activityList = []
for activities in lines:
    # Split into [date, customer_email, email_title, action] and strip trailing whitespace
    activity = activities.split("//")
    stripped_line = [s.rstrip() for s in activity]
    activityList.append(stripped_line)

#print (activityList)

stripped_email = 'email@example.com'
email_actions = [x for x in activityList if stripped_email in x[1]]
delivered = [x for x in email_actions if 'Delivered' in x]
Opened = [x for x in email_actions if 'Open' in x]
delcount = len(delivered)
opencount = len(Opened)
try:
    Open_rate = opencount / delcount * 100
except ZeroDivisionError:
    Open_rate = 0
print(stripped_email, ",", delcount, ",", opencount, ",", Open_rate, "%")

pprint(delivered)
pprint(Opened)

Delivered list:

[['2021-04-11',
  'email@example.com',
  'Enroll in the presentations skills - FREE WEBINAR',
  'Delivered'],
 ['2021-04-11',
  'email@example.com',
  'Enroll in the presentations skills - FREE WEBINAR',
  'Delivered'],
 ['2021-04-11',
  'email@example.com',
  'Enroll in the presentations skills - FREE WEBINAR',
  'Delivered'],
 ['2021-04-16',
  'email@example.com',
  'YOU ARE INVITED TO THIS PROGRAMMING EVENT',
  'Delivered'],
 ['2021-04-01',
  'email@example.com',
  'Enhance your presentation skills in 15 minutes',
  'Delivered'],
 ['2021-04-09',
  'email@example.com',
  'we are here to help you improve your skills',
  'Delivered'],
 ['2021-04-12',
  'email@example.com',
  '(1st meeting) here is our recorded presentation skills webinar',
  'Delivered'],
 ['2021-04-13',
  'email@example.com',
  'YOU ARE INVITED TO THIS PROGRAMMING EVENT',
  'Delivered']]

Opened list:

[['2021-04-02',
  'email@example.com',
  'Enhance your presentation skills in 15 minutes',
  'Open'],
 ['2021-04-11',
  'email@example.com',
  'Enroll in the presentations skills - FREE WEBINAR',
  'Open']]

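I want to compare the two lists and generate a third list (combined activities), filtered by email subject: if a subject appears in both the delivered list and the opened list, it should be counted as one activity. However, an email subject can be repeated, for example an email delivered 3 times but opened only once. I can't find the right logic, as I am still learning Python.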

Edit, to be clearer:

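If an email is found in the opened list (filtered by title), the same title should be removed from the delivered list, by latest date, and a new list containing the combined activities should be generated.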

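You need to think about this differently, rather than in terms of combining lists.

If an email was opened, it was also received. That means your opened list is already your combined list.

Once you realize this, all you have to do is copy the emails that were never opened into a result list of unopened emails.

Go over the list of opened emails and copy the subjects into a set. Then go over the received emails and check whether each subject is in the set: if it is, do nothing; if it is not, copy the email into the list of unopened emails.

Here is a very simple piece of code: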

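# `opened` and `received` are lists of [date, email, subject, action] rows,
# presumably the question's `Opened` and `delivered` lists.
opened_subjects = set()
unopened = []
for email in opened:
    opened_subjects.add(email[2])  # index 2 holds the subject

unopened_subjects = set()
for email in received:
    # Keep the email only if its subject was neither opened nor already recorded
    if all(email[2] not in subj_set
           for subj_set in (opened_subjects, unopened_subjects)):
        unopened.append(email)
        unopened_subjects.add(email[2])

print('Both received and opened:', opened)
print('Unopened emails:', unopened)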
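
To try the approach above without the input file, here is a minimal self-contained sketch. The function name `split_unopened` and the constant `EMAIL` are made up for illustration; the sample rows are copied from the question's act2.txt, already split into the delivered and opened lists.

```python
def split_unopened(opened, received):
    """Return received rows whose subject was never opened, one row per subject.

    Hypothetical helper: rows are [date, email, subject, action] lists.
    """
    opened_subjects = {row[2] for row in opened}
    seen = set()       # subjects already copied into the result
    unopened = []
    for row in received:
        subject = row[2]
        if subject not in opened_subjects and subject not in seen:
            unopened.append(row)
            seen.add(subject)
    return unopened


EMAIL = "email@example.com"
WEBINAR = "Enroll in the presentations skills - FREE WEBINAR"
ENHANCE = "Enhance your presentation skills in 15 minutes"
INVITED = "YOU ARE INVITED TO THIS PROGRAMMING EVENT"

opened = [
    ["2021-04-02", EMAIL, ENHANCE, "Open"],
    ["2021-04-11", EMAIL, WEBINAR, "Open"],
]
received = [
    ["2021-04-11", EMAIL, WEBINAR, "Delivered"],
    ["2021-04-11", EMAIL, WEBINAR, "Delivered"],
    ["2021-04-11", EMAIL, WEBINAR, "Delivered"],
    ["2021-04-16", EMAIL, INVITED, "Delivered"],
    ["2021-04-01", EMAIL, ENHANCE, "Delivered"],
    ["2021-04-09", EMAIL, "we are here to help you improve your skills", "Delivered"],
    ["2021-04-12", EMAIL,
     "(1st meeting) here is our recorded presentation skills webinar", "Delivered"],
    ["2021-04-13", EMAIL, INVITED, "Delivered"],
]

unopened = split_unopened(opened, received)
for row in unopened:
    print(row[0], row[2])
# 2021-04-16 YOU ARE INVITED TO THIS PROGRAMMING EVENT
# 2021-04-09 we are here to help you improve your skills
# 2021-04-12 (1st meeting) here is our recorded presentation skills webinar
```

Note that this keeps the first row seen for each subject. Because the dates are in ISO format and sort correctly as strings, sorting `received` by `row[0]` in descending order before the call should keep the latest delivery of each subject instead.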

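A small note:
The two sets are there for different reasons. The first set, opened_subjects, exists because a set can hold only unique items, which is exactly what is needed here. The second set, unopened_subjects, is there because checking whether an item is in a set is faster than checking a list; since I check before adding to the set anyway, the set's ability to store only unique items is not needed.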
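
Both properties can be seen in a small sketch. The names and sizes below (`subjects`, `big_list`, `needle`, 100,000 items) are arbitrary choices for illustration, and the timings depend on the machine, so no timing output is shown.

```python
import timeit

# Uniqueness: duplicates collapse when a list is turned into a set.
subjects = ["A", "B", "A", "A", "C"]
unique_subjects = set(subjects)
print(len(subjects), len(unique_subjects))  # 5 3

# Membership: a list is scanned item by item, a set uses a hash lookup.
big_list = [f"subject {i}" for i in range(100_000)]
big_set = set(big_list)
needle = "subject 99999"  # last item, the slowest case for the list scan

list_time = timeit.timeit(lambda: needle in big_list, number=100)
set_time = timeit.timeit(lambda: needle in big_set, number=100)
print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```

For a handful of emails the difference is negligible; it starts to matter as the delivered list grows.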

