For some reason, I cannot write anything to a blank text file. I call file.close() at the end, but it still doesn't work. Could anyone point out where I went wrong?
Below is the full code. Basically, I retrieve the unique email addresses from a text file, match each one with a unique five-digit number, and finally write a new file in which the emails are replaced by these numbers.
import re
import random

email_list = []
anon = {}
number_list = []
##There are 54 unique emails, so I set len(number_list) = 54 here
while len(number_list) < 54:
    rand = random.randint(10000,99999)
    rand = '%%' + str(rand) + '%%'
    if rand not in number_list:
        number_list.append(rand)
i = 0
a = open('mbox.txt','r')
for line in a:
    if re.findall(r'[A-Za-z\.-]+\S@[\w\.-]+\.[\w]+',line):
        email = re.findall(r'[A-Za-z\.-]+\S@[\w\.-]+\.[\w]+',line)[0]
        if email not in email_list:
            email_list.append(email)
            anon[email] = number_list[i]
            i += 1
    else:
        email = "NA"
b = open('mbox-anon.txt','wt', encoding='utf-8')
for line in a:
    for email in anon:
        try:
            linereplace = line.replace(email,anon[email])
            b.write(linereplace)
        except:
            pass
a.close()
b.close()
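Assuming your intention was to read the contents of the first file, replace the emails, and write the result to the second file: the first loop has already read a to the end, so the second loop has nothing left to iterate over and nothing is written. Rewind the file before the second loop, i.e. replace the second for line in a: with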
a.seek(0)
for line in a:
Alternatively, open b before the first loop and add
b.write(line.replace(email, anon[email]))
to every iteration of that loop.
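To see why the second loop writes nothing, here is a minimal sketch of the behaviour. It uses a throwaway temporary file in place of mbox.txt, and the helper name count_lines_twice is made up for illustration; it only counts lines rather than replacing emails.

```python
import os
import tempfile


def count_lines_twice(path, rewind):
    """Iterate over the same open file twice and return both line counts."""
    with open(path, 'r') as a:
        first = sum(1 for _ in a)   # consumes the file to the end
        if rewind:
            a.seek(0)               # move back to the start of the file
        second = sum(1 for _ in a)  # empty unless the file was rewound
    return first, second


# Build a small throwaway file so the sketch is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('line one\nline two\nline three\n')
    tmp_path = tmp.name

without_seek = count_lines_twice(tmp_path, rewind=False)
with_seek = count_lines_twice(tmp_path, rewind=True)
os.remove(tmp_path)

print(without_seek)  # (3, 0)
print(with_seek)     # (3, 3)
```

Without the seek, the second pass sees zero lines, which is what happens to the loop that is supposed to write to mbox-anon.txt.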
I think this code does what you're after. It reads a file mbox.txt, extracts all emails from it and maps each unique email address to a 5-digit value following your approach. It then writes the same data to mbox-anon.txt, replacing each email address with its corresponding 5-digit value.
import random
import re


def generate_crypto_value():
    # Despite the name, this uses the random module, which is not cryptographically secure.
    return '%%{}%%'.format(random.randint(10000, 99999))


def obscure_emails(file_in, file_out, email_masker):
    with open(file_in) as f_in, open(file_out, 'w') as f_out:
        data = f_in.read()
        email_pattern = r'[A-Za-z\.-]+\S@[\w\.-]+\.[\w]+'
        # One masked value per unique email; uniqueness of the values is not enforced.
        for email in set(re.findall(email_pattern, data)):
            data = data.replace(email, email_masker())
        f_out.write(data)


if __name__ == '__main__':
    obscure_emails(
        file_in='mbox.txt',
        file_out='mbox-anon.txt',
        email_masker=generate_crypto_value)
Example mbox.txt before the run:
Here's one address: foo.bar@email.com
Another address: baz@hotmail.org
And the first address again: foo.bar@email.com with some text after it
Example mbox-anon.txt after the run:
Here's one address: %%61286%%
Another address: %%51955%%
And the first address again: %%61286%% with some text after it
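One caveat: the code above calls email_masker() once per unique address but never checks that two different addresses receive different values, so a collision is possible, if unlikely. A possible variant that keeps an explicit mapping and redraws on collisions is sketched below. The names anonymize_text and code_for are hypothetical and not part of the answer; the email pattern is the one from the question.

```python
import random
import re

# Same email pattern as in the question.
EMAIL_PATTERN = re.compile(r'[A-Za-z\.-]+\S@[\w\.-]+\.[\w]+')


def anonymize_text(text, rng=random):
    """Replace each email in text with a unique '%%NNNNN%%' code."""
    mapping = {}   # email -> code
    used = set()   # codes already handed out

    def code_for(match):
        email = match.group(0)
        if email not in mapping:
            code = '%%{}%%'.format(rng.randint(10000, 99999))
            while code in used:   # redraw on the rare collision
                code = '%%{}%%'.format(rng.randint(10000, 99999))
            used.add(code)
            mapping[email] = code
        return mapping[email]

    # re.sub calls code_for once per match, left to right.
    return EMAIL_PATTERN.sub(code_for, text), mapping


sample = (
    "Here's one address: foo.bar@email.com\n"
    "Another address: baz@hotmail.org\n"
    "And the first address again: foo.bar@email.com with some text after it\n"
)
result, mapping = anonymize_text(sample)
print(result)
```

Using re.sub with a callback also avoids repeated str.replace passes over the whole text, where one address that is a substring of another could be replaced by mistake.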