Python write doesn't work

For some reason, I cannot write anything to a blank text file. I call file.close() at the end, but it still doesn't work. Could anyone point out where I went wrong?

Below is the full code. I am retrieving the unique email addresses from a text file, matching each one with a unique five-digit number, and finally writing a new file in which the emails are replaced by these numbers.

import re
import random


email_list = []
anon = {}
number_list = []

##There are 54 unique emails, so I set len(number_list) = 54 here
while len(number_list) < 54:
    rand = random.randint(10000,99999)
    rand = '%%' + str(rand) + '%%'
    if rand not in number_list:
        number_list.append(rand)

i = 0

a = open('mbox.txt','r')
for line in a:
    if re.findall(r'[A-Za-z\.-]+\S@[\w\.-]+\.[\w]+',line):
        email = re.findall(r'[A-Za-z\.-]+\S@[\w\.-]+\.[\w]+',line)[0]
        if email not in email_list:
            email_list.append(email)
            anon[email] = number_list[i]
            i += 1
    else:
            email = "NA"


b = open('mbox-anon.txt','wt', encoding='utf-8')

for line in a:
    for email in anon:
        try:
            linereplace = line.replace(email,anon[email])
            b.write(linereplace)
        except:
            pass
a.close()
b.close() 

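The first for line in a loop reads the file to the end, so the second for line in a loop has nothing left to iterate over and b.write is never called. Assuming your intention is to read the first file again and write its modified contents to the second file, rewind the file before the second loop: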

a.seek(0)
for line in a:

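Alternatively, open b before the first loop and write each line as it is read, adding

b.write(line.replace(email, anon[email]))

in every iteration. Lines without an email address have to be written unchanged, because "NA" is not a key in anon.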
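
To illustrate why the second loop produced no output, here is a minimal sketch of the exhaustion behaviour. The file demo.txt is a hypothetical stand-in created only for this demonstration; it is not part of the question.

```python
import os
import tempfile

# Hypothetical sample file standing in for mbox.txt.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('first line\nsecond line\n')

with open(path, 'r', encoding='utf-8') as a:
    first_pass = [line for line in a]    # consumes the file to the end
    second_pass = [line for line in a]   # already at end of file: yields nothing
    a.seek(0)                            # rewind to the start of the file
    third_pass = [line for line in a]    # yields every line again

print(len(first_pass), len(second_pass), len(third_pass))
```

The second pass is empty because the file position is still at the end; only after a.seek(0) does iteration start over.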

I think this code does what you're after. It reads mbox.txt, extracts all email addresses from it, and maps each unique address to a 5-digit value, following your approach. It then writes the same data to mbox-anon.txt, replacing each email address with its corresponding value.

import random
import re


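def generate_crypto_value():
    # Random 5-digit number wrapped in %% markers, e.g. %%61286%%.
    return '%%{}%%'.format(random.randint(10000, 99999))


def obscure_emails(file_in, file_out, email_masker):
    with open(file_in) as f_in, open(file_out, 'w') as f_out:
        # Read the whole file at once so each email is replaced everywhere.
        data = f_in.read()

        email_pattern = r'[A-Za-z\.-]+\S@[\w\.-]+\.[\w]+'
        # set() removes duplicates, so each address gets exactly one value.
        for email in set(re.findall(email_pattern, data)):
            data = data.replace(email, email_masker())

        f_out.write(data)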


if __name__ == '__main__':
    obscure_emails(
        file_in='mbox.txt',
        file_out='mbox-anon.txt',
        email_masker=generate_crypto_value)

Example mbox.txt before run

Here's one address: foo.bar@email.com
Another address: baz@hotmail.org
And the first address again: foo.bar@email.com with some text after it

Example mbox-anon.txt after run

Here's one address: %%61286%%
Another address: %%51955%%
And the first address again: %%61286%% with some text after it
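
One caveat: random.randint can return the same number twice, so two different addresses could, in principle, end up sharing a value. A possible variation is sketched below. It draws the numbers with random.sample, which never repeats a value, and substitutes with re.sub and a callback so that each match is replaced as a whole. The names build_mapping and anonymize are illustrative and do not come from the answers above; the regular expression is the one from the question.

```python
import random
import re

# Pattern taken from the question.
EMAIL_PATTERN = re.compile(r'[A-Za-z\.-]+\S@[\w\.-]+\.[\w]+')


def build_mapping(text):
    """Map each distinct email in text to a distinct '%%NNNNN%%' token."""
    emails = sorted(set(EMAIL_PATTERN.findall(text)))
    # random.sample draws without replacement, so no two emails share a number.
    numbers = random.sample(range(10000, 100000), len(emails))
    return {email: '%%{}%%'.format(n) for email, n in zip(emails, numbers)}


def anonymize(text):
    """Return text with every matched email replaced by its token."""
    mapping = build_mapping(text)
    return EMAIL_PATTERN.sub(lambda m: mapping[m.group(0)], text)


sample = (
    "Here's one address: foo.bar@email.com\n"
    "Another address: baz@hotmail.org\n"
    "And the first address again: foo.bar@email.com with some text after it\n"
)
result = anonymize(sample)
print(result)
```

The numbers differ on every run, but the first and third lines always receive the same token and the second line always receives a different one.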
