Python script to extract emails from .text file

I am currently trying to run a script that extracts all the email addresses from a .txt file. When I run the script, I get an invalid syntax error. Perhaps someone can help...

import re
in_file = open("C:\\Users\\Testing1_Emails.txt","rt")


for line in in_file:
    if re.match(r'[\w\.-]+@[\w\.-]+')
        print line

You have to write:

if re.match(r'[\w\.-]+@[\w\.-]+',  line):

(add 'line' and ':')
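With both fixes applied, the loop from the question might look like the sketch below. It assumes Python 3 (so print is a function rather than a statement), and the helper name lines_starting_with_email is made up for illustration. An in-memory list stands in for the file so the example runs on its own.

```python
import re

EMAIL_AT_START = re.compile(r'[\w\.-]+@[\w\.-]+')

def lines_starting_with_email(lines):
    """Return the lines that begin with an email-like token (hypothetical helper)."""
    # match() anchors at the start of each line, as in the question's code
    return [line for line in lines if EMAIL_AT_START.match(line)]

sample = [
    "alice@example.com wrote this\n",
    "no address here\n",
    "contact: bob@example.com\n",   # address is not at the start, so match() skips it
]

for line in lines_starting_with_email(sample):
    print(line, end="")
```

To run it against a real file, pass the open file object instead of the list, since iterating over a file yields its lines.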

The issue lies here:

for line in in_file:
    if re.match(r'[\w\.-]+@[\w\.-]+')
        print line

In the if re.match(r'[\w\.-]+@[\w\.-]+') statement, the line does not end with a colon (:).

The match method requires 2 arguments.

See: https://docs.python.org/2/library/re.html#re.match

re.match(pattern, string, flags=0)

If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject instance. Return None if the string does not match the pattern; note that this is different from a zero-length match.

Note that even in MULTILINE mode, re.match() will only match at the beginning of the string and not at the beginning of each line.

If you want to locate a match anywhere in string, use search() instead (see also search() vs. match()).
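The difference between match() and search() matters here, because an address is often in the middle of a line. A small sketch, using the pattern from the question and a made-up sample line:

```python
import re

pattern = r'[\w\.-]+@[\w\.-]+'
line = "Please reply to carol@example.org by Friday."  # sample text, not from the question

at_start = re.match(pattern, line)    # only tries at position 0 of the string
anywhere = re.search(pattern, line)   # scans the whole string for the first hit

print(at_start)
print(anywhere.group(0) if anywhere else None)
```

Here match() returns None because the line does not begin with an address, while search() finds the address wherever it occurs.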

Most email addresses allow letters, digits, dots (.) and underscores (_), and all of them contain "@". We can use this information to write a regex pattern.

import re
pat = re.compile(r'[a-zA-Z0-9._]+@[a-zA-Z.]+')  # regex pattern; the trailing + lets the domain part match more than one character

[a-z]+ matches one or more lowercase letters
[0-9]+ matches one or more digits
[.] matches a literal '.'
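As a rough illustration of how these character classes combine, the sketch below builds a pattern with a + quantifier on both sides of the "@" and runs it over a made-up sentence. The pattern is deliberately simple and does not cover every valid address (for example, domains with digits or hyphens).

```python
import re

# Local part: letters, digits, dots, underscores; domain: letters and dots.
# Simplified on purpose; real-world addresses can contain other characters.
pat = re.compile(r'[a-zA-Z0-9._]+@[a-zA-Z.]+')

text = "Contact john.doe_99@mail.example.com or jane@site.org for details"  # sample text
found = pat.findall(text)
print(found)
```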

If you want to check that your pattern matches your search strings, you can test it at https://regexr.com/

Example:

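import re
with open("my_file.txt", "w") as f:  # write some sample data
    f.write('walkup@cs.washington.edu, geb@cs.pitt.edu, walkup@cs.washington.edu \n')
with open("my_file.txt") as f:  # reopen for reading; a file opened with "w" cannot be read
    mails = re.findall(r"[a-z]+@[a-z\.]+", f.read())
print(list(set(mails)))  # set() removes the duplicate address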

out: ['walkup@cs.washington.edu', 'geb@cs.pitt.edu'] (the order may vary, because sets are unordered)

Note: re.findall() compiles the given pattern internally, so you do not have to call re.compile() yourself.
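A quick way to see this is to compare the module-level call with an explicitly compiled pattern. The sketch below reuses the sample string from the example. The dict.fromkeys step is an addition of mine, not part of the answer: it drops duplicates while keeping first-seen order, which set() does not guarantee.

```python
import re

text = "walkup@cs.washington.edu, geb@cs.pitt.edu, walkup@cs.washington.edu \n"
pattern = r"[a-z]+@[a-z\.]+"

via_module = re.findall(pattern, text)             # pattern string passed directly
via_compiled = re.compile(pattern).findall(text)   # pattern compiled first
print(via_module == via_compiled)

# Drop duplicates but keep the order in which addresses first appear
unique_in_order = list(dict.fromkeys(via_module))
print(unique_in_order)
```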
