Python script to extract emails from a .txt file

Question

I am currently attempting to run a script that extracts all the emails from a .txt file. When running the script, I get an invalid syntax error. Perhaps someone can help...

import re
in_file = open("C:\\Users\\Testing1_Emails.txt","rt")


for line in in_file:
    if re.match(r'[\w\.-]+@[\w\.-]+')
        print line

Answer 1

You have to write:

if re.match(r'[\w\.-]+@[\w\.-]+',  line):

(Add 'line' as the second argument and ':' at the end of the statement.)
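For reference, here is a minimal sketch of the corrected loop, written for Python 3 (so print is a function, unlike the print statement in the question). The helper name lines_starting_with_email and the sample lines are made up for illustration; in the real script the lines would come from the opened file.

```python
import re

# Same pattern as in the question, compiled once for reuse.
EMAIL_AT_START = re.compile(r'[\w\.-]+@[\w\.-]+')

def lines_starting_with_email(lines):
    """Return the lines whose first characters look like an email address."""
    # re.match only tests the beginning of each line.
    return [line for line in lines if EMAIL_AT_START.match(line)]

# Hypothetical sample data standing in for the contents of the .txt file.
sample = [
    "alice@example.com wrote this\n",
    "contact: bob@example.com\n",
    "no address here\n",
]
matching = lines_starting_with_email(sample)
for line in matching:
    print(line, end="")
```

Only the first sample line is printed, because match anchors at the start of the line. To read from a file instead, pass the open file object (for example, inside a with open(...) block) as the lines argument.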

Answer 2

The issue lies here:

for line in in_file:
    if re.match(r'[\w\.-]+@[\w\.-]+')
        print line

The if re.match(r'[\w\.-]+@[\w\.-]+') statement does not end with a colon (:).

Answer 3

The match method requires 2 arguments.

See: https://docs.python.org/2/library/re.html#re.match

re.match(pattern, string, flags=0)

If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject instance. Return None if the string does not match the pattern; note that this is different from a zero-length match.

Note that even in MULTILINE mode, re.match() will only match at the beginning of the string and not at the beginning of each line.

If you want to locate a match anywhere in string, use search() instead (see also search() vs. match()).
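The difference between match() and search() can be seen in a short sketch like the one below. The sample line is invented for illustration; the pattern is the one from the question.

```python
import re

pattern = r'[\w\.-]+@[\w\.-]+'
# Hypothetical input line where the address is not at the start.
line = "Contact: carol@example.org for details"

at_start = re.match(pattern, line)    # tests only the beginning of the string
anywhere = re.search(pattern, line)   # scans the whole string for the first hit

print(at_start)            # None, since the line starts with "Contact:"
print(anywhere.group(0))   # the matched address text
```

For a file where addresses can appear mid-line, search() (or findall() to collect every address) is usually the better fit than match().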

Answer 4

Most email addresses allow letters, digits, dots (.), and underscores (_), and all of them contain "@". We can use this information to write a regex pattern.

import re
pat = re.compile(r'[a-zA-Z0-9\._]+@[a-zA-Z\.]+')  # regex pattern; the trailing + lets the domain span more than one character

[a-z]+ matches one or more lowercase letters
[0-9]+ matches one or more digits
[.] matches a literal '.'
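As a rough illustration of how these character classes combine, the sketch below compiles a pattern of this shape and applies it with findall(). The sample text and addresses are invented, and the pattern is a simplification: its domain part accepts only letters and dots, so it would miss domains containing digits or hyphens.

```python
import re

# Local part: letters, digits, dots, underscores; domain: letters and dots.
pat = re.compile(r'[a-zA-Z0-9._]+@[a-zA-Z.]+')

# Hypothetical text containing two addresses.
text = "Send reports to j_doe99@mail.example.com or admin@example.org today"

found = pat.findall(text)  # list of every non-overlapping match, in order
print(found)
```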

Further, if you want to check that your pattern matches your search strings, you can test it here: https://regexr.com/

Example:

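import re

# Write the sample data, then reopen the file for reading
with open("my_file.txt", "w") as f:
    f.write('walkup@cs.washington.edu, geb@cs.pitt.edu, walkup@cs.washington.edu \n')
with open("my_file.txt") as f:
    mails = re.findall(r"[a-z]+@[a-z\.]+", f.read())
print(list(set(mails)))  # set() drops the duplicate address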

out: ['walkup@cs.washington.edu', 'geb@cs.pitt.edu'] (the order may vary, because set() is unordered)
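If the order in which addresses first appear matters, one possible alternative to set() is dict.fromkeys(), which keeps insertion order in Python 3.7+. This is a sketch of that swap, not part of the original answer; it works on the sample string directly instead of a file.

```python
import re

text = 'walkup@cs.washington.edu, geb@cs.pitt.edu, walkup@cs.washington.edu \n'
mails = re.findall(r"[a-z]+@[a-z\.]+", text)

# dict keys are unique and preserve first-seen order, unlike set().
unique_mails = list(dict.fromkeys(mails))
print(unique_mails)
```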

Note: re.findall() compiles the specified pattern internally, much as re.compile() would, so you do not have to compile it yourself for a one-off search.


Answer 1: 2, accepted, 2017-02-28 15:00:38
Answer 2: 1, 2017-02-28 14:54:00
Answer 3: 0, 2017-02-28 15:08:06
Answer 4: 0, 2021-12-06 12:47:52
