Extracting email addresses from .txt files in Python

Question

I would like to parse out e-mail addresses from several text files in Python. In a first attempt, I tried to get the following element, which includes an e-mail address, from a list of strings: '2To whom correspondence should be addressed. E-mail: joachim+pnas@uci.edu.\n'.

When I try to find the list element that includes the e-mail address via i.find("@") == 0, it does not give me content[i]. Am I misunderstanding the .find() function? Is there a better way to do this?

from os import listdir

TextFileList = []
PathInput = "C:/Users/p282705/Desktop/PythonProjects/ExtractingEmailList/text/"

# Count the number of different files you have!
for filename in listdir(PathInput):
    if filename.endswith(".txt"):  # In case you accidentally put other files in directory
        TextFileList.append(filename)

for i in TextFileList:
    file = open(PathInput + i, 'r')
    content = file.readlines()
    file.close()

for i in content:
    if i.find("@") == 0:
        print(i)
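Separately from the find() question, note that in the code above the last for loop sits outside the file loop, so content only holds the lines of the last file read. Below is a minimal sketch that scans every .txt file in the directory. It assumes UTF-8 encoded files, the function name lines_with_at is hypothetical, and it uses the "@" in line membership test rather than find("@") == 0.

```python
from pathlib import Path


def lines_with_at(directory):
    """Collect (file name, line) pairs for every line containing "@"
    in every .txt file directly inside `directory`."""
    results = []
    for path in sorted(Path(directory).glob("*.txt")):  # only .txt files, in name order
        # Assumes UTF-8 encoded files; adjust `encoding` if yours differ.
        with path.open("r", encoding="utf-8") as f:     # file is closed automatically
            for line in f:                              # scan each file inside the file loop
                if "@" in line:
                    results.append((path.name, line.rstrip("\n")))
    return results
```

Usage with the path from the question would be, for example, for name, line in lines_with_at(PathInput): print(name, line).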

Answer 1

The standard way of checking whether a string contains a character in Python is the in operator. In your case, that would be:

for i in content:
    if "@" in i:
        print(i)

The find method, as you were using it, returns the position (index) of the first @ character in the string, counting from 0, as described in the official Python documentation.

For instance, in the string abc@google.com it returns 3. If the character is not found, it returns -1. The equivalent code would be:

for i in content:
    if i.find("@") != -1:
        print(i)

However, this is considered unpythonic, and using the in operator is preferred.
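A quick check in the interpreter makes the difference concrete: find("@") == 0 is only true when "@" is the very first character of the line, whereas find("@") != -1 and "@" in line both test whether "@" occurs anywhere. The sample line is the one from the question.

```python
# Demonstration of str.find() versus the in operator.
line = '2To whom correspondence should be addressed. E-mail: joachim+pnas@uci.edu.\n'

print("abc@google.com".find("@"))  # 3: index of the first "@"
print("abc@google.com".find("#"))  # -1: character not present
print(line.find("@") == 0)         # False: "@" is not the first character
print(line.find("@") != -1)        # True: "@" occurs somewhere in the line
print("@" in line)                 # True: the idiomatic membership test
```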

Answer 2

The find method in Python returns the index of that character in the string, or -1 if it is not present. Maybe you can try this?

for i in content:
    words = i.split()  # split the line into words on any whitespace (also drops "\n")
    for x in words:    # search each word for the @ character
        if x.find("@") != -1:
            print(x)
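The same word-by-word idea can be wrapped in a small function. In this sketch, the function name words_with_at and the set of punctuation characters stripped from each word are illustrative assumptions, not part of the answer. Stripping removes the sentence-final period that would otherwise stay attached to the address.

```python
def words_with_at(line):
    """Return the whitespace-separated words of `line` that contain "@",
    with surrounding punctuation (e.g. a sentence-final period) stripped."""
    found = []
    for word in line.split():  # split() with no argument also discards the trailing "\n"
        if "@" in word:
            found.append(word.strip(".,;:()<>\"'"))
    return found


sample = '2To whom correspondence should be addressed. E-mail: joachim+pnas@uci.edu.\n'
print(words_with_at(sample))  # ['joachim+pnas@uci.edu']
```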

Answer 3

find returns the index at which the substring you are searching for occurs, so comparing it to 0 only matches lines that begin with "@". That isn't what you are trying to do.

You would be better off using a regular expression (the re module) to search for occurrences of @. In your case, you may run into a situation where there is more than one email address per line (again, I don't know your input data, so I can't say for sure).

Something along these lines should help:

import re

for i in content:
    findEmail = re.search(r'[\w\.-]+@[\w\.-]+', i)  # first match on the line, or None
    if findEmail:
        print(findEmail.group(0))

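You would need to adjust this pattern for valid email addresses. For example, + is allowed in the local part of an address, and since the character class above does not include it, on the sample line this pattern returns pnas@uci.edu. (missing joachim+ and keeping the trailing period).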
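A sketch of one such adjustment, assuming addresses only contain word characters plus ".", "+" and "-" in the local part: adding "+" to the character class and requiring every dot-separated domain label to be non-empty captures joachim+pnas@uci.edu without the trailing period. Using re.findall() instead of re.search() returns every address on a line rather than only the first, which covers the several-addresses-per-line case mentioned above. The pattern is illustrative, not a full RFC 5322 validator, and the names EMAIL_RE and find_emails are hypothetical.

```python
import re

# Illustrative pattern (an assumption, not a complete validator):
# local part of word characters, ".", "+" or "-"; then "@"; then one or
# more ".label" domain parts, so a sentence-final period is not captured.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def find_emails(lines):
    """Return every match of EMAIL_RE in an iterable of lines."""
    emails = []
    for line in lines:
        # findall returns all non-overlapping matches, not just the first
        emails.extend(EMAIL_RE.findall(line))
    return emails


content = [
    "2To whom correspondence should be addressed. E-mail: joachim+pnas@uci.edu.\n",
    "Contacts: a.b@example.org, c-d@mail.example.com\n",
]
print(find_emails(content))
# ['joachim+pnas@uci.edu', 'a.b@example.org', 'c-d@mail.example.com']
```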

Answers: 3
Answer 1: 4 votes, 2018-01-08 16:13:11
Answer 2: 0 votes, 2018-01-08 16:13:12
Answer 3: 0 votes, 2018-01-08 16:23:38