Python - how to delete all characters after a certain sign in every line?

I want to delete all characters after the @ sign in every line. I wrote this piece of code:

#!/usr/bin/env python
import sys, re, urllib2
url = 'http://varenhor.st/wp-content/uploads/emails.txt'
document = urllib2.urlopen(url)
html = document.read()

html2 = html[0]
for x in html.rsplit('@'):
    print x

But it only deletes the @ sign and prints the rest of the characters on the next line. How can I modify this code to delete everything after the @ in every line? Should I use a regex?
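To see why the loop above misbehaves, here is a minimal Python 3 sketch that uses a hypothetical two-line sample in place of the downloaded file. Calling rsplit('@') with no limit on the whole document cuts it at every @, so each piece runs from the domain of one address, across the newline, to the name part of the next address. Printing those pieces is what makes it look as if only the @ was removed and the rest was moved to the next line.

```python
# Hypothetical sample standing in for the downloaded emails.txt.
text = "alice@example.com\nbob@example.org\n"

# No maxsplit: the whole document is cut at every '@',
# so the pieces straddle line boundaries.
pieces = text.rsplit('@')
print(pieces)  # ['alice', 'example.com\nbob', 'example.org\n']
```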

You are splitting too many times; use str.rpartition() instead and just ignore the part after the @. Do this per line:

for line in html.splitlines():
    cleaned = line.rpartition('@')[0]
    print cleaned
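A small Python 3 sketch of what str.rpartition() returns may help here. It always gives a 3-tuple (head, separator, tail), and when the separator is missing the whole string lands in the last slot, so rpartition('@')[0] turns a line without any @ into an empty string. If such lines should be kept, str.partition('@')[0] is one alternative: it splits at the first @ instead of the last one, and returns the whole string when there is no @. The sample strings are made up for illustration.

```python
# str.rpartition() searches from the right and always returns a 3-tuple.
found = "alice@example.com".rpartition('@')
missing = "no at sign here".rpartition('@')
print(found)    # ('alice', '@', 'example.com')
print(missing)  # ('', '', 'no at sign here')

# str.partition() searches from the left; when '@' is absent,
# the whole string ends up in the first slot.
kept = "no at sign here".partition('@')[0]
print(kept)     # no at sign here
```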

Or, for older Python versions (str.rpartition() was added in Python 2.5), limit str.rsplit() to a single split and again take only the first result:

for line in html.splitlines():
    cleaned = line.rsplit('@', 1)[0]
    print cleaned
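The effect of the maxsplit argument is easiest to see side by side. This Python 3 sketch uses made-up strings: without a limit rsplit() splits at every @, while rsplit('@', 1) splits only once, at the last @. Note that a line containing no @ comes back whole from rsplit('@', 1)[0].

```python
# Without a limit, rsplit() splits at every separator (same as split()).
unlimited = "a@b@c".rsplit('@')
# With maxsplit=1 it splits only once, counting from the right.
limited = "a@b@c".rsplit('@', 1)
# A string without '@' is returned as a one-element list.
no_sep = "no at sign".rsplit('@', 1)[0]

print(unlimited)  # ['a', 'b', 'c']
print(limited)    # ['a@b', 'c']
print(no_sep)     # no at sign
```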

I used str.splitlines() to split the text cleanly regardless of newline style. You can also loop directly over the urllib2 response file object:

import urllib2

url = 'http://varenhor.st/wp-content/uploads/emails.txt'
document = urllib2.urlopen(url)
for line in document:
    cleaned = line.rpartition('@')[0]
    print cleaned
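The code in this thread is Python 2 (urllib2 and the print statement). Below is a minimal Python 3 sketch of the same per-line loop, under a few assumptions: the file is UTF-8, clean_lines is a hypothetical helper name, and io.BytesIO stands in for the HTTP response so the example runs offline. In Python 3, iterating over the object returned by urllib.request.urlopen() yields bytes lines that still end in a newline, which is why each line is decoded and stripped first.

```python
import io


def clean_lines(byte_lines, encoding="utf-8"):
    """Yield each line with the last '@' and everything after it removed.

    byte_lines is any iterable of bytes lines, for example the response
    from urllib.request.urlopen(url); the encoding is an assumption.
    """
    for raw in byte_lines:
        line = raw.decode(encoding).rstrip("\r\n")
        yield line.rpartition("@")[0]


# Offline stand-in for the HTTP response: a file-like object of bytes.
fake_response = io.BytesIO(b"alice@example.com\nbob@example.org\n")
for cleaned in clean_lines(fake_response):
    print(cleaned)
```

With a live URL you would pass the object returned by urllib.request.urlopen(url), ideally opened in a with statement, instead of fake_response.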

Demo:

>>> import urllib2
>>> url = 'http://varenhor.st/wp-content/uploads/emails.txt'
>>> document = urllib2.urlopen(url)
>>> for line in document:
...     cleaned = line.rpartition('@')[0]
...     print cleaned
... 
ADAKorb...
AllisonSarahMoo...
Artemislinked...
BTBottg...
BennettLee...
Billa...
# etc.

You can also use Python's slice notation:

import re
import sys
import urllib2

url = 'http://varenhor.st/wp-content/uploads/emails.txt'
document = urllib2.urlopen(url)
html = document.read()

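for line in html.splitlines():
    at_index = line.find('@')  # -1 when the line has no '@' (index() would raise ValueError)
    if at_index == -1:
        print line             # keep lines without '@' unchanged
    else:
        print line[:at_index]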

Since strings are sequences, you can slice them. For instance:

hello_world = 'Hello World'
hello = hello_world[:5]   # 'Hello': characters 0 to 4
world = hello_world[6:]   # 'World': from index 6 to the end

Bear in mind that slicing returns a new sequence and doesn't modify the original one.
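A short Python 3 sketch of that point, with made-up values. The string case also uses str.find(), which returns -1 instead of raising ValueError when there is no @. The list case shows that a slice is a separate object: changing the slice leaves the original list untouched.

```python
address = "alice@example.com"
at_index = address.find("@")  # position of the first '@', or -1 if absent
name = address[:at_index] if at_index != -1 else address

print(name)     # alice
print(address)  # alice@example.com (the original string is unchanged)

# Slicing a list also produces a new list.
numbers = [1, 2, 3, 4]
first_two = numbers[:2]
first_two.append(99)
print(first_two)  # [1, 2, 99]
print(numbers)    # [1, 2, 3, 4]
```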

Since you already imported re, you can use it:

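import re
import urllib2

url = 'http://varenhor.st/wp-content/uploads/emails.txt'
document = urllib2.urlopen(url)
reg_ptn = re.compile(r'@.*')  # '@' and everything after it on the same line
for line in document:
    print reg_ptn.sub('', line.rstrip('\r\n'))  # drop the newline so print doesn't double it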
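Because '.' does not match a newline unless re.DOTALL is set, the pattern @.* stops at the end of each line. That means a single re.sub() call can clean the whole downloaded text at once, without splitting it into lines first. A minimal Python 3 sketch with a hypothetical sample string follows. Unlike the rpartition() answers, it cuts at the first @ on each line, and it leaves lines without an @ unchanged.

```python
import re

# Hypothetical sample standing in for the downloaded file.
text = "alice@example.com\nbob@example.org\nno-at-sign\n"

# '.' does not match '\n' by default, so each match ends at a line break
# and one substitution cleans every line.
cleaned = re.sub(r"@.*", "", text)
print(cleaned)
```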
