Python - how to delete all characters after a certain sign in every line?
I want to delete all characters after the @ sign in every line. I wrote this piece of code:
#!/usr/bin/env python
import sys, re, urllib2
url = 'http://varenhor.st/wp-content/uploads/emails.txt'
document = urllib2.urlopen(url)
html = document.read()
html2 = html[0]
for x in html.rsplit('@'):
    print x
But it only deletes the @ sign and moves the rest of the characters onto the next line. So how can I modify this code to delete all characters after the @ in every line? Should I use a regex?
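To see why the output looks like that, here is a small sketch in Python 3 (the original code is Python 2, and the two-line sample text is hypothetical) of what rsplit('@') returns when it is called without a limit on the whole downloaded text:

```python
# Python 3 sketch; the sample text is hypothetical (two e-mail-style lines).
html = "alice@example.com\nbob@example.org\n"

# Without a maxsplit argument, rsplit('@') splits at EVERY '@' in the
# whole string, so each piece runs from one '@' to the next and can
# span a line break.
pieces = html.rsplit('@')

for piece in pieces:
    print(piece)
```

The middle piece is 'example.com\nbob': the domain of one address and the user name of the next end up together, which is why the remaining characters appear to move onto the next line.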
You are splitting too many times; use str.rpartition() instead and just ignore the part after the @. Do this per line:
for line in html.splitlines():
    cleaned = line.rpartition('@')[0]
    print cleaned
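A minimal Python 3 sketch of how rpartition behaves on a few made-up lines. Two edge cases are worth knowing: it splits at the last @, and on a line with no @ at all it returns ('', '', line), so [0] is an empty string. str.partition('@'), shown for comparison, splits at the first @ and returns the whole line when there is no @:

```python
# Python 3; the sample lines are made up for illustration.
lines = ["alice@example.com", "odd@name@example.net", "no-at-sign-here"]

for line in lines:
    head, sep, tail = line.rpartition('@')
    # rpartition splits at the LAST '@'; if there is no '@',
    # it returns ('', '', line), so head is an empty string.
    print(repr(head), repr(sep), repr(tail))

# partition splits at the FIRST '@' and returns (line, '', '') if there is none.
first_parts = [line.partition('@')[0] for line in lines]
last_parts = [line.rpartition('@')[0] for line in lines]
print(first_parts)
print(last_parts)
```

For ordinary e-mail addresses with exactly one @ per line, both give the same result.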
Or, for older Python versions, limit str.rsplit() to just one split, and again take only the first result:
for line in html.splitlines():
    cleaned = line.rsplit('@', 1)[0]
    print cleaned
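The same made-up lines run through the rsplit variant in Python 3. Unlike rpartition, rsplit('@', 1) returns a one-element list for a line without @, so [0] keeps such a line intact; split('@', 1) is the left-to-right counterpart that cuts at the first @:

```python
# Python 3; the sample lines are made up for illustration.
lines = ["alice@example.com", "odd@name@example.net", "no-at-sign-here"]

# maxsplit=1 performs at most one split, counted from the right.
# A line without '@' comes back as a one-element list, so [0] is the whole line.
rsplit_heads = [line.rsplit('@', 1)[0] for line in lines]

# split('@', 1) does the same from the left, i.e. it cuts at the FIRST '@'.
split_heads = [line.split('@', 1)[0] for line in lines]

print(rsplit_heads)
print(split_heads)
```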
I used str.splitlines() to split the text cleanly regardless of newline style. You can also loop directly over the urllib2 response file object:
import urllib2
url = 'http://varenhor.st/wp-content/uploads/emails.txt'
document = urllib2.urlopen(url)
for line in document:
    cleaned = line.rpartition('@')[0]
    print cleaned
Demo:
>>> import urllib2
>>> url = 'http://varenhor.st/wp-content/uploads/emails.txt'
>>> document = urllib2.urlopen(url)
>>> for line in document:
...     cleaned = line.rpartition('@')[0]
...     print cleaned
...
ADAKorb...
AllisonSarahMoo...
Artemislinked...
BTBottg...
BennettLee...
Billa...
# etc.
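The same loop can be tried offline. This Python 3 sketch uses io.StringIO as an assumed stand-in for the urllib2 response (in Python 3, iterating over the object returned by urllib.request.urlopen yields bytes, which would need decoding first) and hypothetical sample text. It also shows the difference between splitlines(), which drops line terminators, and iterating over a file object, which keeps them:

```python
import io

# Hypothetical stand-in for the contents of emails.txt.
text = "alice@example.com\nbob@example.org\n"

# splitlines() drops the line terminators, and also understands '\r\n'.
from_splitlines = [line.rpartition('@')[0] for line in text.splitlines()]
crlf_lines = "carol@example.com\r\ndave@example.net".splitlines()

# Iterating over a file-like object yields each line WITH its trailing '\n';
# rpartition('@')[0] discards it here only because it comes after the '@'.
from_file = [line.rpartition('@')[0] for line in io.StringIO(text)]

print(from_splitlines)
print(from_file)
print(crlf_lines)
```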
You can use Python's slice notation:
import re
import sys
import urllib2
url = 'http://varenhor.st/wp-content/uploads/emails.txt'
document = urllib2.urlopen(url)
html = document.read()
for line in html.splitlines():
    at_index = line.index('@')  # raises ValueError if a line has no '@'
    print line[:at_index]
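One caveat with str.index is that it raises ValueError on any line that contains no @. Here is a Python 3 sketch of a more forgiving variant that uses a hypothetical helper, before_at, together with str.find, which returns -1 instead of raising:

```python
def before_at(line):
    """Return the part of line before the first '@' (hypothetical helper).

    str.find returns -1 when '@' is missing, unlike str.index, which raises
    ValueError; in that case the line is returned unchanged.
    """
    at_index = line.find('@')
    if at_index == -1:
        return line
    return line[:at_index]


for line in ["alice@example.com", "no-at-sign-here"]:
    print(before_at(line))
```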
Since strings are sequences, you can slice them. For instance:
hello_world = 'Hello World'
hello = hello_world[:5]
world = hello_world[6:]
Bear in mind that slicing returns a new sequence and doesn't modify the original sequence.
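A short Python 3 check of both points: the slices from the hello_world example hold 'Hello' and 'World' while the original string is unchanged. With a mutable list (sample values made up), modifying the slice visibly leaves the original alone:

```python
hello_world = 'Hello World'
hello = hello_world[:5]   # characters 0..4
world = hello_world[6:]   # character 6 to the end
print(hello, world, hello_world)

# Lists are mutable, which makes the copy visible: changing the slice
# leaves the original list untouched.
numbers = [1, 2, 3, 4]
first_two = numbers[:2]
first_two.append(99)
print(numbers, first_two)
```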
Since you already import re, you can use it:
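import re
import urllib2
url = 'http://varenhor.st/wp-content/uploads/emails.txt'
document = urllib2.urlopen(url)
reg_ptn = re.compile(r'@.*')  # '@' and everything after it on the same line
for line in document:
    # '.' does not match '\n', so the newline survives sub(); strip it to avoid blank lines
    print reg_ptn.sub('', line).rstrip('\n')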
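Because '.' does not match a newline unless re.DOTALL is given, the pattern @.* never runs past the end of a line. So a single re.sub over the whole text should clean every line at once; here is a Python 3 sketch with hypothetical sample text:

```python
import re

# Hypothetical sample text standing in for the downloaded file.
text = "alice@example.com\nbob@example.org\nno-at-sign-here\n"

# Without re.DOTALL, '.' does not match '\n', so '@.*' stops at the end of
# each line; lines without '@' are left as they are.
cleaned = re.sub(r'@.*', '', text)
print(cleaned)
```

Since the regex engine scans left to right, a line with several @ signs is cut at the first one, like str.partition.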