Python - How do I remove all characters after a certain symbol on every line?

Question

I want to remove all characters after the @ symbol on every line. I wrote some code:

#!/usr/bin/env python
import sys, re, urllib2
url = 'http://varenhor.st/wp-content/uploads/emails.txt'
document = urllib2.urlopen(url)
html = document.read()

html2 = html[0]
for x in html.rsplit('@'):
    print x

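But it only removes the @ symbol and puts the remaining characters on the next line. How can I modify this code to remove all characters after the @ on every line? Should I use a regular expression?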

Answer 1

You are splitting too many times. Use str.rpartition() instead and discard the part after the @. Do this for each line:

for line in html.splitlines():
    cleaned = line.rpartition('@')[0]
    print cleaned

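Alternatively, on Python versions older than 2.5, which lack str.rpartition(), limit str.rsplit() to a single split and again take only the first result: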

for line in html.splitlines():
    cleaned = line.rsplit('@', 1)[0]
    print cleaned
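The two variants are not quite equivalent on a line that contains no @ at all: rpartition() returns an empty first element, so the whole line disappears, while rsplit() leaves the line unchanged. Here is a minimal sketch of the difference. The sample lines and function names are made up for illustration (they are not taken from emails.txt), and print() is used as a function so the snippet runs on Python 3:

```python
# Hypothetical sample lines; none of these come from emails.txt.
samples = ['alice@example.com', 'no at sign here', 'a@b@example.org']

def before_last_at_rpartition(line):
    # rpartition() returns ('', '', line) when '@' is missing,
    # so element 0 is the empty string.
    return line.rpartition('@')[0]

def before_last_at_rsplit(line):
    # rsplit() returns [line] when '@' is missing, so the line survives.
    return line.rsplit('@', 1)[0]

for line in samples:
    print(repr(before_last_at_rpartition(line)), repr(before_last_at_rsplit(line)))
```

Which behaviour you want depends on whether lines without an @ should be kept or dropped.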

I used str.splitlines() to split the text cleanly, whatever the newline style. You can also loop directly over the urllib2 response file object:

url = 'http://varenhor.st/wp-content/uploads/emails.txt'
document = urllib2.urlopen(url)
for line in document:
    cleaned = line.rpartition('@')[0]
    print cleaned
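To illustrate why splitlines() is the safer choice than a plain split('\n'), here is a small sketch on a made-up string that mixes line-ending styles. The text and variable names are assumptions for the example only:

```python
# Hypothetical text mixing Unix (\n), Windows (\r\n) and old Mac (\r) endings.
text = 'a@x.example\nb@y.example\r\nc@z.example\rd@w.example'

# splitlines() recognises all three styles and drops the terminators.
with_splitlines = [line.rpartition('@')[0] for line in text.splitlines()]

# split('\n') only knows about '\n', so the '\r'-separated lines stay glued
# together and the last '@' is found in the wrong place.
naive_cleaned = [line.rpartition('@')[0] for line in text.split('\n')]

print(with_splitlines)
print(naive_cleaned)
```

With the naive split, the third and fourth addresses are treated as one line, so only the part after the very last @ is removed.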

Demo:

>>> import urllib2
>>> url = 'http://varenhor.st/wp-content/uploads/emails.txt'
>>> document = urllib2.urlopen(url)
>>> for line in document:
...     cleaned = line.rpartition('@')[0]
...     print cleaned
... 
ADAKorb...
AllisonSarahMoo...
Artemislinked...
BTBottg...
BennettLee...
Billa...
# etc.
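The snippets above are Python 2 (urllib2 and the print statement). On Python 3 the same idea might look roughly like the sketch below. urlopen() lives in urllib.request there and the response yields bytes, so each line has to be decoded first. The helper names and the utf-8 encoding are assumptions, since the post does not say how emails.txt is encoded:

```python
from urllib.request import urlopen  # Python 3 home of urllib2.urlopen


def strip_after_at(lines, encoding='utf-8'):
    """Yield the part of each line before its last '@'.

    `lines` is any iterable of bytes objects, such as the response
    returned by urlopen(). The encoding is an assumption.
    """
    for raw in lines:
        text = raw.decode(encoding, errors='replace')
        # Everything from the last '@' onward, including the newline, is dropped.
        yield text.rpartition('@')[0]


def fetch_and_clean(url):
    # Hypothetical helper: performs a network request when called.
    with urlopen(url) as document:
        return list(strip_after_at(document))
```

Calling print('\n'.join(fetch_and_clean(url))) with the url from the question should then print one cleaned name per line.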

Answer 2

You can use Python's slice notation:

import urllib2

url = 'http://varenhor.st/wp-content/uploads/emails.txt'
document = urllib2.urlopen(url)
html = document.read()

for line in html.splitlines():
    # index() returns the position of the first '@';
    # it raises ValueError if the line has none
    at_index = line.index('@')
    print line[:at_index]

Since strings are sequences, you can slice them. For example,

hello_world = 'Hello World'
hello = hello_world[:5]
world = hello_world[6:]

Remember that slicing returns a new sequence and does not modify the original.
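One caveat with the slicing approach: str.index() raises ValueError on a line that has no @. A possible variant uses str.find(), which returns -1 instead. The function name is hypothetical, and keeping such lines unchanged is an assumption about what you want:

```python
def cut_at_first_at(line):
    """Return the part of `line` before its first '@' (hypothetical helper)."""
    at_index = line.find('@')  # -1 when there is no '@'
    if at_index == -1:
        # Assumption: lines without '@' are kept as they are.
        return line
    return line[:at_index]


print(cut_at_first_at('alice@example.com'))
print(cut_at_first_at('no at sign here'))
```

Note that this cuts at the first @ on the line, whereas rpartition() cuts at the last one. For a list of ordinary email addresses the two give the same result.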

Answer 3

Since you have already imported re, you can use it:

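import re, urllib2
url = 'http://varenhor.st/wp-content/uploads/emails.txt'
document = urllib2.urlopen(url)
reg_ptn = re.compile(r'@.*')
for line in document:
    # '.' never matches '\n', so drop the line ending to avoid blank lines
    print reg_ptn.sub('', line.rstrip('\r\n'))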
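Be aware that the pattern @.* starts at the first @ on the line, while the rpartition() approach from Answer 1 cuts at the last one. If that difference matters, a pattern anchored at the end of the line can mimic the last-@ behaviour. This is only a sketch, and the second pattern and both names are my own rather than part of the answer:

```python
import re

first_at = re.compile(r'@.*')        # from the first '@' to the end of the line
last_at = re.compile(r'@[^@\n]*$')   # from the last '@' to the end of the line

line = 'a@b@example.org'  # hypothetical input with two '@' signs
print(first_at.sub('', line))
print(last_at.sub('', line))
```

For lines with exactly one @ both patterns remove the same text.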

Answer 1: score 2, accepted, 2014-06-01 22:08:19

Answer 2: score 1, 2014-06-01 22:07:39

Answer 3: score 0, 2014-06-01 22:21:30
