Python - how to delete all characters in all lines after some sign?

Question

I want to delete everything after the @ sign on every line. I wrote this code:

#!/usr/bin/env python
import sys, re, urllib2
url = 'http://varenhor.st/wp-content/uploads/emails.txt'
document = urllib2.urlopen(url)
html = document.read()

html2 = html[0]
for x in html.rsplit('@'):
    print x

But it only deletes the @ sign and prints the rest of the characters on the next line. How can I modify this code to delete all characters after the @ on every line? Should I use a regex?

Answer 1

You are splitting too many times; use str.rpartition() instead and just ignore the part after the @. Do this per line:

for line in html.splitlines():
    cleaned = line.rpartition('@')[0]
    print cleaned

Or, for Python versions older than 2.5, which lack str.rpartition(), limit str.rsplit() to a single split and again take only the first result:

for line in html.splitlines():
    cleaned = line.rsplit('@', 1)[0]
    print cleaned
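
The two variants above behave differently when a line contains no @ at all: str.rpartition() returns an empty first element if the separator is missing, while str.rsplit() returns the whole line. The following is a minimal sketch of that difference; the sample lines and the helper name strip_after_at are made up for illustration and do not come from the file in the question.

```python
def strip_after_at(line):
    """Return the text before the last '@', or the whole line if there is none."""
    head, sep, _tail = line.rpartition('@')
    # When '@' is absent, rpartition() returns ('', '', line), so sep is empty
    return head if sep else line

# Hypothetical sample data, not taken from emails.txt
sample_lines = ['alice@example.com', 'no-at-sign-here', 'a@b@example.org']

with_rpartition = [line.rpartition('@')[0] for line in sample_lines]
with_rsplit = [line.rsplit('@', 1)[0] for line in sample_lines]
with_helper = [strip_after_at(line) for line in sample_lines]
```

If lines without an @ should be kept intact, the rsplit() form or a helper like the one above is probably the safer choice; if they should be dropped to an empty string, rpartition()[0] already does that.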

I used str.splitlines() to cleanly split a text regardless of newline style. You can also loop directly over the urllib2 response file object:

url = 'http://varenhor.st/wp-content/uploads/emails.txt'
document = urllib2.urlopen(url)
for line in document:
    cleaned = line.rpartition('@')[0]
    print cleaned

Demo:

>>> import urllib2
>>> url = 'http://varenhor.st/wp-content/uploads/emails.txt'
>>> document = urllib2.urlopen(url)
>>> for line in document:
...     cleaned = line.rpartition('@')[0]
...     print cleaned
... 
ADAKorb...
AllisonSarahMoo...
Artemislinked...
BTBottg...
BennettLee...
Billa...
# etc.
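
On the newline point made earlier in this answer: str.splitlines() treats \n, \r\n and a lone \r as line boundaries and drops them, whereas splitting on '\n' alone leaves stray \r characters behind. A small sketch with a made-up string standing in for the downloaded text:

```python
# Hypothetical text mixing Windows, Unix and old Mac line endings
mixed = 'a@x.com\r\nb@y.org\nc@z.net\r'

lines = mixed.splitlines()   # line endings are removed, whatever their style
naive = mixed.split('\n')    # '\r' characters stay attached to the pieces

cleaned = [line.rpartition('@')[0] for line in lines]
```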

Answer 2

You can use Python's slice notation:

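import urllib2

url = 'http://varenhor.st/wp-content/uploads/emails.txt'
document = urllib2.urlopen(url)
html = document.read()

for line in html.splitlines():
    # str.find() returns -1 when there is no '@'; str.index() would raise ValueError
    at_index = line.find('@')
    # Keep the whole line if it has no '@', otherwise slice up to the first '@'
    print line[:at_index] if at_index != -1 else line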

Since strings are sequences, you can slice them. For instance,

hello_world = 'Hello World'
hello = hello_world[:5]
world = hello_world[6:]

Bear in mind, slicing returns a new sequence and doesn't modify the original sequence.
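
Slicing at the position of the first @ gives a different result from Answer 1's rpartition() when a line contains more than one @. The sketch below compares str.find() and str.rfind() as ways to pick the slice point; both return -1 when the character is missing. The function names and sample strings are illustrative only.

```python
def before_first_at(line):
    """Slice off everything from the first '@' onwards."""
    at_index = line.find('@')
    # find() returns -1 when '@' is missing: leave the line unchanged
    return line if at_index == -1 else line[:at_index]

def before_last_at(line):
    """Slice off everything from the last '@' onwards."""
    at_index = line.rfind('@')
    # rfind() also returns -1 when '@' is missing
    return line if at_index == -1 else line[:at_index]

original = 'a@b@example.org'   # hypothetical line with two '@' signs
first_cut = before_first_at(original)
last_cut = before_last_at(original)
```

Either way, original is left untouched, since slicing builds a new string.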

Answer 3

Since you already imported re, you can use it:

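document = urllib2.urlopen(url)
# Matches the first '@' on a line and everything after it, up to the newline
reg_ptn = re.compile(r'@.*')
for line in document:
    # Strip the line ending first, otherwise print adds a second newline
    print reg_ptn.sub('', line.rstrip('\r\n'))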
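
Because . does not match a newline by default, the same pattern can also be applied to the whole text in a single call, without looping over lines. A sketch on a hypothetical in-memory string standing in for the downloaded file contents:

```python
import re

# Hypothetical stand-in for the text returned by document.read()
text = 'alice@example.com\nbob@example.org\nno-at-sign\n'

# '.*' stops at each newline, so every line is trimmed independently
cleaned = re.sub(r'@.*', '', text)
```

Note that this cuts at the first @ on each line and leaves lines without an @ unchanged.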

3 answers

solution1
2 ACCEPTED 2014-06-01 22:08:19

solution2
1 2014-06-01 22:07:39

solution3
0 2014-06-01 22:21:30
