Code:
import socket, feedparser
feed = feedparser.parse("http://pwnmyi.com/feed")
latest = feed.entries[0]
art_name = latest.title
network = 'irc.rizon.net'
port = 6667
irc = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
irc.connect((network, port))
print irc.recv(4096)
irc.send('NICK PwnBot\r\n')
irc.send('USER PwnBot PwnBot PwnBot :PwnBot by Fike\r\n')
irc.send('JOIN #pwnmyi\r\n')
while True:
    data = irc.recv(4096)
    if data.find('PING') != -1:
        irc.send('PONG ' + data.split()[1] + '\r\n')
    if data.find('!latest') != -1:
        irc.send('PRIVMSG #pwnmyi :Latest Article: ' + art_name + '\r\n')
It connects etc., but then when I do !latest in the channel, it just quits with this:
irc.send('PRIVMSG #pwnmyi :Latest Article: ' + art_name + '\r\n')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 55: ordinal not in range(128)
Could you please help me debug this code? It used to work for me before.
The IRC protocol does not define a particular character set encoding for messages. Rather, it is an 8-bit protocol in which certain octets are used as control characters (see RFC 1459, section 2.2).
Apparently the popular mIRC client will decode UTF-8 sequences if it recognizes them as such. This makes pretty decent sense for IRC, since ASCII code points are encoded with the same bytes as the ASCII characters, and non-ASCII code points are all encoded as bytes with values > 127. So the practical fix is to encode your outgoing text as UTF-8 before sending it.
In Python, that's spelled unicode.encode(encoding='utf8')
like so:
>>> u'\u0ca0_\u0ca0'.encode('utf8')
'\xe0\xb2\xa0_\xe0\xb2\xa0'
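Applied to the failing line from the question, this could look roughly like the sketch below. The helper name build_privmsg and the sample title are made up for illustration (the real title comes from the feed); the only point is that the whole line is built as text and encoded to UTF-8 bytes once, right before it is sent.

```python
# -*- coding: utf-8 -*-
# Minimal sketch: build the IRC line as text, encode it to UTF-8 once.
# build_privmsg and the sample title below are hypothetical, not from the question.

def build_privmsg(channel, text):
    """Return the raw bytes of a PRIVMSG line, encoded as UTF-8."""
    line = u'PRIVMSG {0} :{1}\r\n'.format(channel, text)
    return line.encode('utf-8')

# A stand-in title containing U+2013 (en dash), the character from the traceback.
title = u'Jailbreak news \u2013 part 2'
payload = build_privmsg(u'#pwnmyi', u'Latest Article: ' + title)
```

In the bot, the result would then be passed to irc.send(payload) in place of the string concatenation that raised the UnicodeEncodeError.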
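Personally, I'd recommend converting all strings to UTF-8. You can encode and decode strings with helpers like these:
def decode(raw):
    # Turn a byte string into text, trying UTF-8 first.
    try:
        text = raw.decode('utf-8')
    except UnicodeDecodeError:
        try:
            # ISO-8859-1 maps every byte value, so this fallback always
            # succeeds and the cp1252 branch below is never reached.
            text = raw.decode('iso-8859-1')
        except UnicodeDecodeError:
            text = raw.decode('cp1252')
    return text
def encode(text):
    # Turn text into a byte string, trying UTF-8 first.
    try:
        raw = text.encode('utf-8')
    except UnicodeEncodeError:
        # UTF-8 can represent every Unicode code point, so these
        # fallbacks should rarely, if ever, be needed.
        try:
            raw = text.encode('iso-8859-1')
        except UnicodeEncodeError:
            raw = text.encode('cp1252')
    return raw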
This is an excellent website that explains Python's Unicode: http://farmdev.com/talks/unicode
The best 3 tips from it are:
You'll have to encode the string you post to the IRC server. Also, depending on what feedparser returns, you might want to decode it from a specific encoding first.
Which encoding to use depends on what the feed contains.
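One way to cover both cases is to normalize to text as soon as the data arrives and encode only when writing to the socket. The sketch below assumes UTF-8 on both sides; to_text and to_wire are hypothetical helper names, and the 'replace' error handler is a choice made here so that a malformed byte cannot crash the bot.

```python
# Minimal sketch of "decode on the way in, encode on the way out".
# to_text / to_wire are hypothetical helpers; UTF-8 is an assumption.

def to_text(value, encoding='utf-8'):
    """Return value as text, decoding it first if it is a byte string."""
    if isinstance(value, bytes):
        # 'replace' substitutes U+FFFD for undecodable bytes instead of raising.
        return value.decode(encoding, 'replace')
    return value

def to_wire(value, encoding='utf-8'):
    """Return value as bytes that are safe to pass to socket.send()."""
    return to_text(value, encoding).encode(encoding)
```

With these, the title would go through to_text() once after parsing the feed, and every outgoing line through to_wire() just before irc.send().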
latest.title has non-ASCII characters in it.
You must either remove them, escape them or translate them.
The cheap and easy way out is to use repr():
irc.send('PRIVMSG #pwnmyi :Latest Article: ' + repr(art_name) + '\r\n')
Or, better:
irc.send('PRIVMSG #pwnmyi :Latest Article: {0!r}\r\n'.format( art_name ) )
In the long run, you need to address non-ASCII characters in your input.
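For reference, the three options mentioned above (remove, escape, translate) can each be sketched with the standard codec error handlers. The sample title is invented for illustration, and mapping the en dash to a plain hyphen is just one possible translation.

```python
# Minimal sketch of removing, escaping, or translating non-ASCII characters.
# The sample title is hypothetical.
title = u'Caf\xe9 \u2013 review'

# Remove: silently drop anything that is not ASCII.
removed = title.encode('ascii', 'ignore')

# Escape: keep the information as backslash escapes.
escaped = title.encode('ascii', 'backslashreplace')

# Translate: map known characters to ASCII stand-ins, '?' for the rest.
translated = title.replace(u'\u2013', u'-').encode('ascii', 'replace')
```

All three produce pure-ASCII byte strings, so they avoid the error, but each loses or distorts some of the original text; encoding to UTF-8 keeps it intact.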