Python - Remove Square symbol from text string

Question

a='ÿþ"[]B[]a[]l[]a[]n[]c[]e'

NOTE: Each pair of open and close square brackets stands for the square symbol. I cannot copy and paste the actual symbol here to show you exactly what I'm looking at.

The characters in 'a' represent the beginning of a file I've downloaded. It is a csv file, unicode. How do I remove these unwanted characters? I would just like to recover the word 'balance' from a.

The code I've used, simplified for this example:

fi = open(path+fn, 'r')
data = fi.read()
fi.close()
print(data)

Where fn is a csv file.

Tried:

data=data.encode()
d=data.replace('\x00','')

which produced error:

TypeError: expected bytes, bytearray or buffer compatible object

Answer 1

You need to specify the right encoding when opening the file. Try

open(path+fn, 'r', encoding="utf-16")

(I'm guessing utf-16 because each ASCII character seems to be encoded in two bytes in the sample string, and the leading ÿþ looks like a UTF-16 little-endian byte order mark.)
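
A minimal sketch of this approach, assuming the file really is UTF-16 with a byte order mark. The file name and contents are made up so the example is self-contained; they are not the asker's data.

```python
import os
import tempfile

# Hypothetical sample: write a small UTF-16 CSV to a temporary directory.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.csv")
with open(sample_path, "w", encoding="utf-16", newline="") as fo:
    fo.write('"Balance","Date"\n')

# The raw bytes start with a byte order mark, and every ASCII
# character is paired with a NUL byte (the "square symbol").
with open(sample_path, "rb") as fi:
    raw = fi.read()

# Opening with the matching encoding consumes the BOM and
# decodes each two-byte unit into a single character.
with open(sample_path, "r", encoding="utf-16", newline="") as fi:
    data = fi.read()

print(data)
```

If the guess is wrong, the decode may raise UnicodeError or yield gibberish, so it can help to inspect the first few raw bytes before picking an encoding.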

Answer 2

If you don't want to mess with encoding, string.printable is a string containing the 'printable' ASCII characters, which may be what you're looking for.

>>> from string import printable
>>> best_string_ever = ''.join(filter(lambda x: x in printable, a))
>>> best_string_ever
'"Balance'
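
One caveat worth seeing in code: string.printable only covers ASCII, so this filter also drops legitimate non-ASCII text. The sketch below assumes the square symbols are NUL characters (\x00); keep_printable is a hypothetical helper name, not part of the standard library.

```python
from string import printable


def keep_printable(text):
    """Return text with every character outside string.printable removed."""
    return "".join(ch for ch in text if ch in printable)


# The question's sample, with the square symbols written as NULs (an assumption).
a = 'ÿþ"\x00B\x00a\x00l\x00a\x00n\x00c\x00e'

print(keep_printable(a))       # "Balance
print(keep_printable("Café"))  # Caf  (the accented letter is lost too)
```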

Answer 3

If you know the character's code, you can remove it from the ends of the string with strip(u'\uxxx'), or remove it everywhere with the replace() method:

newstring = textstring.replace(u'\uxxx', '')

In either case, substitute the actual escape code of the character you want to remove for \uxxx.
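
Applied to the question's string, a sketch might look like the following. It assumes the square symbols are NUL characters (\x00), as the asker's own attempt suggests. It also shows why that attempt failed: after encode() the data is a bytes object, and bytes.replace() only accepts bytes arguments.

```python
# The question's sample, with the square symbols written as NULs (an assumption).
a = 'ÿþ"\x00B\x00a\x00l\x00a\x00n\x00c\x00e'

# On a str, replace() takes str arguments. Remove the NULs,
# then strip the leading BOM remnants and the quote mark.
cleaned = a.replace('\x00', '').lstrip('ÿþ"')
print(cleaned)  # Balance

# On bytes, both arguments must be bytes as well.
raw = a.encode('latin-1')
cleaned_bytes = raw.replace(b'\x00', b'')

# Mixing the two reproduces the TypeError from the question.
try:
    raw.replace('\x00', '')
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

This cleans up the symptom only; decoding the file with the correct encoding avoids producing the stray characters in the first place.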

3 answers

solution1
2 ACCEPTED 2014-02-26 17:29:34

solution2
0 2014-02-26 17:29:54

solution3
0 2014-02-26 17:33:07