Remove u202a from Python string

I'm trying to open a file in Python, but I get an error, and at the beginning of the string there is a \u202a character. Does anyone know how to remove it?

def carregar_uml(arquivo, variaveis):
    cadastro_uml = {}
    id_uml = 0

    for i in open(arquivo):
        linha = i.split(",")


carregar_uml("‪H:\\7 - Script\\teste.csv", variaveis)

OSError: [Errno 22] Invalid argument: '\u202aH:\\7 - Script\\teste.csv'

When you initially created your .py file, your text editor introduced a non-printing character.

Consider this line:

carregar_uml("‪H:\\7 - Script\\teste.csv", variaveis)

Let's carefully select the string, including the quotes, and copy-paste it into an interactive Python session:

$ python
Python 3.6.1 (default, Jul 25 2017, 12:45:09) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> "‪H:\\7 - Script\\teste.csv"
'\u202aH:\\7 - Script\\teste.csv'
>>> 

As you can see, there is a character with code point U+202A immediately before the H.

As someone else pointed out, the character at code point U+202A is LEFT-TO-RIGHT EMBEDDING. Returning to our Python session:

>>> s = "‪H:\\7 - Script\\teste.csv"
>>> import unicodedata
>>> unicodedata.name(s[0])
'LEFT-TO-RIGHT EMBEDDING'
>>> unicodedata.name(s[1])
'LATIN CAPITAL LETTER H'
>>> 

This further confirms that the first character in your string is not H , but the non-printing LEFT-TO-RIGHT EMBEDDING character.
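
A quick way to spot such characters in any string is to list every code point in Unicode category Cf (format). This is only a minimal sketch; the helper name find_format_chars is made up for illustration:

```python
import unicodedata


def find_format_chars(text):
    """Return (index, code point, name) for every invisible format character."""
    found = []
    for index, char in enumerate(text):
        # Category "Cf" covers format controls such as U+202A and U+202C.
        if unicodedata.category(char) == "Cf":
            name = unicodedata.name(char, "<unnamed>")
            found.append((index, f"U+{ord(char):04X}", name))
    return found


path = "\u202aH:\\7 - Script\\teste.csv"
print(find_format_chars(path))
```

An empty list means the string contains no format characters.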

I don't know what text editor you used to create your program. Even if I knew, I'm probably not an expert in that editor. Regardless, some text editor that you used inserted, unbeknownst to you, U+202A.

One solution is to use a text editor that won't insert that character, and/or will highlight non-printing characters. For example, in vim that line appears like so:

carregar_uml("<202a>H:\\7 - Script\\teste.csv", variaveis)

Using such an editor, simply delete the character between " and H .

carregar_uml("H:\\7 - Script\\teste.csv", variaveis)

Even though this line is visually identical to your original line, I have deleted the offending character. Using this line will avoid the OSError that you report.
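
If your editor cannot highlight non-printing characters, a small script can locate them in the .py file instead. The following is a sketch under assumptions: the function name scan_file is hypothetical, the file is assumed to be UTF-8, and the demonstration writes a throwaway file rather than touching your real script.

```python
import os
import tempfile
import unicodedata


def scan_file(path, encoding="utf-8"):
    """Yield (line number, column, code point) for each format character in a file."""
    with open(path, encoding=encoding) as handle:
        for line_no, line in enumerate(handle, start=1):
            for col, char in enumerate(line, start=1):
                # Category "Cf" is where U+202A and similar controls live.
                if unicodedata.category(char) == "Cf":
                    yield line_no, col, f"U+{ord(char):04X}"


# Demonstration on a temporary file that contains the offending line.
with tempfile.NamedTemporaryFile(
    "w", suffix=".py", encoding="utf-8", delete=False
) as tmp:
    tmp.write('x = 1\ncarregar_uml("\u202aH:\\\\7 - Script\\\\teste.csv", variaveis)\n')
    demo_path = tmp.name

hits = list(scan_file(demo_path))
print(hits)
os.remove(demo_path)
```

Each reported line and column tells you where to place the cursor and delete one character.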

The problem is that the file path is not being read properly. Pass it as a raw string (retyped by hand, so that the invisible character is not copied along) and it should work.

carregar_uml(r'H:\7 - Script\teste.csv', variaveis)

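You can use this sample code to remove \u202a from a file path.

import pandas as pd
# "\u202a" is the invisible character that came along when the path was copied
st = "\u202aF:\\somepath\\filename.xlsx"
data = pd.read_excel(st)

If I try this, it raises an OSError. The full traceback: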

Traceback (most recent call last):
  File "F:\CodeRepo\PythonWorkSpace\demo\removepartofstring.py", line 14, in <module>
    data = pd.read_excel(st)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\util\_decorators.py", line 188, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\util\_decorators.py", line 188, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel.py", line 350, in read_excel
    io = ExcelFile(io, engine=engine)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel.py", line 653, in __init__
    self._reader = self._engines[engine](self._io)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel.py", line 424, in __init__
    self.book = xlrd.open_workbook(filepath_or_buffer)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\xlrd\__init__.py", line 111, in open_workbook
    with open(filename, "rb") as f:
OSError: [Errno 22] Invalid argument: '\u202aF:\\somepath\\filename.xlsx'

But if I strip the character first:

st = "\u202aF:\\somepath\\filename.xlsx"
data = pd.read_excel(st.strip("\u202a"))  # replace with your own string

it works for me.
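
Note that str.strip treats its argument as a set of characters, not as a prefix, so the escape has to be written with its backslash. A small sketch of the difference, using a made-up path chosen only to make the effect visible:

```python
# Hypothetical path that starts with U+202A and happens to end in "a".
path = "\u202aF:\\somepath\\data"

# The literal text "u202a" strips the characters u, 2, 0 and a from both ends:
# it leaves U+202A in place and eats the trailing "a" of "data".
wrong = path.strip("u202a")

# The escape "\u202a" strips only the real U+202A character.
right = path.strip("\u202a")

print(repr(wrong))
print(repr(right))
```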

Try strip():

def carregar_uml(arquivo, variaveis):
    cadastro_uml = {}
    id_uml = 0

    for i in open(arquivo):
        linha = i.split(",")


# strip the invisible character from the path before passing it to the function
caminho = "\u202aH:\\7 - Script\\teste.csv".strip("\u202a")
carregar_uml(caminho, variaveis)

Or you can slice out that character

file_path = r"‪C:\Test3\Accessing_mdb.txt"
file_path = file_path[1:]
with open(file_path, 'a') as f_obj:
    f_obj.write('some words')
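
Slicing with [1:] drops the first character unconditionally, so it would also cut the drive letter off a clean path. A guarded variant might look like this; the function name drop_leading_lre is hypothetical:

```python
def drop_leading_lre(path):
    """Remove one leading U+202A if present; otherwise return the path unchanged."""
    if path.startswith("\u202a"):
        return path[1:]
    return path


print(drop_leading_lre("\u202aC:\\Test3\\Accessing_mdb.txt"))
print(drop_leading_lre("C:\\Test3\\Accessing_mdb.txt"))
```

Both calls print the same clean path.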

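Write the drive letter in lowercase instead of uppercase.

Example: H: gives the error, h: does not. (Windows drive letters are case-insensitive, so this most likely works only because retyping the letter removes the invisible character in front of it.)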

I tried all of the above solutions. The problem is that when we copy a path or any other string by selecting it from right to left, an extra character is added. It does not show up in the IDE. This extra character is a Unicode bidirectional formatting character (here U+202A, LEFT-TO-RIGHT EMBEDDING, a relative of the right-to-left mark (RLM), https://en.wikipedia.org/wiki/Right-to-left_mark).

I also tried copying from left to right, and then the extra character is not added. So either type your path manually or select it from left to right when copying to avoid this kind of issue.

The following is a simple function to remove the "\u202a" and "\u202c" characters.

You can add any other characters you want removed to the list.

def cleanup(inp):
    new_char = ""
    for char in inp:
        if char not in ["\u202a", "\u202c"]:
            new_char += char
    return new_char

example = '\u202a7551\u202c'
print(cleanup(example)) # prints 7551
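
The same idea can be written with str.translate, which removes all listed characters in one pass. This is a sketch rather than a definitive list: the names BIDI_CONTROLS, REMOVE_BIDI and strip_bidi are made up here, and the set covers the Unicode directional embedding, override, isolate and mark characters.

```python
# LRE, RLE, PDF, LRO, RLO, then LRI, RLI, FSI, PDI, then the LRM and RLM marks.
BIDI_CONTROLS = (
    "\u202a\u202b\u202c\u202d\u202e"
    "\u2066\u2067\u2068\u2069"
    "\u200e\u200f"
)

# With three arguments, str.maketrans maps every character in the third one to None.
REMOVE_BIDI = str.maketrans("", "", BIDI_CONTROLS)


def strip_bidi(text):
    """Return text with all listed bidirectional control characters removed."""
    return text.translate(REMOVE_BIDI)


print(strip_bidi("\u202a7551\u202c"))  # prints 7551
```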
