Remove u202a from a Python string

Question

I am trying to open a file in Python, but I get an error, and a \u202a character shows up at the beginning of the string. Does anyone know how to remove it?

def carregar_uml(arquivo, variaveis):
    cadastro_uml = {}
    id_uml = 0

    for i in open(arquivo):
        linha = i.split(",")


carregar_uml("‪H:\\7 - Script\\teste.csv", variaveis)

OSError: [Errno 22] Invalid argument: '\u202aH:\\7 - Script\\teste.csv'

Answer 1

Your text editor introduced a non-printing character when you originally created the .py file.

Consider this line:

carregar_uml("‪H:\\7 - Script\\teste.csv", variaveis)

Let's carefully select the string, including the quotes, and copy-paste it into an interactive Python session:

$ python
Python 3.6.1 (default, Jul 25 2017, 12:45:09) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> "‪H:\\7 - Script\\teste.csv"
'\u202aH:\\7 - Script\\teste.csv'
>>>

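As you can see, there is a character with code point U+202A immediately before the H.

As others have pointed out, the character at code point U+202A is LEFT-TO-RIGHT EMBEDDING. Back to our Python session: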

>>> s = "‪H:\\7 - Script\\teste.csv"
>>> import unicodedata
>>> unicodedata.name(s[0])
'LEFT-TO-RIGHT EMBEDDING'
>>> unicodedata.name(s[1])
'LATIN CAPITAL LETTER H'
>>>

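This further confirms that the first character in your string is not H but the non-printing LEFT-TO-RIGHT EMBEDDING character.

I don't know which text editor you used to create your program. Even if I did, I'm probably not an expert in that editor. Either way, some text editor you used inserted a U+202A without your knowledge.

One solution is to use a text editor that does not insert that character and/or highlights non-printing characters. For example, in vim the line looks like this: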
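The same check can be generalized to any position in the string. The following is a minimal sketch, not part of the original answer: the helper name `find_format_chars` is made up for illustration, and it assumes that every character in the Unicode "Cf" (format) category is unwanted in a file path.

```python
import unicodedata


def find_format_chars(text):
    """Return (index, code point, name) for each Unicode format (Cf) character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


# The escape \u202a stands in for the invisible character pasted in the question.
path = "\u202aH:\\7 - Script\\teste.csv"
print(find_format_chars(path))
# [(0, 'U+202A', 'LEFT-TO-RIGHT EMBEDDING')]
```

An empty list means the string contains no format characters.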

carregar_uml("<202a>H:\\7 - Script\\teste.csv", variaveis)

With such an editor, simply delete the character between the " and the H.

carregar_uml("H:\\7 - Script\\teste.csv", variaveis)

Although this line looks identical to your original one, I have removed the offending character. Using this line will avoid the OSError you reported.
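If your editor cannot display non-printing characters, a small script can locate them in the source file instead. This is only a sketch under stated assumptions: the function name `scan_source` is hypothetical, the file is assumed to be UTF-8 encoded, and any character in the Unicode "Cf" category is treated as suspicious.

```python
import unicodedata
from pathlib import Path


def scan_source(path):
    """Yield (line number, column, code point) for each format character in a file."""
    # "utf-8-sig" drops a leading byte order mark so it is not reported.
    text = Path(path).read_text(encoding="utf-8-sig")
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cf":
                yield lineno, col, f"U+{ord(ch):04X}"
```

Calling `list(scan_source("your_script.py"))` on the affected file should point at the line and column where the character has to be deleted.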

Answer 2

The problem is that the directory path of the file is not being read correctly. Pass it as a raw string and it should work.

carregar_uml(r'H:\7 - Script\teste.csv', variaveis)
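For context, a raw string only changes how backslashes in the literal are read. The short sketch below suggests that the raw and the escaped spelling produce the same path, so this approach presumably helps only when the path is retyped without the invisible character. The `pasted` variable is an illustration, not code from the question.

```python
raw = r"H:\7 - Script\teste.csv"
escaped = "H:\\7 - Script\\teste.csv"

# Both spellings produce exactly the same string.
print(raw == escaped)  # True

# A pasted U+202A is part of the string value, raw or not.
pasted = "\u202a" + raw
print(pasted == raw)                   # False
print(pasted.lstrip("\u202a") == raw)  # True
```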

Answer 3

You can use this sample code to remove u202a from a file path:

st="‪‪F:\\somepath\\filename.xlsx"    
data = pd.read_excel(st)

If I try this, I get an OSError with the following details:

Traceback (most recent call last):
  File "F:\CodeRepo\PythonWorkSpace\demo\removepartofstring.py", line 14, in <module>
    data = pd.read_excel(st)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\util\_decorators.py", line 188, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\util\_decorators.py", line 188, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel.py", line 350, in read_excel
    io = ExcelFile(io, engine=engine)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel.py", line 653, in __init__
    self._reader = self._engines[engine](self._io)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel.py", line 424, in __init__
    self.book = xlrd.open_workbook(filepath_or_buffer)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\xlrd\__init__.py", line 111, in open_workbook
    with open(filename, "rb") as f:
OSError: [Errno 22] Invalid argument: '\u202aF:\\somepath\\filename.xlsx'

But if I do this instead:

st = "\u202aF:\\somepath\\filename.xlsx"
data = pd.read_excel(st.strip("\u202a"))  # replace with your own string here

it works for me.
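One detail worth keeping in mind: `str.strip` treats its argument as a set of characters to remove from both ends, not as a prefix. The sketch below uses a made-up path to show why the argument should contain only the U+202A character itself and not the literal letters `u202a`.

```python
# Hypothetical path, chosen so that it ends in one of the letters "u", "2", "0", "a".
path = "\u202aF:\\data\\cuba"

only_mark = path.strip("\u202a")        # removes just the invisible character
too_greedy = path.strip("\u202au202a")  # also removes "u", "2", "0", "a" at either end

print(repr(only_mark))   # 'F:\\data\\cuba'
print(repr(too_greedy))  # 'F:\\data\\cub'
```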

Answer 4

Try strip():

def carregar_uml(arquivo, variaveis):
    cadastro_uml = {}
    id_uml = 0

    for i in open(arquivo):
        linha = i.split(",")


# Strip the invisible U+202A from the path string, not from the function
caminho = "\u202aH:\\7 - Script\\teste.csv".strip("\u202a")
carregar_uml(caminho, variaveis)

Answer 5

Or you can slice that character off:

file_path = r"‪C:\Test3\Accessing_mdb.txt"
file_path = file_path[1:]
with open(file_path, 'a') as f_obj:
    f_obj.write('some words')
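Unconditional slicing also drops the drive letter when the path happens to be clean. A slightly more defensive variant, sketched here with a hypothetical helper name `drop_leading_lre`, checks for the character first:

```python
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING


def drop_leading_lre(path):
    """Remove a single leading U+202A if present; otherwise return the path unchanged."""
    return path[1:] if path.startswith(LRE) else path


print(repr(drop_leading_lre("\u202aC:\\Test3\\Accessing_mdb.txt")))
print(repr(drop_leading_lre("C:\\Test3\\Accessing_mdb.txt")))
# Both lines print 'C:\\Test3\\Accessing_mdb.txt'
```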

Answer 6

Use a lowercase letter when writing the drive name, not an uppercase one!

ex) H: -> error
ex) h: -> no error

Answer 7

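I tried all of the solutions above. The problem is that an extra character gets added when we copy a path or any other string. It is not displayed in the IDE. This extra character denotes the right-to-left mark (RLM) https://en.wikipedia.org/wiki/Right-to-left_mark , meaning that you selected the text from right to left when you copied it.

See the image linked in my answer. I also tried copying from left to right, and then the extra character was not added. So either type your path by hand or copy it from left to right to avoid this kind of problem.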

Answer 8

Here is a simple function that removes the "\u202a" and "\u202c" characters.

You can add any other characters you want to remove to the list.

def cleanup(inp):
    new_char = ""
    for char in inp:
        if char not in ["\u202a", "\u202c"]:
            new_char += char
    return new_char

example = '\u202a7551\u202c'
print(cleanup(example)) # prints 7551
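The same idea can be written with `str.translate`, which deletes every character mapped to `None` in a single pass. This is an alternative sketch rather than the answerer's code; the name `strip_bidi` and the exact set of characters (the Unicode bidirectional marks, embeddings, overrides, and isolates) are choices made here for illustration.

```python
# LRM, RLM, LRE, RLE, PDF, LRO, RLO, LRI, RLI, FSI, PDI
BIDI_CONTROLS = "\u200e\u200f\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069"

# Map each code point to None so that translate() deletes it.
_REMOVE = dict.fromkeys(map(ord, BIDI_CONTROLS))


def strip_bidi(text):
    """Return text with all bidirectional control characters removed."""
    return text.translate(_REMOVE)


example = "\u202a7551\u202c"
print(strip_bidi(example))  # prints 7551
```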


Answer 1
8 2018-03-14 15:00:47

Answer 2
1 accepted 2018-03-14 00:45:48

Answer 3
1 2019-06-05 10:20:23

Answer 4
0 2019-09-27 09:29:00

Answer 5
0 2019-11-16 19:08:06

Answer 6
0 2019-12-29 06:19:15

Answer 7
0 2021-07-28 23:02:22

Answer 8
0 2022-01-11 17:02:12
