Pandas read_csv() with HTML special characters

Question

I'm cleaning a comma-separated CSV file in Python / Pandas.

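Some cells contain &amp; as part of the text. When I run read_csv(), it treats the semicolon as the end of the current cell, and the rest of the row gets shifted over.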

I've tried encoding='utf8' and various other options...

EDIT: My code:

import pandas as pd

file = pd.read_csv('my-data-1.csv', encoding='utf8', index_col=False, low_memory=False)

file.drop(file.tail(1).index, inplace=True)  # remove the copyright line at the end

file_drop_dupes = file.drop_duplicates(['Project Id'])  # drop duplicates based on the Project Id column

# drop all columns except these few
keep_col = ['Project Id', 'Project Name', 'Type']
new_file = file_drop_dupes[keep_col]
# write the result to a new CSV file
new_file.to_csv('all-good-1.csv', index=False)

Example of an HTML field:

Service Maintenance &amp; Supply
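If read_csv() really is splitting on the ";" inside "&amp;", one workaround is to decode the entities before pandas ever sees the text. This is a sketch under stated assumptions, not a confirmed fix: read_csv_unescaped is a hypothetical helper name, it relies on the standard-library html.unescape() (Python 3.4+), and it assumes no entity in the file decodes to a CSV-significant character. For example, "&quot;" becomes a double quote and "&#44;" becomes a comma, and either could misalign columns in a new way.

```python
import html
import io

import pandas as pd


def read_csv_unescaped(raw_text, **read_csv_kwargs):
    """Hypothetical helper: decode HTML entities in raw CSV text, then parse it.

    Assumes no entity decodes to a CSV-significant character; &quot; -> '"'
    and &#44; -> ',' would change how a row is split.
    """
    decoded = html.unescape(raw_text)  # e.g. "&amp;" -> "&", so no stray ";"
    return pd.read_csv(io.StringIO(decoded), **read_csv_kwargs)


# Usage with the file from the question (path and options from the original code):
# with open('my-data-1.csv', encoding='utf8') as f:
#     file = read_csv_unescaped(f.read(), index_col=False, low_memory=False)
```

Extra keyword arguments are passed straight through to pd.read_csv(), so the rest of the original script can stay unchanged.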

Answer 1

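In Python 3.4 and later, it's simply html.unescape(). Before that, use the unescape() method of HTMLParser from html.parser (that method was deprecated in 3.4 and removed in Python 3.9). See this answer.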
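A quick check of the call this answer describes, applied to the sample field from the question (standard library only, Python 3.4+). Named references such as "&amp;" and numeric ones such as "&#38;" both decode to "&":

```python
import html

raw = "Service Maintenance &amp; Supply"
clean = html.unescape(raw)
print(clean)  # Service Maintenance & Supply

# Numeric character references are decoded too.
numeric = html.unescape("Service Maintenance &#38; Supply")
print(numeric)  # Service Maintenance & Supply
```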

Answer 2

If you are using Python 3.4+, html.unescape() is the solution.
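To apply html.unescape() to the data rather than to a single string, one option is to parse the CSV as usual and then decode every string cell. This is a minimal sketch assuming the file already parses into the right columns; unescape_text_columns and _unescape_cell are hypothetical names, and the sample row just reuses the field shown in the question.

```python
import html
import io

import pandas as pd


def _unescape_cell(value):
    # Decode HTML entities in strings; leave numbers and NaN untouched.
    return html.unescape(value) if isinstance(value, str) else value


def unescape_text_columns(df):
    """Hypothetical helper: return a copy of df with entities decoded in string cells."""
    out = df.copy()
    for col in out.columns:
        out[col] = out[col].map(_unescape_cell)
    return out


csv_text = "Project Id,Project Name,Type\n1,Service Maintenance &amp; Supply,A\n"
df = pd.read_csv(io.StringIO(csv_text))
clean = unescape_text_columns(df)
print(clean.loc[0, "Project Name"])  # Service Maintenance & Supply
```

In the question's script this would fit just before the final write, e.g. new_file = unescape_text_columns(new_file). Working on a copy avoids writing into new_file, which is a column selection taken from file_drop_dupes.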

2 answers

Answer 1: score 0, posted 2018-02-15 16:31:31

Answer 2: score 0, posted 2018-02-15 16:34:03
