How to remove special characters from a CSV file in Python?
Hi, I am trying to remove special characters from a CSV file, but I am not getting a satisfactory result. Could you please help me with this?
Example:
ÃœþÑÂúòð
Óþрþô áðýúт-ßõтõрñурó
These are the kind of special characters I am getting.
I am saving the file using the Python code below:
df = pd.read_csv(r"D:\Users\SPate233\Documents\cleanData-JnJv2.csv", low_memory=False)
df.to_csv(r"D:\Users\SPate233\Documents\cleanData-JnJv2_new.csv", encoding='utf-8-sig', index=False)
I am not sure, but you can try the code snippet given below.
Basically, I have built a DataFrame from your data. When loading a CSV with special characters, it is important to specify the encoding. I have used ISO-8859-1 because it is a single-byte encoding: in Python, every byte value from 0 to 255 maps to a character, so reading the file does not fail with a decoding error. The decoded text is only correct, however, if the file really was written in that encoding.
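As a rough illustration of why ISO-8859-1 is a safe fallback for reading, the sketch below decodes all 256 possible byte values without error, while a lone high byte is rejected by UTF-8. The variable names are made up for this example and are not part of the original answer.

```python
# Every possible byte value, 0 through 255.
raw = bytes(range(256))

# ISO-8859-1 maps each byte to exactly one character, so this never raises.
text = raw.decode("ISO-8859-1")
round_trip = text.encode("ISO-8859-1")

# The same kind of byte is not valid on its own in UTF-8.
utf8_failed = False
try:
    b"\xe9".decode("utf-8")
except UnicodeDecodeError:
    utf8_failed = True

print(len(text), round_trip == raw, utf8_failed)  # 256 True True
```

Because decoding always succeeds, a wrong guess shows up as garbled characters rather than as an exception, so it is worth checking a few rows by eye after reading.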
# Import all the important Libraries
import pandas as pd
# Read 'Data'
df = pd.read_csv('temp_data.csv', encoding = "ISO-8859-1")
# Print a few records of data with special characters
df
# Output of Above Cell:-
Data
0 ÃœþÑÂúòð
1 Óþрþô áðýúт-ßõтõрñурó
After reading the DataFrame, we can move on to removing the special characters. The code for this is given below:
# Removal of Special Characters
df['Data'] = df['Data'].map(str).apply(lambda x: x.encode('utf-8').decode('ascii', 'ignore'))
# Print Cleaned data
df
# Output of Above Cell:-
Data
0
1 -
As you can see, all special characters (that is, every non-ASCII character) have been removed. We can now store this result in a CSV file:
# Store clean data into 'CSV' Format
df.to_csv(r'cleaned_temp_data.csv', encoding = 'utf-8-sig', index = False)
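The removal step used above can also be tried on plain strings, which makes it easier to see exactly what it keeps and what it drops. This is only a minimal sketch: `strip_non_ascii` and the sample strings are hypothetical names and data chosen for illustration.

```python
def strip_non_ascii(value):
    """Return str(value) with every non-ASCII character dropped."""
    # errors="ignore" silently discards characters that ASCII cannot encode.
    return str(value).encode("ascii", "ignore").decode("ascii")


# Hypothetical sample values; the second one contains only ASCII.
samples = ["Caf\u00e9-M\u00fcnchen 42", "plain text", "ÃœþÑÂúòð"]
cleaned = [strip_non_ascii(s) for s in samples]

print(cleaned)  # ['Caf-Mnchen 42', 'plain text', '']
```

Note that legitimate accented letters are dropped along with the garbled ones, so a cell that consists only of such characters ends up empty. Assuming the same column name as above, the helper could be applied with `df['Data'].map(strip_non_ascii)`.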
Hope this solution helps you.