
How to remove special characters from a CSV file in Python?

Hi, I am trying to remove special characters from a CSV file, but I am not getting a satisfactory result. Could you please help me with this?

Example:

ÃœþÑÂúòð
Óþрþô áðýúт-ßõтõрñурó

These are the kind of special characters I am getting.

I am saving the file using the Python code below:

df = pd.read_csv(r"D:\Users\SPate233\Documents\cleanData-JnJv2.csv", low_memory=False)
df.to_csv(r"D:\Users\SPate233\Documents\cleanData-JnJv2_new.csv", encoding='utf-8-sig', index=False)

I am not sure, but you can try the code snippet given below:

Basically, I built a DataFrame from your data. When loading a CSV that contains special characters, it is important to specify the encoding, so I used ISO-8859-1. ISO-8859-1 (Latin-1) is one member of the ISO-8859 family of single-byte encodings. It keeps ASCII in the range 0 to 127 and adds further printable characters in the range 160 to 255, so every byte in the file decodes to some character and read_csv does not raise a decoding error.

To learn more about ISO-8859-1, click here.
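
The single-byte behaviour described above can be checked with a short sketch using only the standard library. The variable names are illustrative, and the last step is only an assumption about how garbled text of this kind often arises, not a diagnosis of the file in the question.

```python
# Every possible byte value, 0 through 255.
all_bytes = bytes(range(256))

# ISO-8859-1 assigns a character to each byte, so decoding never fails.
latin1_text = all_bytes.decode("ISO-8859-1")
print(len(latin1_text))  # one character per byte

# UTF-8 is stricter: a lone 0xFF byte is not a valid sequence.
try:
    b"\xff".decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print(utf8_ok)

# Decoding UTF-8 bytes as ISO-8859-1 succeeds, but gives different characters.
sample = "é".encode("utf-8")          # two bytes: 0xC3 0xA9
wrong = sample.decode("ISO-8859-1")   # each byte becomes its own character
print(wrong)
```

Because ISO-8859-1 accepts any byte, a successful read does not by itself confirm that the file was really written in that encoding.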

# Import all the important Libraries
import pandas as pd

# Read 'Data'
df = pd.read_csv('temp_data.csv', encoding = "ISO-8859-1")

# Print a few records of data with special characters
df
# Output of Above Cell:-
    Data
0   ÃœþÑÂúòð
1   Óþрþô áðýúт-ßõтõрñурó

After reading the DataFrame, we can move on to removing the special characters. The code for this is given below:

# Removal of Special Characters
df['Data'] = df['Data'].map(str).apply(lambda x: x.encode('utf-8').decode('ascii', 'ignore'))

# Print Cleaned data
df
# Output of Above Cell:-
    Data
0   
1   -
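
To see what this step keeps and what it drops, here is a minimal sketch on plain strings, without pandas. `strip_non_ascii` is a hypothetical helper name, and the first sample string is made up for illustration; the second is the row from the question.

```python
def strip_non_ascii(text: str) -> str:
    """Drop every character outside the ASCII range (hypothetical helper)."""
    return text.encode("ascii", "ignore").decode("ascii")


# Made-up sample: accented letters are dropped, everything else is kept.
sample = "Café-Zürich 2020"
cleaned = strip_non_ascii(sample)
print(cleaned)  # Caf-Zrich 2020

# The expression used in the answer gives the same result for this input.
same_as_answer = sample.encode("utf-8").decode("ascii", "ignore")
print(cleaned == same_as_answer)  # True

# A row from the question: only the ASCII space and hyphen survive.
row = "Óþрþô áðýúт-ßõтõрñурó"
print(repr(strip_non_ascii(row)))
```

Note that this approach deletes characters rather than repairing them, so a row made up entirely of non-ASCII characters ends up empty, as in the output above.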

As you can see, we have removed all the special characters, so we can store the result as a CSV:

# Store clean data into 'CSV' Format
df.to_csv(r'cleaned_temp_data.csv', encoding = 'utf-8-sig', index = False)
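
As a side note on the `utf-8-sig` encoding used when saving, the following sketch shows how it differs from plain `utf-8` in Python: it prepends a byte order mark (BOM) to the output. The sample text is made up, and the sketch uses only the standard library rather than pandas.

```python
import codecs

# Made-up CSV content for illustration.
text = "Data\nabc\n"

plain = text.encode("utf-8")
with_bom = text.encode("utf-8-sig")

# utf-8-sig output is the UTF-8 BOM followed by the ordinary UTF-8 bytes.
print(with_bom[:3] == codecs.BOM_UTF8)  # True
print(with_bom[3:] == plain)            # True

# Decoding with utf-8-sig removes the BOM again.
round_trip = with_bom.decode("utf-8-sig")
print(round_trip == text)               # True
```

Some spreadsheet programs use the BOM to detect that a file is UTF-8, which is a common reason for choosing this encoding when exporting CSV files.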

Hope this solution helps you.
