Removing specific characters from a pandas DataFrame

Question

I have a CSV file in which several values contain junk data that looks like: ÂÂ‡_Â¤Ã‹Ã§Ã©Ã¨_Â…

I have imported the file into a pandas DataFrame. How can I get rid of these characters? I would like to delete the contents of any cell that contains such characters and put in a flag value instead (something like -99999). The table has mixed data types.

import pandas as pd
import codecs
import unicodedata
import csv
from io import StringIO  # Python 3 location; Python 2 used 'import StringIO'

testData = pd.read_csv('Data.csv', encoding="iso-8859-1", engine='python')

# Using encoding utf-8 gives me an error about an invalid start byte; using the default engine doesn't work either.

Any suggestions?

Answer 1

If you know which characters you are willing to accept, you could use a regex to filter your values, something like:

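# True where a value contains any character outside the accepted set (NaN counts as clean)
has_junk = testData['stringcol'].str.contains(r'[^A-Za-z0-9\s]', na=False)
# mask() replaces the rows where the condition is True; where() would keep them instead
testData['stringcol'] = testData['stringcol'].mask(has_junk, -999999)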
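
Since the table in the question has mixed data types, the same regex idea may need to run over every column while leaving numeric cells alone. The following is only a sketch under stated assumptions: `flag_junk_cells` is a hypothetical helper name, the small `sample` frame stands in for Data.csv, and the accepted set is the `[A-Za-z0-9\s]` whitelist from the answer. Note that `Series.where` keeps values where the condition is True, so flagging the matches calls for `Series.mask` (or a negated condition).

```python
import pandas as pd

FLAG = -999999                    # flag value used in the answer
JUNK_PATTERN = r'[^A-Za-z0-9\s]'  # assumption: anything outside this set is junk


def flag_junk_cells(df, flag=FLAG, pattern=JUNK_PATTERN):
    """Hypothetical helper: return a copy of df with junk string cells replaced by flag."""
    out = df.copy()
    for col in out.columns:
        values = out[col].astype(object)
        # Only real str cells are tested; numbers, NaN and None are left untouched.
        is_str = values.map(lambda v: isinstance(v, str))
        if not is_str.any():
            continue
        # Non-string cells become '' so they can never match the junk pattern.
        text = values.where(is_str, '').astype(str)
        has_junk = text.str.contains(pattern)
        out[col] = values.mask(is_str & has_junk, flag)
    return out


# Hypothetical sample standing in for Data.csv
sample = pd.DataFrame({
    'name': ['alice', 'Â‡_Â¤Ã‹', 'bob 42'],
    'value': [1.5, 2.0, 3.25],
})
cleaned = flag_junk_cells(sample)
print(cleaned)
```

This whitelist also flags ordinary punctuation such as underscores, hyphens, and periods, so the character class would need widening if those are legitimate in your data.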

Solution 1
1 2015-10-13 04:13:42