Delete specific rows from a DataFrame in Python pandas

Question

I have a large .txt file with badly formatted data. I would like to remove some rows and convert the rest of the data to floats. Specifically, I want to remove the rows containing 'X' or 'XX' and convert the rest to float, where a number like 4;00.1 should become 4.001. The file looks like this sample:

0,1,10/09/2012,3:01,4;09.1,5,6,7,8,9,10,11
1,-0.581586,11/09/2012,-1:93,0;20.3,739705,,0.892921,5,,6,7
2,XX,10/09/2012,3:04,4;76.0,0.183095,-0.057214,-0.504856,NaN,0.183095,12
3,-0.256051,10/09/2012,9:65,1;54.9,483293,0.504967,0.074442,-1.716287,7,0.504967,0.504967
4,-0.728092,11/09/2012,0:78,1;53.4,232247,4.556,0.328062,1.382914,NaN,4.556,4
5,4,11/09/2012,NaN,NaN,6.0008,NaN,NaN,NaN,6.000800,6.000000,6.000800
6,X,11/09/2012,X,X,5,X,8,2,1,17.000000,33.000000
7,,11/09/2012,,,,,,6.000000,5.000000,2.000000,2.000000
8,4,11/09/2012,7:98,3;04.5,5,6,3,7.000000,3.000000,3.000000,2
9,6,11/09/2012,2:21,4;67.2,5,2,2,7,3,8.000000,4.000000

I read it into a DataFrame and select the rows:

import pandas as pd

fileName = '~/data.txt'
colName = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l']
df = pd.read_csv(fileName, names=colName)
print(df[df['b'].isin(['X', 'XX', None, 'NaN'])].to_string())

The output of the last line gives me only:

>>> print df[df['b'].isin(['X','XX',None,'NaN'])].to_string()
    b           c     d       e         f          g         h   i         j   k   l
a                                                                                   
2  XX  10/09/2012  3:04  4;76.0  0.183095  -0.057214 -0.504856 NaN  0.183095  12 NaN
6   X  11/09/2012     X       X  5.000000          X  8.000000   2  1.000000  17  33

This does not pick up row 7, and I would like to check the whole DataFrame, not just one column (the original file is very large).

For the conversion I currently use the code below, but I need to remove the unwanted rows first so that I can apply it to the whole DataFrame.

# drop the '.': '4;00.1' -> '4;001'
convert1 = lambda x: x.replace('.', '')
# turn ';' into the decimal point and parse: '4;001' -> 4.001
convert2 = lambda x: float(x.replace(';', '.'))
newNumber = convert2(convert1(df['e'][0]))
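The same two steps can be applied to a whole column at once with pandas' vectorized `Series.str.replace` instead of calling the lambdas cell by cell. This is a minimal sketch that assumes the column holds only strings in the `4;00.1` form (i.e. the bad rows are already gone); the sample values are taken from column `e` plus the `4;00.1` example.

```python
import pandas as pd

# Values in the question's 'e' format; '4;00.1' should become 4.001.
e = pd.Series(['4;09.1', '1;54.9', '3;04.5', '4;00.1'])

# Same logic as convert1/convert2, but over the whole Series:
# 1) drop the '.'  ('4;00.1' -> '4;001')
# 2) turn ';' into the decimal point and parse  ('4;001' -> 4.001)
converted = (
    e.str.replace('.', '', regex=False)
     .str.replace(';', '.', regex=False)
     .astype(float)
)

print(converted.tolist())  # [4.091, 1.549, 3.045, 4.001]
```

`regex=False` makes both patterns literal, so `.` is not treated as the regex wildcard.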

After selecting the rows I would like to remove them from df. I tried df.pop(), but it works only for columns, not for rows. I also tried naming the rows, with no luck. For this particular .txt I should end up with a new df made of rows [0,3,8,9], with column 'c' in a date format, 'd' in a time format and the rest as floats. I have been trying to figure this out for quite a while but don't know where to go next. Is it possible in pandas (it probably should be), or do I need to switch to an ndarray or something else? Thanks for your advice.
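`DataFrame.pop` only removes columns; rows are removed by index label with `DataFrame.drop(index=...)`. A minimal sketch using column `b` of sample rows 0, 2, 3 and 6, indexed by their row numbers as in the question's output (the variable names are illustrative):

```python
import pandas as pd

# Column 'b' of rows 0, 2, 3 and 6 from the sample, indexed by row number.
df = pd.DataFrame({'b': ['1', 'XX', '-0.256051', 'X']}, index=[0, 2, 3, 6])

# Collect the labels of the rows to remove, then drop them by label.
bad_labels = df.index[df['b'].isin(['X', 'XX'])]
cleaned = df.drop(index=bad_labels)

print(cleaned.index.tolist())               # [0, 3]
print(cleaned['b'].astype(float).tolist())  # [1.0, -0.256051]
```

Boolean indexing with the negated mask (`df[~df['b'].isin(['X', 'XX'])]`) gives the same rows; `drop` is handy when you already have a list of labels.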

Answer 1

The problem with your original filter is that it checks for the string 'NaN' rather than numpy.nan, which is what empty fields (and the text NaN) are parsed as by default. If you want to filter on all the columns so that you only get rows where no element is 'X' or 'XX', do something like this:

In [45]: names = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l']

In [46]: df = pd.read_csv(StringIO(data), header=None, names=names)

In [47]: mask = df.applymap(lambda x: x in ['X', 'XX', None, np.nan])

In [48]: df[~mask.any(axis=1)]
Out[48]: 
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 9
Data columns:
a    5  non-null values
b    5  non-null values
c    5  non-null values
d    5  non-null values
e    5  non-null values
f    5  non-null values
g    5  non-null values
h    5  non-null values
i    5  non-null values
j    4  non-null values
k    5  non-null values
l    5  non-null values
dtypes: float64(6), int64(1), object(5)
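Note that the summary reports 5 entries and only 4 non-null values in column `j`: row 4 (which has `NaN` in `j`) most likely slipped through because `x in [..., np.nan]` only matches the identical `np.nan` object, and NaN never compares equal to itself. A sketch that avoids the per-cell membership test, building the mask from `DataFrame.isin` and `DataFrame.isna` instead, and then doing the conversions on the sample data. It assumes column `c` is day/month/year and keeps `d` as text, since values such as `9:65` are not valid clock times.

```python
from io import StringIO

import pandas as pd

# The sample rows from the question.
data = """\
0,1,10/09/2012,3:01,4;09.1,5,6,7,8,9,10,11
1,-0.581586,11/09/2012,-1:93,0;20.3,739705,,0.892921,5,,6,7
2,XX,10/09/2012,3:04,4;76.0,0.183095,-0.057214,-0.504856,NaN,0.183095,12
3,-0.256051,10/09/2012,9:65,1;54.9,483293,0.504967,0.074442,-1.716287,7,0.504967,0.504967
4,-0.728092,11/09/2012,0:78,1;53.4,232247,4.556,0.328062,1.382914,NaN,4.556,4
5,4,11/09/2012,NaN,NaN,6.0008,NaN,NaN,NaN,6.000800,6.000000,6.000800
6,X,11/09/2012,X,X,5,X,8,2,1,17.000000,33.000000
7,,11/09/2012,,,,,,6.000000,5.000000,2.000000,2.000000
8,4,11/09/2012,7:98,3;04.5,5,6,3,7.000000,3.000000,3.000000,2
9,6,11/09/2012,2:21,4;67.2,5,2,2,7,3,8.000000,4.000000
"""

names = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l']
# Use 'a' as the index, as in the question's output; short rows get NaN padding.
df = pd.read_csv(StringIO(data), header=None, names=names, index_col='a')

# A cell is bad if it is 'X'/'XX' or missing (empty field or the text NaN).
bad = df.isin(['X', 'XX']) | df.isna()
clean = df[~bad.any(axis=1)].copy()

# Assumption: dates are day/month/year.
clean['c'] = pd.to_datetime(clean['c'], format='%d/%m/%Y')
# The question's rule for 'e': '4;00.1' -> 4.001.
clean['e'] = (clean['e'].str.replace('.', '', regex=False)
                        .str.replace(';', '.', regex=False)
                        .astype(float))
# 'd' stays as text; everything else becomes float.
num_cols = ['b', 'f', 'g', 'h', 'i', 'j', 'k', 'l']
clean[num_cols] = clean[num_cols].astype(float)

print(clean.index.tolist())  # [0, 3, 8, 9]
```

This keeps exactly the rows [0, 3, 8, 9] the question asks for.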

(Answer 1: score 6, accepted, 2012-09-23 01:33:09)
