python pandas to drop lines and substitute values in specific columns of a csv file

Question

Can I use the pandas Python module to do the following?

automatically drop lines that do not have values in specific columns, for example columns 1 and 2
substitute the remaining missing values with a predefined value

I searched online and could not find a way to achieve both conditions.

Example:
This input (where NA is either a specific character or whitespace, and X is another character, known a priori)

NA, 1, 2, X, 5, 6
5, 6, 7, 8, 9, 10
NA, 3, 4, 5, 6, 7
9, 8, 7, 6, 5, X

should become

5, 6, 7, 8, 9, 10
9, 8, 7, 6, 5, 0

Answer 1

To drop the rows with NA, you can do:

df.dropna()

To restrict the NaN check to specific columns, pass the subset keyword argument; see http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html
To replace a certain value, you can do:

df.replace('X', 0)
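
If the remaining gaps are real NaN values rather than a literal marker like 'X', fillna is probably the closer fit for the second requirement. A minimal sketch, assuming a small frame whose column names a, b and c are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "a" and "b" are the required columns, "c" is optional.
df = pd.DataFrame({
    "a": [np.nan, 5.0, np.nan, 9.0],
    "b": [1.0, 6.0, 3.0, 8.0],
    "c": [2.0, np.nan, 4.0, 7.0],
})

# Drop rows missing a value in "a" or "b"; a NaN in "c" alone keeps the row.
kept = df.dropna(subset=["a", "b"])

# Fill whatever is still missing with the predefined value (0 here).
filled = kept.fillna(0)
print(filled)
```

Only the rows with index 1 and 3 survive, and the NaN in column c of row 1 becomes 0.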

Full example:

In [14]: df
Out[14]: 
    0  1  2  3  4   5
0 NaN  1  2  X  5   6
1   5  6  7  8  9  10
2 NaN  3  4  5  6   7
3   9  8  7  6  5   X

In [15]: df.dropna(subset=[0,1])
Out[15]: 
   0  1  2  3  4   5
1  5  6  7  8  9  10
3  9  8  7  6  5   X

In [16]: df.dropna(subset=[0,1]).replace('X', 0)
Out[16]: 
   0  1  2  3  4   5
1  5  6  7  8  9  10
3  9  8  7  6  5   0
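
For completeness, here is one way the same steps might look when starting from the CSV text itself. This is only a sketch: it assumes the file has no header row, that the missing-value marker is the literal string NA, and that reading every field as a string first is acceptable. An in-memory buffer stands in for the real file.

```python
import io
import pandas as pd

# The sample input from the question, held in memory instead of a file.
csv_text = """NA, 1, 2, X, 5, 6
5, 6, 7, 8, 9, 10
NA, 3, 4, 5, 6, 7
9, 8, 7, 6, 5, X
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,            # no header row, so columns are labelled 0..5
    skipinitialspace=True,  # ignore the blank after each comma
    dtype=str,              # read fields as text so 'X' can be swapped for '0'
    na_values=["NA"],       # already a default marker; a custom one would go here
)

result = (
    df.dropna(subset=[0, 1])   # drop rows missing a value in column 0 or 1
      .replace("X", "0")       # substitute the known placeholder
      .apply(pd.to_numeric)    # convert the cleaned text columns to numbers
)
print(result)
```

Replacing 'X' with the string '0' before converting keeps each column of a single type until the final to_numeric step.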

As an aside, it is not very efficient to have strings like 'X' in numeric columns: a single string gives the whole column object dtype instead of int or float.
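
The dtype difference is easy to check on a single column. A small sketch, assuming a Series shaped like column 5 of the example; the explicit astype call is there because whether replace downcasts on its own may depend on the pandas version:

```python
import pandas as pd

# Column 5 of the example: three numbers and one placeholder string.
s = pd.Series([6, 10, 7, "X"], dtype=object)
print(s.dtype)  # object, because of the single string

# Swap the placeholder, then convert explicitly to an integer dtype.
cleaned = s.replace("X", 0).astype("int64")
print(cleaned.dtype)
```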

1 answer

solution1
0 ACCEPTED 2014-08-15 14:48:32
