Making a pandas DataFrame based on some column values of another DataFrame

Question

I have a pandas DataFrame df1 with the following content:

Serial N         year         current
   B              10            14
   B              10            16
   B              11            10
   B              11            
   B              11            15
   C              12            11
   C                            9
   C              12            13
   C              12

I would like to make a DataFrame that is based on df1 but that has any row containing an empty value removed. For example:

Serial N         year         current
   B              10            14
   B              10            16
   B              11            10
   B              11            15
   C              12            11
   C              12            13

I tried something like this:

df1=df[~np.isnan(df["year"]) or ~np.isnan(df["current"])]

But I received the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What could be the problem?

Answer 1

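Python's or cannot be applied to a Series; use the element-wise (bitwise) operators instead. Since you want to keep only the rows where both columns are present, combine the two conditions with & (using | would still keep rows where just one of the two values is missing):

df1=df[ (~np.isnan(df["year"])) & (~np.isnan(df["current"]))]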

Using dropna(), as suggested by EdChum, is likely the cleanest solution here. You can read more about it, and about working with missing data in general, in the pandas documentation.
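As a minimal sketch of how the mask approach and dropna relate, the snippet below rebuilds the question's data with np.nan standing in for the blank cells (an assumption; the original post does not show how the blanks are stored) and checks that both approaches select the same rows. The names mask, df_mask and df_drop are only for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the question's data; np.nan stands in for the blank cells (assumption).
df = pd.DataFrame({
    "Serial N": ["B", "B", "B", "B", "B", "C", "C", "C", "C"],
    "year": [10, 10, 11, 11, 11, 12, np.nan, 12, 12],
    "current": [14, 16, 10, np.nan, 15, 11, 9, 13, np.nan],
})

# Keep a row only when BOTH columns are present, hence & rather than |.
mask = df["year"].notna() & df["current"].notna()
df_mask = df[mask]

# dropna restricted to the same two columns selects the same rows.
df_drop = df.dropna(subset=["year", "current"])

print(df_mask.index.tolist())    # [0, 1, 2, 4, 5, 7]
print(df_mask.equals(df_drop))   # True
```

The subset argument is optional here; a plain df.dropna() gives the same result for this data because "Serial N" has no missing values.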

Answer 2

You can just call dropna to achieve this:

df1 = df.dropna()

As to why what you tried failed: the or operator doesn't know what to do when applied to array-like structures, because it is ambiguous whether one or more elements should meet the boolean criterion. Use the bitwise operators &, | and ~ for and, or and not respectively. Additionally, when combining multiple conditions you need to wrap each condition in parentheses because of operator precedence.

In [4]:
df.dropna()

Out[4]:
  Serial N  year  current
0        B    10       14
1        B    10       16
2        B    11       10
4        B    11       15
5        C    12       11
7        C    12       13
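To illustrate the explanation above about or versus the bitwise operators, here is a small sketch on a made-up three-row frame (the data and the threshold of 10 are hypothetical, chosen only to make the difference visible). It shows that or raises the ValueError from the question, while | and & work element-wise once each condition is parenthesised.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame, just large enough to show the behaviour.
df = pd.DataFrame({
    "year": [10, np.nan, 12],
    "current": [14, 9, np.nan],
})

# `or` asks for a single True/False from a whole Series, which pandas refuses.
try:
    df[df["year"].notna() or df["current"].notna()]
    raised = False
except ValueError:
    raised = True

# The element-wise operators work. Each comparison is parenthesised because
# & and | bind more tightly than comparisons such as >=.
either = df[(df["year"] >= 10) | (df["current"] >= 10)]
both = df[(df["year"] >= 10) & (df["current"] >= 10)]

print(raised)                  # True
print(either.index.tolist())   # [0, 2]
print(both.index.tolist())     # [0]
```

Comparisons against NaN evaluate to False, which is why row 1 drops out of both selections.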

Answer 3

If you really have empty cells (empty strings) instead of NaNs:

In [122]: df
Out[122]:
  Serial_N  year current
0        B  10.0    14.0
1        B  10.0    16.0
2        B  11.0    10.0
3        B  11.0
4        B  11.0    15.0
5        C  12.0    11.0
6        C           9.0
7        C  12.0    13.0
8        C  12.0

In [123]: df.replace('', np.nan).dropna()
Out[123]:
  Serial_N  year current
0        B  10.0    14.0
1        B  10.0    16.0
2        B  11.0    10.0
4        B  11.0    15.0
5        C  12.0    11.0
7        C  12.0    13.0
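A self-contained sketch of the same idea, assuming the blanks are stored as empty strings (the four-row frame below is made up for illustration). It shows that dropna on its own leaves such rows in place, and that replacing '' with NaN first lets dropna find them.

```python
import numpy as np
import pandas as pd

# Blank cells stored as empty strings, so the numeric columns have object dtype.
df = pd.DataFrame({
    "Serial_N": ["B", "B", "C", "C"],
    "year": [10.0, 11.0, "", 12.0],
    "current": [14.0, "", 9.0, 13.0],
})

# dropna alone removes nothing, because '' is not a missing value to pandas.
untouched = df.dropna()

# Converting '' to NaN first makes the blanks visible to dropna.
cleaned = df.replace("", np.nan).dropna()

print(len(untouched))            # 4
print(cleaned.index.tolist())    # [0, 3]
```

Depending on the pandas version, the numeric columns may still have object dtype after the replacement; pd.to_numeric can convert them if arithmetic is needed afterwards.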

3 answers

solution1
2 2016-04-28 10:27:45

solution2
2 ACCEPTED 2016-04-28 10:29:13

solution3
2 2016-04-28 10:31:43
