
Making a pandas DataFrame based on some column values of another DataFrame

I have a pandas DataFrame df1 with the following content:

Serial N         year         current
   B              10            14
   B              10            16
   B              11            10
   B              11            
   B              11            15
   C              12            11
   C                            9
   C              12            13
   C              12             

I would like to create a new DataFrame from df1 with every row that contains an empty value removed. For example:

Serial N         year         current
   B              10            14
   B              10            16
   B              11            10
   B              11            15
   C              12            11
   C              12            13  

I tried something like this:

df1=df[~np.isnan(df["year"]) or ~np.isnan(df["current"])]

But I received the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What could be the problem?

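Please try a bitwise operator instead of or. Since the goal is to drop every row that contains an empty value, combine the two conditions with & (keep rows where both columns have a value); | would keep any row where at least one column has a value:

import numpy as np
df1 = df[(~np.isnan(df["year"])) & (~np.isnan(df["current"]))]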
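The difference between & and | is easiest to see side by side. This is a minimal sketch that assumes the empty cells are NaN; the DataFrame is a hypothetical reconstruction of the question's data. It uses Series.notna() instead of ~np.isnan() because notna() also works on non-numeric (object) columns, where np.isnan raises a TypeError.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's data; missing cells are NaN.
df = pd.DataFrame({
    "Serial N": ["B", "B", "B", "B", "B", "C", "C", "C", "C"],
    "year": [10, 10, 11, 11, 11, 12, np.nan, 12, 12],
    "current": [14, 16, 10, np.nan, 15, 11, 9, 13, np.nan],
})

# One boolean mask per column: True where the value is present.
has_year = df["year"].notna()
has_current = df["current"].notna()

# & keeps rows where BOTH values are present (the rows the question wants).
complete = df[has_year & has_current]

# | keeps rows where AT LEAST ONE value is present, so rows 3, 6 and 8 survive.
either = df[has_year | has_current]

print(complete.index.tolist())  # [0, 1, 2, 4, 5, 7]
print(either.index.tolist())    # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

Every row in this data has at least one value, so the | version removes nothing.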

Using dropna(), as suggested by EdChum, is likely the cleanest solution here. You can read more about it, and about working with missing data in general, in the pandas documentation.

You can just call dropna to achieve this:

df1 = df.dropna()
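dropna() also takes parameters that control which rows are dropped. The sketch below, using the same hypothetical reconstruction of the data, shows the default behaviour (drop a row if any column is NaN), restricting the check to certain columns with subset=, and dropping only rows where all checked columns are missing with how="all". dropna() returns a new DataFrame and leaves df unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's data; missing cells are NaN.
df = pd.DataFrame({
    "Serial N": ["B", "B", "B", "B", "B", "C", "C", "C", "C"],
    "year": [10, 10, 11, 11, 11, 12, np.nan, 12, 12],
    "current": [14, 16, 10, np.nan, 15, 11, 9, 13, np.nan],
})

# Default (how="any"): drop a row if at least one column is NaN.
df1 = df.dropna()

# Only look at the "year" column when deciding what to drop.
df_year_only = df.dropna(subset=["year"])

# Drop a row only if BOTH checked columns are NaN (none are here).
df_all = df.dropna(subset=["year", "current"], how="all")

print(df1.index.tolist())           # [0, 1, 2, 4, 5, 7]
print(df_year_only.index.tolist())  # [0, 1, 2, 3, 4, 5, 7, 8]
print(df_all.index.tolist())        # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```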

As to why your attempt failed: Python's or operator calls bool() on the Series, and the truth value of a Series with more than one element is ambiguous (should it be True if any element meets the condition, or only if all of them do?), so pandas raises a ValueError. Use the bitwise operators &, | and ~ for and, or and not respectively; they work element by element. Additionally, when combining multiple conditions you need to wrap each condition in parentheses, because & and | bind more tightly than comparison operators such as > or ==.
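A small sketch of both points, using two hypothetical Series: or fails on a multi-element Series, while &, | and ~ produce element-wise results, and comparisons are parenthesised before being combined.

```python
import numpy as np
import pandas as pd

year = pd.Series([10, np.nan, 12])
current = pd.Series([14, 9, np.nan])

# `or` calls bool() on the left-hand Series, which pandas refuses to do
# for a Series with more than one element.
try:
    year.notna() or current.notna()
    raised = False
except ValueError:
    raised = True
print(raised)  # True

# Element-wise alternatives: & (and), | (or), ~ (not).
both = year.notna() & current.notna()
either = year.notna() | current.notna()
neither = ~(year.notna() | current.notna())
print(both.tolist())     # [True, False, False]
print(either.tolist())   # [True, True, True]
print(neither.tolist())  # [False, False, False]

# Parenthesise each comparison: & binds more tightly than > and <.
in_range = (year > 10) & (year < 20)
print(in_range.tolist())  # [False, False, True]
```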

In [4]:
df.dropna()

Out[4]:
  Serial N  year  current
0        B    10       14
1        B    10       16
2        B    11       10
4        B    11       15
5        C    12       11
7        C    12       13

If you really have empty cells (empty strings) instead of NaN values, replace them with NaN first:

In [122]: df
Out[122]:
  Serial_N  year current
0        B  10.0    14.0
1        B  10.0    16.0
2        B  11.0    10.0
3        B  11.0
4        B  11.0    15.0
5        C  12.0    11.0
6        C           9.0
7        C  12.0    13.0
8        C  12.0

In [123]: df.replace('', np.nan).dropna()
Out[123]:
  Serial_N  year current
0        B  10.0    14.0
1        B  10.0    16.0
2        B  11.0    10.0
4        B  11.0    15.0
5        C  12.0    11.0
7        C  12.0    13.0
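If the blank cells might also contain whitespace, a regex replacement catches both cases. This is a sketch under that assumption; the string-valued DataFrame is hypothetical. Because the columns hold strings, the remaining values are converted back to numbers afterwards with pd.to_numeric.

```python
import numpy as np
import pandas as pd

# Hypothetical data where "empty" cells are "" or " " rather than NaN.
df = pd.DataFrame({
    "Serial_N": ["B", "B", "B", "B", "B", "C", "C", "C", "C"],
    "year": ["10", "10", "11", "11", "11", "12", "", "12", "12"],
    "current": ["14", "16", "10", "", "15", "11", "9", "13", " "],
})

# Treat empty or whitespace-only strings as missing, then drop those rows.
cleaned = df.replace(r"^\s*$", np.nan, regex=True).dropna()

# The surviving values are still strings; convert the numeric columns.
cleaned = cleaned.assign(
    year=lambda d: pd.to_numeric(d["year"]),
    current=lambda d: pd.to_numeric(d["current"]),
)

print(cleaned.index.tolist())    # [0, 1, 2, 4, 5, 7]
print(cleaned["year"].tolist())  # [10, 10, 11, 11, 12, 12]
```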


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM