[英]Nan values in columns in python
I have a data set which is created based on other data set. 我有一个基于其他数据集创建的数据集。 In my new data fame some columns have nan values.
在我的新数据中,一些列具有nan值。 I want to make a log on each columns.
我想在每列上做一个日志。 However I need all the rows even though they have Nan values.
但是,即使它们具有Nan值,我也需要所有行。 What should I do with Nan values before applying log?
应用日志之前,我应该对Nan值做什么? For example consider the following data set:
例如,考虑以下数据集:
a b c
1 2 3
4 5 6
7 nan 8
9 nan nan
I do not want to drop the rows with nan values. 我不想删除带有nan值的行。 I need them for applying log on them.
我需要它们来应用登录它们。
I need to have the values of 7 and 8 in the row 6 for example. 例如,我需要在第6行中使用值7和8。 Thanks.
谢谢。
Having nan
won't affect log when calculating for each individual cell. 为每个单元格计算时,具有
nan
不会影响日志。 What's more is that np.log
has the property that it will operate on a pd.DataFrame
and return a pd.DataFrame
更重要的是,
np.log
具有将在pd.DataFrame
上pd.DataFrame
并返回pd.DataFrame
np.log(df)
a b c
0 0.000000 0.693147 1.098612
1 1.386294 1.609438 1.791759
2 1.945910 NaN 2.079442
3 2.197225 NaN NaN
Notice the difference in timing 注意时间上的差异
%timeit np.log(df)
%timeit pd.DataFrame(np.log(df.values), df.index, df.columns)
%timeit df.applymap(np.log)
134 µs ± 5.51 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
107 µs ± 1.79 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
835 µs ± 12.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Response to @IanS 对@IanS的回应
Notice the subok=True
parameter in the documentation 注意文档中的
subok=True
参数
It controls whether the original type is preserved. 它控制是否保留原始类型。 If we turn it to
False
如果我们将其设为
False
np.log(df, subok=False)
array([[ 0. , 0.69314718, 1.09861229],
[ 1.38629436, 1.60943791, 1.79175947],
[ 1.94591015, nan, 2.07944154],
[ 2.19722458, nan, nan]])
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.