Replace missing values (given as strings) in pandas dataframe by np.NaN

I have a dataframe energy with missing values in some columns. The missing values are represented by the string ... in the dataframe. I want to replace all of these values with np.NaN.

In [3]: import pandas as pd

In [4]: import numpy as np

In [7]: energy = pd.read_excel('test.xls', skiprows = 17, skip_footer = 38, parse_cols = range(2, 6), index_col = None, names = ['Country', 'ES'
   ...: , 'ESC', '% Renewable'])

In [8]: energy[(energy['ES'] == "...") | (energy['ESC'] == "...")]
Out[8]: 
                          Country   ES  ESC  % Renewable
3                  American Samoa  ...  ...     0.641026
86                           Guam  ...  ...     0.000000
150      Northern Mariana Islands  ...  ...     0.000000
210                        Tuvalu  ...  ...     0.000000
217  United States Virgin Islands  ...  ...     0.000000

To replace these values, I tried:

In [9]: energy[(energy['ES'] == "...")]['ES'] = np.NaN
/usr/local/bin/ipython:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  #!/usr/bin/python3

I don't understand the warning, and I don't see any other way to achieve what I want. Any ideas?

I think you need:

energy['ES'] = energy.loc[energy['ES'] != "...", 'ES'] 

Another solution:

energy['ES'] = energy['ES'].mask(energy['ES'] == "...")

Or:

energy['ES'] = energy['ES'].replace({'...': np.nan})
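For context, the attempt in the question fails because energy[mask] first builds a new object and ['ES'] = ... then writes into that temporary copy, so the original frame never changes. Here is a minimal sketch comparing the three options above with the single .loc assignment that the warning suggests. The small energy frame is a made-up stand-in for the spreadsheet data, not the asker's actual file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the `energy` frame from the question.
energy = pd.DataFrame({
    "Country": ["American Samoa", "Andorra", "Guam"],
    "ES": ["...", 9, "..."],
    "ESC": ["...", 121, "..."],
})

# Option 1: select the valid rows; assignment aligns on the index,
# so the rows that were filtered out end up as NaN.
a = energy.copy()
a["ES"] = a.loc[a["ES"] != "...", "ES"]

# Option 2: mask() replaces values where the condition is True with NaN.
b = energy.copy()
b["ES"] = b["ES"].mask(b["ES"] == "...")

# Option 3: replace the placeholder string directly.
c = energy.copy()
c["ES"] = c["ES"].replace({"...": np.nan})

# What the warning recommends: one .loc call with row and column indexers.
d = energy.copy()
d.loc[d["ES"] == "...", "ES"] = np.nan

for frame in (a, b, c, d):
    print(frame["ES"].isna().tolist())
```

All four variants should mark the same rows as missing. The column may still have object dtype afterwards, so a follow-up pd.to_numeric call can be useful if you need floats.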

But the best is ayhan's comment:

you can pass na_values='...' to pd.read_excel
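A rough illustration of the na_values idea: since the original test.xls is not available, this sketch assumes an in-memory CSV as a stand-in and uses pd.read_csv, which accepts the same na_values parameter as pd.read_excel. The sample rows are invented for the example.

```python
import io

import pandas as pd

# Hypothetical CSV stand-in for the Excel sheet in the question.
raw = io.StringIO(
    "Country,ES,ESC\n"
    "American Samoa,...,...\n"
    "Andorra,9,121\n"
    "Guam,...,...\n"
)

# Treat the literal string "..." as a missing value while parsing.
energy = pd.read_csv(raw, na_values=["..."])

print(energy)
print(energy.dtypes)
```

Because the placeholder is handled at parse time, the ES and ESC columns come out as numeric columns with NaN in them, and no replacement step is needed afterwards.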

If Energy is your pandas dataframe, then in your case you can also try:

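# Only loop over the columns that should be numeric
for col in ['ES', 'ESC']:
    Energy[col] = pd.to_numeric(Energy[col], errors='coerce')

The code above converts every value that cannot be parsed as a number, including the ... placeholders, to NaN. Restrict the loop to the numeric columns: applied to a text column such as Country, it would turn every entry into NaN.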
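As a quick check of how errors='coerce' behaves, here is a small sketch on made-up data (the frame below is hypothetical, not the asker's file). It coerces only the columns that are meant to be numeric, then shows what would happen to a text column.

```python
import pandas as pd

# Hypothetical stand-in for the frame from the question.
energy = pd.DataFrame({
    "Country": ["American Samoa", "Andorra"],
    "ES": ["...", "9"],
    "ESC": ["...", "121"],
})

# Coerce only the columns that are supposed to hold numbers.
for col in ["ES", "ESC"]:
    energy[col] = pd.to_numeric(energy[col], errors="coerce")

print(energy)

# For comparison: coercing a text column loses every value.
country_coerced = pd.to_numeric(energy["Country"], errors="coerce")
print(country_coerced.isna().all())
```

Note that coerce is silent: any unparseable string becomes NaN, not just the ... placeholder, so typos in the data would disappear the same way.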
