
How to deal with missing data in Python

I have a set of data stored in a txt file as follows:

Elevation(ROAD)       Interval    
1.3                    1
3.3                    2
4.1                    3
-1.5                   4
NA                     5
NA                     6
6.8                    7
2.1                    8
5.1                    9
NA                     10
6.1                    11
NA                     12
NA                     13
NA                     14

Is there any method to interpolate these missing values (NA) using Python, for example with an averaging technique?

You don't provide much detail, and you don't show any code either.

One simple way to get what you want is to create a pandas.Series() and call its interpolate method (search for it if you need specific interpolation settings; they may differ slightly depending on the pandas version you are using).

(My understanding is that your Interval column is simply the dataframe index.)

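import pandas as pd
import numpy as np

# NA entries from the file are represented as np.nan
data = [1.3, 3.3, 4.1, -1.5, np.nan, np.nan, 6.8, 2.1, 5.1, np.nan, 6.1, np.nan, np.nan, np.nan]
ser = pd.Series(data)
# interpolate() is linear by default and returns a new Series
print(ser.interpolate())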
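To start from the two-column text rather than a hand-typed list, something like the following may work. It is a minimal sketch, assuming the columns are separated by whitespace and the header is exactly as shown in the question. The text is embedded with io.StringIO only to keep the example runnable; with a real file you would pass its path to pd.read_csv instead.

```python
import io

import pandas as pd

# Same rows as in the question, embedded here so the sketch is self-contained
raw = """Elevation(ROAD)       Interval
1.3                    1
3.3                    2
4.1                    3
-1.5                   4
NA                     5
NA                     6
6.8                    7
2.1                    8
5.1                    9
NA                     10
6.1                    11
NA                     12
NA                     13
NA                     14
"""

# "NA" is one of the strings read_csv treats as missing by default
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
df = df.set_index("Interval")

# Linear interpolation between the neighbouring valid values
filled = df["Elevation(ROAD)"].interpolate(method="linear")
print(filled)
```

With the default settings, gaps after the last valid value are filled by carrying that value forward, so intervals 12 to 14 all end up at 6.1 rather than being extrapolated.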

Assuming your pandas data frame is df:

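# Assign the result back; inplace=True on a selected column may not update df in recent pandas versions
df['Elevation'] = df['Elevation'].fillna(df['Elevation'].mean())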

Try this out!
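For reference, here is a runnable version of the mean-fill idea on the question's data. It is only a sketch: the column name Elevation and the 1 to 14 index are assumptions taken from this answer and the question's Interval column. Note that mean filling gives every gap the same value regardless of its position, whereas interpolation uses the neighbouring values.

```python
import numpy as np
import pandas as pd

# Elevation values from the question, with NA written as np.nan
elevation = [1.3, 3.3, 4.1, -1.5, np.nan, np.nan, 6.8,
             2.1, 5.1, np.nan, 6.1, np.nan, np.nan, np.nan]
df = pd.DataFrame({"Elevation": elevation},
                  index=pd.RangeIndex(1, 15, name="Interval"))

# mean() skips NaN by default, so this is the average of the 8 known values
mean_value = df["Elevation"].mean()

# Replace every missing entry with that single average
df["Elevation"] = df["Elevation"].fillna(mean_value)
print(df)
```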

If, for any reason, you can't use external libraries:

file_content = """1.3
3.3
4.1
-1.5
NA
NA
6.8
2.1
5.1
NA
6.1
NA
NA
NA
7.1
NA"""

def isfloat(value):
  try:
    float(value)
    return True
  except ValueError:
    return False

class ParsedList:
  def __init__(self):
    self.list = []
    self.holes = {} # key: index of the last valid value before a gap, value: gap length

  def set_value(self, number):
    if isfloat(number):
      self.list.append(float(number))
    else:
      key = len(self.list)-1
      if key in self.holes:
        self.holes[key] += 1
      else:
        self.holes[key] = 1

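  def interpolate(self):
    output = list(self.list)
    offset = 0 # number of values inserted so far

    for index, size in self.holes.items():
      if index < 0:
        # gap before the first valid value: repeat that value
        delta = 0
        init_value = self.list[0]
      elif index < len(self.list)-1:
        # gap between two valid values: step linearly from one to the other
        delta = (self.list[index+1] - self.list[index])/(size+1)
        init_value = self.list[index]
      else:
        # gap after the last valid value: repeat that value
        delta = 0
        init_value = self.list[-1]
      for i in range(size):
        output.insert(index+i+1+offset, init_value+delta*(i+1))
      offset += size
    return output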

# test:
parsed_list = ParsedList() 
for x in file_content.splitlines():
  parsed_list.set_value(x)

for x in parsed_list.interpolate():
  print(x)
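The same idea can also be written as two small functions that work directly on the two-column layout from the question. This is a sketch rather than a drop-in replacement: parse_elevations and fill_linear are hypothetical names, the sample values are made up for illustration, and gaps at either end are padded with the nearest valid value instead of being extrapolated.

```python
def parse_elevations(text):
    """Return first-column values, using None for non-numeric entries such as NA."""
    values = []
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        try:
            values.append(float(fields[0]))
        except ValueError:
            values.append(None)
    return values


def fill_linear(values):
    """Fill None entries linearly; pad leading and trailing gaps with the nearest value."""
    result = list(values)
    known = [i for i, v in enumerate(result) if v is not None]
    if not known:
        return result  # nothing to interpolate from
    first, last = known[0], known[-1]
    for i in range(first):
        result[i] = result[first]
    for i in range(last + 1, len(result)):
        result[i] = result[last]
    # Walk over consecutive pairs of valid positions and fill what lies between
    for left, right in zip(known, known[1:]):
        step = (result[right] - result[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = result[left] + step * (i - left)
    return result


# Illustrative data only, in the same layout as the question's file
sample = """Elevation(ROAD)       Interval
NA                     1
1.0                    2
NA                     3
NA                     4
4.0                    5
NA                     6"""

for value in fill_linear(parse_elevations(sample)):
    print(value)
```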
