How to optimize a Python loop that replaces raster nan values with other raster region mean values

I need to read two rasters: a satellite image (the target) and a segmentation of that image into regions. The target image contains numeric values and nans. In the segmentation image, all pixels with the same value belong to the same region; for example, all pixels with value 1 belong to segment 1.

Based on that, I would like to calculate the mean value of each segment that contains nans and replace the nan values with the segment mean. For example, if a segment has 5 pixels and the target image has the values (2, nan, 4, 4, 2) there, the nan value must be replaced by 3.

I have written a script that does this. However, the for loop is very slow when I process large images. How can the loop's performance be improved?

import numpy
import rasterio

### returns which pixels are nan
def get_gaps(img):
    gaps = numpy.argwhere( numpy.isnan( img ) )
    return( gaps )

def fill(img_targ, gaps_targ, img_seg1):
    ### Get which segments contains NA on target image
    indices_gap_targ = numpy.array( list( zip(gaps_targ[:,0], gaps_targ[:,1] ) ) )
    segments_targ = img_seg1[ indices_gap_targ[:,0], indices_gap_targ[:,1] ]
    segments_targ = numpy.unique( segments_targ[~numpy.isnan( segments_targ )] )

    for seg in segments_targ:
        ### Get seg pixel position
        seg_pixels = numpy.nonzero( img_seg1 == seg )
        seg_indices = numpy.array( list( zip(seg_pixels[:][0], seg_pixels[:][1] ) ) )
        ### Get targ pix values
        targ_values_seg = img_targ[ seg_indices[:,0], seg_indices[:,1] ]
        ### Check if any is not nan otherwise it will not have any value to use as mean
        if( numpy.any( ~numpy.isnan(targ_values_seg) ) ):
            ### Get nan position and replace by mean value
            nan_pos = numpy.isnan( targ_values_seg )
            img_targ[ seg_indices[:,0][nan_pos], seg_indices[:,1][nan_pos] ] = numpy.nanmean(targ_values_seg)
    return img_targ

input_targ_filename = "/home/path/target.tif"
input_seg1_filename = "/home/path/segmentation.tif"

with rasterio.open(input_targ_filename) as dataset:
    img_targ = dataset.read(1)
    img_targ[ img_targ < -100000 ] = numpy.nan
    kwargs = dataset.meta

with rasterio.open(input_seg1_filename) as dataset:
    img_seg1 = dataset.read(1)
    img_seg1[ img_seg1 < -100000 ] = numpy.nan

gaps_targ = get_gaps(img_targ)
img_filled = fill(img_targ, gaps_targ, img_seg1)

np.bincount is the tool of choice for this kind of problem. (It does essentially the same thing as the more intuitive np.add.at but is typically much faster.)
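To make the comparison concrete, here is a minimal sketch on a toy 1-D array showing that np.bincount with weights and np.add.at accumulate the same per-label sums. The arrays `labels` and `values` are made up for illustration and are not part of the original data.

```python
import numpy as np

# Toy data (hypothetical): one label per pixel and one value per pixel
labels = np.array([0, 1, 1, 2, 2, 2])
values = np.array([10.0, 1.0, 3.0, 2.0, 4.0, 6.0])

# Per-label sums with bincount: weights are accumulated into the bin of each label
sums_bincount = np.bincount(labels, weights=values, minlength=3)

# The same accumulation with np.add.at (unbuffered in-place addition)
sums_add_at = np.zeros(3)
np.add.at(sums_add_at, labels, values)

# Per-label counts, and from them the per-label means
counts = np.bincount(labels, minlength=3)
means = sums_bincount / counts
```

Dividing the weighted bincount by the unweighted one gives the mean of each label, which is the quantity needed to fill the nan pixels.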

import numpy as np

# create mock data (this takes longer than the actual processing)
print("creating example")
N = 1000
NS = 2000
tgt = np.random.randn(N,N)
tgt[np.random.random((N,N))<0.1] = np.nan
seg = np.zeros((N,N),int)
seg.ravel()[np.random.choice(N*N,NS,replace=False)] = np.arange(1,NS+1)
idcs = np.s_[1:],np.s_[:,1:],np.s_[:-1],np.s_[:,:-1]
while np.count_nonzero(seg) < N*N/2:
    i = np.random.randint(4)
    idx,cidx = idcs[i],idcs[i-2]
    seg[idx][seg[idx]==0] = seg[cidx][seg[idx]==0]

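# replace nans (in-place, overwrites nans in tgt)
print("replacing nans")
n = np.isnan(tgt)    # mask of nan pixels
nn = ~n              # mask of valid pixels
segnn = seg[nn]      # segment label of every valid pixel
# per-segment sum of valid values divided by per-segment count of valid pixels
seg_means = np.bincount(segnn,tgt[nn],NS+1)/np.bincount(segnn,None,NS+1)
# look up the mean of the segment each nan pixel belongs to
tgt[n] = seg_means[seg[n]]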

# check
print("verifying",end=" ... ")
sample = np.random.randint(0,NS+1,10)
for i in sample:
    assert np.allclose(tgt[n][seg[n]==i],np.mean(tgt[nn][seg[nn]==i]))
print("looks ok")
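The mock data above uses contiguous integer labels, whereas the rasters in the question hold nan in the segmentation and may have arbitrary label values. The following is a rough sketch of how the same bincount idea could be wrapped for that case. The function name `fill_nan_with_segment_mean` is hypothetical, and it assumes both rasters have the same shape and that the segmentation can be cast to float.

```python
import numpy as np

def fill_nan_with_segment_mean(target, segmentation):
    """Hypothetical helper: return a copy of target with nans replaced by segment means."""
    filled = np.array(target, dtype=float)       # copy, so the input is not modified
    seg = np.asarray(segmentation, dtype=float)

    # Pixels whose segment label is itself nan are left untouched
    valid_seg = ~np.isnan(seg)

    # Map arbitrary label values to compact codes 0..k-1 so bincount can be used
    labels, inverse = np.unique(seg[valid_seg], return_inverse=True)
    codes = np.full(seg.shape, -1, dtype=np.intp)
    codes[valid_seg] = inverse

    is_nan = np.isnan(filled)
    known = ~is_nan & valid_seg    # pixels that contribute to a segment mean
    gaps = is_nan & valid_seg      # pixels that should be filled

    k = labels.size
    sums = np.bincount(codes[known], weights=filled[known], minlength=k)
    counts = np.bincount(codes[known], minlength=k)

    # Segments without any valid pixel give 0/0 = nan, so their gaps stay nan
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts

    filled[gaps] = means[codes[gaps]]
    return filled

# Small usage example with made-up values
target = np.array([[2.0, np.nan, 4.0],
                   [4.0, 2.0, np.nan],
                   [np.nan, np.nan, 7.0]])
segmentation = np.array([[1, 1, 1],
                         [1, 1, 5],
                         [5, 9, np.nan]])
result = fill_nan_with_segment_mean(target, segmentation)
```

In this example, segment 1 holds (2, nan, 4, 4, 2), so its gap becomes 3. Segments 5 and 9 contain no valid pixel and therefore stay nan, which matches the check in the original loop.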

To answer your example, you can do it like this with no loops:

seg = np.array((2,np.nan,4,4,2))
seg[np.isnan(seg)] = np.nanmean(seg)

Output:

array([2., 3., 4., 4., 2.])

I hope this principle helps you implement this in your larger code.
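As one possible way to carry this principle over to a 2-D image with several segments, the sketch below applies np.nanmean through a boolean mask, one segment at a time. The arrays `target` and `segments` are made-up examples. This still loops over segments, so it is only a simplification of the original loop, not a fully vectorized solution.

```python
import numpy as np

# Made-up example data: a target image with gaps and an integer segmentation
target = np.array([[2.0, np.nan, 4.0],
                   [4.0, 2.0, np.nan],
                   [1.0, 5.0, 5.0]])
segments = np.array([[1, 1, 1],
                     [1, 1, 2],
                     [2, 2, 2]])

# Only visit segments that actually contain a nan in the target
for label in np.unique(segments[np.isnan(target)]):
    mask = segments == label
    values = target[mask]
    # Skip segments with no valid pixel, since there is no mean to use
    if np.all(np.isnan(values)):
        continue
    # Fill the nan pixels of this segment with the mean of its valid pixels
    target[mask & np.isnan(target)] = np.nanmean(values)
```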
