numpy.histogram: problems with empty cells in Excel

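I'm new to Python, so I may not get all the technical terms right.

I'm using xlrd to read data from an Excel sheet, filtering it with the filter function, and then building a histogram with numpy.histogram. One cell in the Excel sheet is empty, and numpy.histogram now returns wrong results.

Here is my code: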

import xlrd
import openpyxl
import numpy as np

file_location = "C:/Users/test.xlsx"
sheet_index = 2
range_hist = 23
lifetime_data = 3
low_salesyear = 1990
upp_salesyear = 2005
col_filter1 = 14
filter_value1 = 1
col_filter2 = 18
filter_value2 = 5


# open the Excel file
workbook = xlrd.open_workbook(file_location)


# get the sheet; the index always starts at 0
sheet = workbook.sheet_by_index(sheet_index)


# read all data in the sheet, skipping the first row
list_device = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(1, sheet.nrows)]


# filter the list on the independent variables
listnew = list(filter(lambda x: x[col_filter1] == filter_value1 and x[col_filter2] == filter_value2 and low_salesyear <= x[0] <= upp_salesyear, list_device))


# select the relevant column from the filtered list and store it in a list for the histogram
list_for_hist = []
for i in range(len(listnew)):
    list_for_hist.append(listnew[i][lifetime_data])
print(list_for_hist)


# create an array from the list
array_for_hist = np.array(list_for_hist)


# create the histogram
hist = np.histogram(array_for_hist, bins=range(0, int(range_hist)))
print(hist)

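I put all the variables at the top so that they are easy to change. I'm sure there is a more elegant way to write the whole program.

The list I get after filtering the Excel data looks like this: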

[8.0, 19.0, 4.0, 4.0, 8.0, 3.0, 13.0, '', 10.0, 7.0, 17.0, 16.0, 8.0,
6.0, 13.0, 8.0, 7.0, 11.0, 12.0, 13.0, 4.0, 6.0, 5.0, 19.0, 8.0, 6.0]

The histogram returned by numpy.histogram looks like this:

(array([  0,  10,   0,   1,   3,   1,   3,   2,   5, -25,   1,   1,   1,
         3,   0,   0,   1,   1,   0,   2,   0,   0]), array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22]))

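I don't understand why it returns 10 for bin 1 and -25 for bin 9. If I remove the empty cell in Excel, the histogram is correct.

Is there a way to tell my program to ignore empty cells?

Thank you very much for your help!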

np.array(list_for_hist) converts all items to a common dtype. When list_for_hist contains both floats and strings, np.array returns an array in which every item is a string:

In [32]: np.array(list_for_hist)
Out[32]: 
array(['8.0', '19.0', '4.0', '4.0', '8.0', '3.0', '13.0', '', '10.0',
       '7.0', '17.0', '16.0', '8.0', '6.0', '13.0', '8.0', '7.0', '11.0',
       '12.0', '13.0', '4.0', '6.0', '5.0', '19.0', '8.0', '6.0'], 
      dtype='|S32')   <-- `|S32` means 32-byte strings.

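Binning strings with bins=range(0,int(23)) should arguably raise an exception, but np.histogram returns garbage instead.

You need to convert list_for_hist into an array or list that contains only floats: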
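As a quick way to catch this before calling np.histogram, the dtype of the array can be checked. The following is a minimal sketch, not part of the original code; the short list `mixed` is made up for illustration and stands in for a column with one empty cell.

```python
import numpy as np

# Hypothetical sample: three numeric cells and one empty cell ('').
mixed = [8.0, 19.0, '', 10.0]

arr = np.array(mixed)

# With a string in the list, NumPy falls back to a string dtype for every item.
# On Python 3 the dtype kind is expected to be 'U' (unicode); the '|S32' shown
# above is the Python 2 byte-string equivalent.
print(arr.dtype.kind)

# A numeric check that can guard the histogram call.
is_numeric = np.issubdtype(arr.dtype, np.number)
print(is_numeric)
```

If `is_numeric` is false, the data still contains non-numeric cells and should be cleaned before binning.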

import numpy as np
list_for_hist = [8.0, 19.0, 4.0, 4.0, 8.0, 3.0, 13.0, '', 10.0, 7.0, 17.0, 16.0,
                 8.0, 6.0, 13.0, 8.0, 7.0, 11.0, 12.0, 13.0, 4.0, 6.0, 5.0,
                 19.0, 8.0, 6.0]

array_for_hist = np.array(
    [item if isinstance(item,(float,int)) else np.nan for item in list_for_hist])
# create histogram
hist, bin_edges = np.histogram(array_for_hist, bins=range(0,int(23)))
print (hist)

which yields

[0 0 0 1 3 1 3 2 5 0 1 1 1 3 0 0 1 1 0 2 0 0]
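An alternative to substituting NaN is to drop the non-numeric cells before building the array, which matches the original request to ignore empty cells. This is a sketch under that assumption; the helper name `numeric_only` is hypothetical and not part of NumPy or xlrd.

```python
import numpy as np


def numeric_only(values):
    """Return only the int/float items, dropping '' and any other non-numbers."""
    return [v for v in values
            if isinstance(v, (int, float)) and not isinstance(v, bool)]


list_for_hist = [8.0, 19.0, 4.0, 4.0, 8.0, 3.0, 13.0, '', 10.0, 7.0, 17.0, 16.0,
                 8.0, 6.0, 13.0, 8.0, 7.0, 11.0, 12.0, 13.0, 4.0, 6.0, 5.0,
                 19.0, 8.0, 6.0]

cleaned = numeric_only(list_for_hist)

# Same bin edges as in the question: 0, 1, ..., 22.
hist, bin_edges = np.histogram(cleaned, bins=range(0, 23))
print(hist)
# [0 0 0 1 3 1 3 2 5 0 1 1 1 3 0 0 1 1 0 2 0 0]
```

Filtering keeps the array purely numeric, so later steps such as taking a mean do not have to account for NaN values.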

