
Excel file and DataFrame: "No numeric data to plot" error

I'm trying to plot data read into Pandas from an xlsx file. After some minor formatting and data quality checks, I try to plot using matplotlib but get the following error:

TypeError: Empty 'DataFrame': no numeric data to plot

This is not a new issue, and I have followed many of the pages on this site dealing with this very problem. Unfortunately, the posted suggestions have not worked for me.

My data set includes strings (locations of sampling sites, limited to the first column), dates (which I have converted to the correct format using pd.to_datetime), many NaN entries (which cannot be converted to zeros because of the graphical analysis we are doing), and column headings representing various analytical parameters.

As per some of the suggestions I read on this site, I have tried the following code:

  1. df = df.astype(float), which gives me the following error: ValueError: could not convert string to float: 'Site 1' (Site 1 is a sampling location).

  2. df = df.apply(pd.to_numeric, errors='ignore'), which gives me the following: dtypes: float64(13), int64(1), object(65). It therefore does not appear to work, as most of the data remains as object. The date entries are the int64, and I cannot figure out why some of the data columns are float64 and some remain as objects.

  3. df = df.apply(pd.to_numeric, errors='coerce'), which deletes the entire DataFrame, possibly because this operation fills the entire DataFrame with NaN?
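One way to see why so many columns stay as object is to list, per column, the values that pd.to_numeric cannot parse. The following is only a minimal sketch: the helper name non_numeric_values and the sample frame are made up for illustration and are not part of the original data.

```python
import pandas as pd

def non_numeric_values(df):
    """Map each non-numeric column to the distinct values pd.to_numeric cannot parse."""
    problems = {}
    for col in df.select_dtypes(exclude="number").columns:
        converted = pd.to_numeric(df[col], errors="coerce")
        # Values that were present originally but became NaN after coercion
        bad = df.loc[converted.isna() & df[col].notna(), col]
        if not bad.empty:
            problems[col] = sorted(bad.astype(str).unique())
    return problems

# Hypothetical sample: a site column, a column with a stray '.', a clean float column
sample = pd.DataFrame({
    "Site": ["Site 1", "Site 2", "Site 3"],
    "pH": ["7.1", ".", "6.8"],
    "Cl": [1.5, None, 2.0],
})
report = non_numeric_values(sample)
print(report)
```

A column that lists only a handful of odd strings (such as a lone '.') is probably a data column with bad entries, while a column where every value is listed is probably genuine text.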

I'm stuck and would appreciate any insight.

EDIT

I was able to solve my own question based on some of the feedback. Here is what worked for me:

import pandas as pd

path = "path"

header = [0]    # keep column headings as first row of original data
skip = [1]      # skip second row, which has units of measure
na_val = ['.', '-.', '-+0.01']  # convert spurious decimal points that have
                                # no number associated with them to NaN
convert = {col: float for col in range(4, 81)}  # convert columns 4 through 80
                                                # to float from original text
use_cols = "A,C,E:CC"   # read only these Excel columns

# usecols is the keyword current pandas uses for selecting columns
df = pd.read_excel(path, header=header, skiprows=skip,
                   na_values=na_val, converters=convert, usecols=use_cols)
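The effect of header, skiprows and na_values can be checked without the workbook. The sketch below uses pd.read_csv on a small in-memory table instead of pd.read_excel, on the assumption that the two readers treat these three keywords the same way; the sample rows are invented.

```python
import io
import pandas as pd

# Invented sample: heading row, units row, then data with stray '.' and '-.' entries
raw = io.StringIO(
    "Site,Date,pH,Cl\n"
    ",,,mg/L\n"
    "Site 1,2020-01-01,7.1,.\n"
    "Site 2,2020-01-02,-.,2.5\n"
)

demo = pd.read_csv(
    raw,
    header=0,                          # first row holds the column headings
    skiprows=[1],                      # drop the units-of-measure row
    na_values=[".", "-.", "-+0.01"],   # read these markers as NaN
)

# Columns that arrive as numeric dtypes and can therefore be plotted
numeric_cols = demo.select_dtypes(include="number").columns.tolist()
print(demo.dtypes)
print(numeric_cols)
```

Once the units row and the spurious markers are handled at read time, the measurement columns come in as float64 with NaN preserved, so no later astype call is needed.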

Hard to answer without a data sample, but if you are sure that the numeric columns are 100% numeric, this will probably work:

for c in df.columns:
    try:
        df[c] = df[c].astype(int)
    except (ValueError, TypeError):
        # leave columns that cannot be cast to int (e.g. text) unchanged
        pass
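Since the question mentions many NaN entries, a cast to int will fail on those columns. A variant of the same loop that uses pd.to_numeric keeps NaN by producing floats. This is a sketch, not the answerer's code: the function name and sample frame are hypothetical. It also skips datetime columns, because pd.to_numeric turns datetime64 values into integers, which may be why the date column showed up as int64 in the question.

```python
import pandas as pd

def to_numeric_where_possible(df):
    """Return a copy with each column converted to numeric if every value parses."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_datetime64_any_dtype(out[col]):
            continue  # leave dates alone; to_numeric would turn them into integers
        try:
            out[col] = pd.to_numeric(out[col])  # raises if any value is not a number
        except (ValueError, TypeError):
            pass  # text columns such as site names stay unchanged
    return out

# Hypothetical sample with a text column, a date column and a numeric column with a gap
sample = pd.DataFrame({
    "Site": ["Site 1", "Site 2"],
    "Date": pd.to_datetime(["2020-01-01", "2020-01-02"]),
    "pH": ["7.1", None],
})
clean = to_numeric_where_possible(sample)
print(clean.dtypes)
```

With the measurement columns numeric, a call such as clean.plot(x="Date") should then find data to draw.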
