
Reading Excel with Python pandas and isolating columns/rows to plot

I am using Python pandas read_excel to create a histogram or line plot. I would like to read in the entire file. It is a large file and I only want to plot certain values from it. I know how to use skiprows and parse_cols in read_excel, but if I do this, it does not read a part of the file that I need for the axis labels. I also do not know how to tell it which values to plot on the x-axis and which on the y-axis. Here's what I have:

df=pd.read_excel('JanRain.xlsx',parse_cols="C:BD")

years=df[0]
precip=df[31:32]
df.plot.bar()

I want the x-axis to be row 1 of the Excel file (years), and I want each bar in the bar graph to be a value from row 31 of the Excel file. I'm not sure how to isolate this. Would it be easier to read with pandas and then plot with matplotlib?

Here is a sample of the Excel file. The first row is years and the second column is days of the month (this file is only for 1 month):


Here's how I would plot the data in row 31 of a large dataframe, setting row 0 as the x-axis. (updated answer)

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

Create a random array with 32 rows and 10 columns, and save it as an Excel file to use as test data:

df = pd.DataFrame(np.random.rand(320).reshape(32,10), columns=range(64,74), index=range(1,33))
df.to_excel(r"D:\data\data.xlsx")

Read only the columns and rows that you want using "parse_cols" and "skiprows". (Newer pandas versions call this parameter "usecols"; "parse_cols" was deprecated in pandas 0.21.) The first column in this example is the dataframe index.

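# load desired columns and rows into a dataframe
# in this method, I first make a list of all skipped rows
desired_cols = [0] + list(range(2,9))   # column 0 is the index; columns 2-8 hold the data
skipped_rows = list(range(1,33))        # every data row (row 0 is the header) ...
skipped_rows.remove(31)                 # ... except row 31, which is kept
# usecols is the current name for parse_cols, which newer pandas versions no longer accept
df = pd.read_excel(r"D:\data\data.xlsx", index_col=0, usecols=desired_cols, skiprows=skipped_rows)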

Currently this yields a dataframe with only one row.

          65        66       67        68        69        70        71
31  0.310933  0.606858  0.12442  0.988441  0.821966  0.213625  0.254897

Isolate only the row that you want to plot, giving a pandas.Series with the original column headers as the index.

ser = df.loc[31, :]
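
If the file fits in memory, a possible alternative is to read all of it and select afterwards, which keeps the header row available for the axis labels. The sketch below does not touch Excel at all: `full` is a hypothetical stand-in for the frame that `pd.read_excel(..., index_col=0)` would return with nothing skipped, filled with random values shaped like the test data above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the frame read_excel would return when no
# rows or columns are skipped: 32 rows (labels 1-32), 10 columns (labels 64-73).
rng = np.random.default_rng(0)
full = pd.DataFrame(rng.random((32, 10)), columns=range(64, 74), index=range(1, 33))

# .loc selects by label, and label slices include both endpoints,
# so this picks row 31 and columns 65 through 71.
row31 = full.loc[31, 65:71]

print(row31.index.tolist())  # [65, 66, 67, 68, 69, 70, 71]
```

The result is a Series whose index holds the column headers, so it can be plotted the same way as `ser`.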

Plot the series.

fig, ax = plt.subplots()
ser.plot(ax=ax)
ax.set_xlabel("year")
ax.set_ylabel("precipitation")

The same series can be drawn as a bar chart, which is what the question asks for:

fig, ax = plt.subplots()
ser.plot(kind="bar", ax=ax)
ax.set_xlabel("year")
ax.set_ylabel("precipitation")
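
On the question of reading with pandas and then plotting with matplotlib: that also works, since a Series exposes its index and values separately. A minimal sketch, assuming a series like the one isolated above; the numbers here are made up for illustration, and the `Agg` backend is chosen only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values standing in for the row isolated from the Excel file.
ser = pd.Series([0.31, 0.61, 0.12, 0.99, 0.82, 0.21, 0.25],
                index=range(65, 72), name=31)

fig, ax = plt.subplots()
# The series index (years) supplies the x labels, its values the bar heights.
# Converting the index to strings makes matplotlib treat the years as categories.
bars = ax.bar(ser.index.astype(str), ser.to_numpy())
ax.set_xlabel("year")
ax.set_ylabel("precipitation")
```

Calling `ax.bar` directly gives explicit control over which values go on each axis, at the cost of one extra line compared with `ser.plot(kind="bar")`.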


