Reading excel with Python Pandas and isolating columns/rows to plot

Question

I am using Python pandas read_excel to create a histogram or line plot. I would like to read in the entire file. It is a large file, and I only want to plot certain values from it. I know how to use skiprows and parse_cols in read_excel, but if I do this, it does not read a part of the file that I need for the axis labels. I also do not know how to tell it what to use for the x-values and what to use for the y-values. Here's what I have:

df=pd.read_excel('JanRain.xlsx',parse_cols="C:BD")

years=df[0]
precip=df[31:32]
df.plot.bar()
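A likely source of confusion in the snippet above is that `df[0]` selects a column (the one labelled 0), not a row, while `df[31:32]` does select rows but returns a one-row DataFrame rather than a Series. The sketch below uses a small made-up DataFrame instead of JanRain.xlsx to show the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the spreadsheet: 3 rows x 4 columns with default
# integer labels, like read_excel returns when header=None.
df = pd.DataFrame(np.arange(12).reshape(3, 4))

col0 = df[0]        # df[label] selects a COLUMN by its label, not a row
row0 = df.iloc[0]   # .iloc[i] selects a ROW by position and returns a Series
rows = df[1:2]      # slicing with df[a:b] selects rows, but returns a DataFrame

print(col0.tolist())                     # [0, 4, 8]
print(row0.tolist())                     # [0, 1, 2, 3]
print(type(rows).__name__, rows.shape)   # DataFrame (1, 4)
```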

I want the x-axis to be row 1 of the Excel file (years), and I want each bar in the bar graph to be a value from row 31 of the Excel file. I'm not sure how to isolate this. Would it be easier to read with pandas and then plot with matplotlib?

Here is a sample of the Excel file. The first row holds the years and the second column holds the days of the month (this file covers only one month):
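Since both the year row and row 31 come from the same sheet, one option is to read the whole sheet with `header=None` and pick rows by position with `.iloc`. The sketch below assumes the layout described above (years on spreadsheet row 1 starting in column C, days of the month in column B); because the sample file is not available, a small made-up DataFrame stands in for `pd.read_excel("JanRain.xlsx", header=None)`.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel("JanRain.xlsx", header=None).
# With header=None every spreadsheet row is a data row, so spreadsheet
# row N sits at position N - 1. Assumed layout: years on row 1 from column C,
# days of the month in column B, made-up values everywhere else.
years = [2010, 2011, 2012]
rows = [[None, None, *years]]
for day in range(1, 32):
    rows.append([None, day, *(day * 10 + i for i in range(len(years)))])
sheet = pd.DataFrame(rows)

# Column C is position 2; spreadsheet row 1 is position 0, row 31 is position 30.
x = sheet.iloc[0, 2:]
y = sheet.iloc[30, 2:]

# One Series: years as the index (x-axis), row 31 values as the bar heights.
precip = pd.Series(
    y.to_numpy(dtype=float),
    index=pd.Index(x.astype(int).tolist(), name="year"),
    name="row 31",
)
print(precip)
# precip.plot.bar() would then draw one bar per year.
```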

Answer 1

Here's how I would plot the data in row 31 of a large DataFrame, using row 0 (the header row) as the x-axis.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Jupyter/IPython only; remove this line when running a plain .py script
%matplotlib inline

Create a DataFrame of random values with 32 rows and 10 columns, and save it to an Excel file:

df = pd.DataFrame(np.random.rand(320).reshape(32,10), columns=range(64,74), index=range(1,33))
df.to_excel(r"D:\data\data.xlsx")

Read only the columns and rows that you want using "usecols" (called "parse_cols" in pandas versions before 0.21) and "skiprows". The first column in this example is the DataFrame index.

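# load the desired columns and rows into a DataFrame
# in this method, I first make a list of all skipped rows
# column 0 holds the index; columns 2-8 hold the headers 65-71
desired_cols = [0] + list(range(2, 9))
# skip file rows 1-32 (row 0 is the header), except row 31
skipped_rows = list(range(1, 33))
skipped_rows.remove(31)
# "usecols" replaces the old "parse_cols" argument, which newer pandas no longer accepts
df = pd.read_excel(r"D:\data\data.xlsx", index_col=0, usecols=desired_cols, skiprows=skipped_rows)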
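Building the full list of skipped rows works, but "skiprows" also accepts a callable that returns True for each row to skip, which avoids building the list by hand. The sketch below tries that on the same random layout. To keep it runnable without an Excel engine such as openpyxl, it writes the frame to an in-memory CSV and reads it back with pd.read_csv, which takes the same usecols, skiprows and index_col arguments; that substitution is an assumption for illustration, and CSV headers come back as strings ("65" rather than 65).

```python
import io

import numpy as np
import pandas as pd

# Same layout as the file written with df.to_excel above: index values 1..32
# in the first column, headers 64..73. An in-memory CSV stands in for the .xlsx.
df = pd.DataFrame(np.random.rand(32, 10), columns=range(64, 74), index=range(1, 33))
buf = io.StringIO(df.to_csv())

# Keep file row 0 (the header) and file row 31 (index value 31); skip the rest.
one_row = pd.read_csv(
    buf,
    index_col=0,                       # first kept column becomes the index
    usecols=[0] + list(range(2, 9)),   # index column plus headers 65..71
    skiprows=lambda r: r not in (0, 31),
)
print(one_row.shape)  # (1, 7)
```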

This yields a DataFrame with only one row:

          65        66       67        68        69        70        71
31  0.310933  0.606858  0.12442  0.988441  0.821966  0.213625  0.254897

Isolate only the row that you want to plot. This gives a pandas.Series whose index is the original column headers:

ser = df.loc[31, :]
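Plotting a Series puts its index on the x-axis, so it matters whether the selection returns a Series or a one-row DataFrame (a one-row DataFrame would instead be drawn as a single group of bars, one per column). The small made-up frame below mimics the one-row result above and compares three ways of selecting that row:

```python
import pandas as pd

# Hypothetical one-row DataFrame shaped like the result above
# (row label 31, numeric column headers); the values are made up.
df = pd.DataFrame([[0.1, 0.2, 0.3]], index=[31], columns=[65, 66, 67])

ser = df.loc[31, :]          # scalar row label -> Series indexed by the column headers
same = df.squeeze(axis=0)    # a one-row DataFrame can also be squeezed into a Series
frame = df.loc[[31], :]      # a list of labels keeps the result as a DataFrame

print(ser.index.tolist())    # [65, 66, 67] -> these become the x-axis labels
print(ser.name)              # 31
```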

Plot the series.

fig, ax = plt.subplots()
ser.plot(ax=ax)
ax.set_xlabel("year")
ax.set_ylabel("precipitation")

fig, ax = plt.subplots()
ser.plot(kind="bar", ax=ax)
ax.set_xlabel("year")
ax.set_ylabel("precipitation")
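Outside Jupyter, %matplotlib inline is not available, so the figure has to be shown or saved explicitly. The sketch below uses made-up values in place of ser, renders the bar plot with matplotlib's non-interactive Agg backend and saves it as PNG; the backend choice and the rot=0 styling are assumptions, not part of the original answer.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values standing in for the selected row (hypothetical data).
ser = pd.Series([1.2, 0.4, 2.5], index=[2015, 2016, 2017], name=31)

fig, ax = plt.subplots()
ser.plot(kind="bar", ax=ax, rot=0)  # one bar per index label; rot=0 keeps labels horizontal
ax.set_xlabel("year")
ax.set_ylabel("precipitation")
fig.tight_layout()

# Save to an in-memory PNG; pass a filename such as "precip.png" to write a file instead.
buf = io.BytesIO()
fig.savefig(buf, format="png")
# With an interactive backend, plt.show() displays the figure in a window instead.
```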

Answer 1: score 3, accepted, 2017-10-17 16:18:57
