
Suppress scientific notation for large numbers in pandas data frame

I'm trying to read in a CSV file and create a horizontal bar plot with the values shown as labels at the end of each bar, similar to this plot:

[image]

I got everything to work, except that the values keep being shown in scientific notation. I have tried the examples here, but nothing changes and I don't get any errors. I don't understand what I am doing wrong. The column that holds the values has dtype float64.

[image]

Sample of data:

 {'Year': {9799: 2020.0,
  5179: 2020.0,
  7489: 2020.0,
  27619: 2020.0,
  15959: 2020.0,
  23109: 2020.0,
  15299: 2020.0,
  17609: 2020.0,
  16619: 2020.0,
  3529: 2020.0},
 'visitors': {9799: 4068529.0,
  5179: 4083505.0,
  7489: 4888436.0,
  27619: 6124808.0,
  15959: 6237361.0,
  23109: 8016510.0,
  15299: 8404728.0,
  17609: 12095720.0,
  16619: 12400045.0,
  3529: 14099485.0},
 'Park': {9799: 'Delaware Water Gap NRA',
  5179: 'Cape Cod NS',
  7489: 'Chesapeake & Ohio Canal NHP',
  27619: 'Natchez Trace PKWY',
  15959: 'George Washington MEM PKWY',
  23109: 'Lake Mead NRA',
  15299: 'Gateway NRA',
  17609: 'Great Smoky Mountains NP',
  16619: 'Golden Gate NRA',
  3529: 'Blue Ridge PKWY'}}

Sample Code:

import pandas as pd
import matplotlib.pyplot as plt
# df is the DataFrame read from the CSV file
# Filter for 2020
df_2020 = df[df['Year'] == 2020]
# Select only columns needed
df_2020 = df_2020[['Year','visitors','Park']]
# Find top 10 most visited parks in 2020
df_2020 = df_2020.nlargest(10,'visitors')
# Sort ascending so the most visited park is drawn at the top of the barh plot
df_2020 = df_2020.sort_values('visitors',ascending=True)

#pd.options.display.float_format = '{:.1f}'.format
#pd.set_option('display.float_format', lambda x: '%.3f' % x)
# Make bar plot showing top 10 parks in 2020
plot1=df_2020.plot.barh('Park', 'visitors',color = 'green')
plt.title("America's Most Visited Parks\nAnnual Visitation (2020)")
plt.ylabel('')
plt.xlabel('Visitation')
plt.xticks([],[])
plt.bar_label(plot1.containers[0])
plt.style.use('ggplot')
plt.show()

One option is to use the fmt parameter of bar_label and pass a format string for the desired representation.

plt.bar_label(plot1.containers[0], fmt='%.f')
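
As for why the pandas display options made no difference: bar_label formats the numbers itself, and its default is fmt='%g', which switches to exponent notation once a value needs more than six significant digits. A minimal sketch in plain Python, using three values from the sample data (the variable names are only for illustration):

```python
# Three visitor counts taken from the sample data in the question
visitors = [4068529.0, 12095720.0, 14099485.0]

# '%g' is the default format used by bar_label: exponent notation for large values
default_labels = ['%g' % v for v in visitors]

# '%.f' means fixed-point with zero decimals, so the full integer part is shown
fixed_labels = ['%.f' % v for v in visitors]

# An f-string can additionally insert thousands separators
comma_labels = [f'{v:,.0f}' for v in visitors]

print(default_labels)  # ['4.06853e+06', '1.20957e+07', '1.40995e+07']
print(fixed_labels)    # ['4068529', '12095720', '14099485']
print(comma_labels)    # ['4,068,529', '12,095,720', '14,099,485']
```

The comma-separated strings can be handed to bar_label through its labels parameter. Newer Matplotlib releases (3.7 and later, as far as I know) should also accept a str.format-style fmt such as '{:,.0f}', but check that against your installed version.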

Another option would be to divide the number of visitors by 1 million and indicate this on the chart label (or in the legend, if more appropriate).

...
df_2020.visitors /= 1e6 # divide by 1 million
plot1=df_2020.plot.barh('Park', 'visitors',color = 'green')
plt.title("America's Most Visited Parks\nAnnual Visitation (2020)")
plt.ylabel('')
plt.xlabel(r'Visitation ( $\bf{in\ millions}$ )') # state the unit here
...

[image: Park visitors]

One simple way to do it is to replace this line in your code:

plt.bar_label(plot1.containers[0])

with

plt.bar_label(plot1.containers[0], labels=df_2020["visitors"].astype(int))
plt.margins(0.7)

So that the plot looks like this:

[image]
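
For reference, here is a self-contained sketch of the labels approach that runs without the CSV file. It rebuilds a three-row df_2020 from the sample data in the question; the comma formatting, padding=3 and margins(x=0.2) are my own choices for illustration, not part of the original code.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Three rows from the sample data, already sorted ascending by visitors
df_2020 = pd.DataFrame({
    "Park": ["Great Smoky Mountains NP", "Golden Gate NRA", "Blue Ridge PKWY"],
    "visitors": [12095720.0, 12400045.0, 14099485.0],
})

ax = df_2020.plot.barh("Park", "visitors", color="green", legend=False)

# Pre-format the labels as strings so bar_label never applies its '%g' default
labels = [f"{v:,.0f}" for v in df_2020["visitors"]]
texts = ax.bar_label(ax.containers[0], labels=labels, padding=3)

ax.set_title("America's Most Visited Parks\nAnnual Visitation (2020)")
ax.set_ylabel("")
ax.set_xlabel("Visitation")
ax.set_xticks([])   # hide the x ticks, as in the question
ax.margins(x=0.2)   # leave room on the right for the labels

print([t.get_text() for t in texts])
```

Call plt.show() afterwards to display the figure. Because the labels are plain strings, they are drawn exactly as given, whatever fmt is set to.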
