
Round unknown whole number to highest base10 value Python

Values will be given as a max() from a pandas data frame. For each item, I would like to get a rounded max value to create y-ticks for a matplotlib plot, with the number of ticks = 10.

The data frame I am using is the official Johns Hopkins COVID data. The preceding code returns the data frames categorized by countries or states, daily totals or cumulative, cases or deaths.

I have written code in the for loop that converts the max, which could be over 20 million or as low as 6, by taking the leading digit, adding 1, and then appending extra zeros if needed. I would rather have the value rounded down if the next digit is small, as this code creates small gaps at the top of some charts.

Is the code I wrote, which converts back and forth between str and int, Pythonic? Is there a simple way to add rounding to that code? Or is there just a better, more efficient way to do what I'm trying to do?

# Per Capita ## (identical version for daily totals on dfs1)
cumulative2 = dfs2.T[default[ind]]
daily_cases2 = cumulative2.diff()
d_max2 = daily_cases2.max().max()
c_max2 = cumulative2.max().max()

...

plot1 = daily_cases1.plot(kind='area', stacked=False, ax=ax1, lw=2, ylim=(0, d_max1))
plot2 = daily_cases2.plot(kind='area', stacked=False, ax=ax2, lw=2, ylim=(0, d_max2))
plot3 = cumulative1.plot(kind='area', stacked=False, ax=ax3, lw=2, ylim=(0, c_max1))
plot4 = cumulative2.plot(kind='area', stacked=False, ax=ax4, lw=2, ylim=(0, c_max2))

plots = [plot1, plot2, plot3, plot4]
maxes = [d_max1, d_max2, c_max1, c_max2]
for i, plot in enumerate(plots):
    rnd_max = int(f'{str(int(str(int(maxes[i]))[0]) + 1) + "0" * (len(str(int(maxes[i]))) - 1)}')
    yticks = np.arange(0, rnd_max, 1 if rnd_max < 10 else rnd_max // 10)
    ytick_labels = pd.Series(yticks).apply(lambda value: f"{int(value):,}")
    plot.set_yticks(yticks)
    plot.set_yticklabels(ytick_labels)

EDIT: I would like the leading digit to be 3 if the value is 2,750,000, or 4 if the value is 41. So not a true base-10 result, but a power of 10 multiplied by the rounded leading digit.

cumulative:

State    California  Arizona  Florida  New York    Texas  Illinois
11/4/20      950920   250633   821123    519890  1003342    443803
3/14/20         372       12       76       557       60        64
5/22/20       90281    15624    49451    360818    53817    105444

daily:

State    California  Arizona  Florida  New York    Texas  Illinois
4/3/20       1226.0    173.0   1260.0   10675.0    771.0    1209.0
6/25/20      5088.0   3091.0   5004.0     814.0   5787.0     894.0
11/3/20      4990.0   1679.0   4637.0    2069.0   9721.0    6516.0

c_max and d_max are just floats/ints (each equal to the max value of the pandas Series being plotted), for example:

63817.0
2675262

Here's the output for a series of plots. You can see that the ticks on the first graph go much higher than the actual max value of that chart (ignore the plot placement; it's on best fit for now). This is the result of rounding a low number up, which I would like to avoid. The goal is to give the cleanest tick values I can while keeping the plots nice and tight.

[Image: one of a series of plots]

If you really want just one significant digit for your 10 steps, you can replace your string-converting expression (which, no, I would not call Pythonic) with something that uses the base-10 logarithm, e.g.:

import math

def round10(n):
    return 10 ** math.ceil(math.log10(n))

But as you have noticed yourself, this doesn't really produce useful results. For example, if the maximum value is 1001, the y-ticks would go from 0 to 10000, so all the data would be squeezed into roughly the bottom tenth of the axis. The built-in autoscaling is more sophisticated and maximizes the usable area.
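As a rough illustration of why a smarter scale helps, here is a minimal sketch of a "nice step" heuristic: pick a step of 1, 2, 5 or 10 times a power of 10 so that roughly 10 ticks cover the maximum. This is not matplotlib's actual algorithm; nice_ticks and its n_ticks parameter are hypothetical names for this example, and it assumes a positive maximum.

```python
import math

def nice_ticks(vmax, n_ticks=10):
    """Return (step, top): a 'nice' tick step and the axis top covering vmax.

    Hypothetical helper for illustration only; assumes vmax > 0.
    """
    if vmax <= 0:
        raise ValueError("vmax must be positive")
    raw_step = vmax / n_ticks                      # ideal, but ugly, step
    magnitude = 10 ** math.floor(math.log10(raw_step))
    # Take the smallest of 1, 2, 5, 10 x magnitude that is >= raw_step.
    for multiplier in (1, 2, 5, 10):
        step = multiplier * magnitude
        if step >= raw_step:
            break
    top = math.ceil(vmax / step) * step            # first multiple of step >= vmax
    return step, top

print(nice_ticks(1001))      # (200, 1200)
print(nice_ticks(2675262))   # (500000, 3000000)
```

With a maximum of 1001 this gives an axis top of 1200 rather than 10000, so the data fills most of the plot.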

>>> from math import floor, log10
>>> def round_first(x):
...     p = 10 ** floor(log10(x))  # power of 10 at the leading digit
...     return round(x / p) * p    # round the leading digit to nearest
...
>>> round_first(5123)
5000
>>> round_first(5987)
6000

Edit: If you care about performance, put all your data into NumPy arrays and use a vectorized approach. The code below is vectorized and also doesn't choke on zero or negative numbers.

>>> import numpy as np
>>> def round_first(x):
...     xa = np.abs(x)
...     xs = np.sign(x)
...     nonzero = x != 0
...     p = 10 ** np.floor(np.log10(xa[nonzero]))
...     out = np.zeros(x.shape)
...     out[nonzero] = np.round(xa[nonzero] / p) * p * xs[nonzero]
...     return out
...
>>> x = np.arange(-1000,2001,67)                        
>>> x
array([-1000,  -933,  -866,  -799,  -732,  -665,  -598,  -531,  -464,
        -397,  -330,  -263,  -196,  -129,   -62,     5,    72,   139,
         206,   273,   340,   407,   474,   541,   608,   675,   742,
         809,   876,   943,  1010,  1077,  1144,  1211,  1278,  1345,
        1412,  1479,  1546,  1613,  1680,  1747,  1814,  1881,  1948])
>>> round_first(x)
array([-1000.,  -900.,  -900.,  -800.,  -700.,  -700.,  -600.,  -500.,
        -500.,  -400.,  -300.,  -300.,  -200.,  -100.,   -60.,     5.,
          70.,   100.,   200.,   300.,   300.,   400.,   500.,   500.,
         600.,   700.,   700.,   800.,   900.,   900.,  1000.,  1000.,
        1000.,  1000.,  1000.,  1000.,  1000.,  1000.,  2000.,  2000.,
        2000.,  2000.,  2000.,  2000.,  2000.])

Also, your question says round to nearest (you say 41 becomes 40 instead of 50), but your self-answer uses ceil(), which would make 41 go to 50.
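To make the difference concrete, here is a small sketch comparing the two rounding modes on the values mentioned in the question (41, 63817.0 and 2675262). The names round_nearest_first and round_up_first are made up for this illustration, and both assume a positive input.

```python
from math import ceil, floor, log10

def round_nearest_first(x):
    """Round x to one significant digit, to nearest (assumes x > 0)."""
    p = 10 ** floor(log10(x))
    return round(x / p) * p

def round_up_first(x):
    """Round x up to one significant digit (assumes x > 0)."""
    p = 10 ** floor(log10(x))
    return ceil(x / p) * p

for value in (41, 63817.0, 2675262):
    print(value, round_nearest_first(value), round_up_first(value))
# 41 40 50
# 63817.0 60000 70000
# 2675262 3000000 3000000
```

Note that rounding to nearest can land below the actual maximum (63817.0 becomes 60000), so using it as the upper y-limit would clip the top of the data; rounding up never does.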

from math import ceil, floor, log10

def round10_first(x):
    p = 10 ** floor(log10(x))  # log10(x) is more accurate than log(x, 10)
    return ceil(x / p) * p

Thank you guys for your help. I actually combined your answers for my solution. I ran timeit on both versions and they are the same speed, but I will use the one built from yours since it is more Pythonic.

%timeit -n 10000000 function1
%timeit -n 10000000 function2

16.7 ns ± 0.108 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
16.8 ns ± 0.13 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
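One caveat about this timing: `%timeit function1` without parentheses evaluates only the name and never calls the function, which is consistent with two near-identical results of about 17 ns. A minimal sketch of timing actual calls with the standard-library timeit module follows. The original function1 and function2 are not shown, so round_nearest, round_up and the sample input are placeholders.

```python
import timeit
from math import ceil, floor, log10

# Placeholder stand-ins for function1 / function2 (not shown in the post).
def round_nearest(x):
    p = 10 ** floor(log10(x))
    return round(x / p) * p

def round_up(x):
    p = 10 ** floor(log10(x))
    return ceil(x / p) * p

sample = 2675262      # placeholder input taken from the question's example values
n_calls = 100_000

# The lambda makes sure each timed iteration really calls the function.
t_nearest = timeit.timeit(lambda: round_nearest(sample), number=n_calls)
t_up = timeit.timeit(lambda: round_up(sample), number=n_calls)

print(f"round_nearest: {t_nearest / n_calls * 1e9:.0f} ns per call")
print(f"round_up:      {t_up / n_calls * 1e9:.0f} ns per call")
```

In IPython the equivalent would be `%timeit round_up(2675262)`, with the call written out.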
