Round unknown whole number to highest base10 value Python

Values will be given as a max() from a pandas DataFrame. For each item, I would like to get a rounded max value to create y-ticks for a matplotlib plot with 10 ticks.

The data frame I am using is the official Johns Hopkins COVID data. The preceding code returns the data frames categorized by country or state, daily totals or cumulative, and cases or deaths.

I have written code in the for loop that takes the max, which could be over 20 million or as low as 6, gets the leading digit and adds 1, then concatenates extra zeros if needed. I would rather have the value rounded down if the next digit is small, as this code creates small gaps at the top of some charts.

Is the code I wrote, which converts back and forth between str and int, Pythonic? Is there a simple way to add rounding to that code? Or is there just a better, more efficient way to do what I'm trying to do?

# Per Capita ## (identical version for daily totals on dfs1)
cumulative2 = dfs2.T[default[ind]]
daily_cases2 = cumulative2.diff()
d_max2 = daily_cases2.max().max()
c_max2 = cumulative2.max().max()

... ...

plot1 = daily_cases1.plot(kind='area', stacked=False, ax=ax1, lw=2, ylim=(0, d_max1))
plot2 = daily_cases2.plot(kind='area', stacked=False, ax=ax2, lw=2, ylim=(0, d_max2))
plot3 = cumulative1.plot(kind='area', stacked=False, ax=ax3, lw=2, ylim=(0, c_max1))
plot4 = cumulative2.plot(kind='area', stacked=False, ax=ax4, lw=2, ylim=(0, c_max2))

plots = [plot1, plot2, plot3, plot4]
maxes = [d_max1, d_max2, c_max1, c_max2]
for i, plot in enumerate(plots):
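    # Leading digit + 1, padded with zeros to the original digit count (63817.0 -> 70000)
    rnd_max = int(f'{str(int(str(int(maxes[i]))[0]) + 1) + "0" * (len(str(int(maxes[i]))) - 1)}')
    # np.arange stops before rnd_max, so rnd_max itself never becomes a tick
    yticks = np.arange(0, rnd_max, 1 if rnd_max < 10 else rnd_max // 10)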
    ytick_labels = pd.Series(yticks).apply(lambda value: f"{int(value):,}")
    plot.set_yticks(yticks)
    plot.set_yticklabels(ytick_labels)
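The rnd_max expression keeps the leading digit, adds 1, and multiplies by the place value of that digit. A minimal integer-only sketch of the same computation without the str/int round trip, assuming the maxima are at least 1 (bump_leading_digit is a hypothetical name, not from either poster):

```python
def bump_leading_digit(n: float) -> int:
    """Integer-only equivalent of the rnd_max string expression in the question.

    Keeps the leading digit of int(n), adds 1, and pads with zeros to the
    original number of digits, e.g. 63817.0 -> 70000 and 41 -> 50.
    """
    m = int(n)
    if m < 1:
        raise ValueError("expects a value >= 1")
    p = 1
    while p * 10 <= m:  # grow p until it is the place value of the leading digit
        p *= 10
    return (m // p + 1) * p


print(bump_leading_digit(63817.0))  # 70000
print(bump_leading_digit(2675262))  # 3000000
```

Like the original expression, this always rounds up, so it shares the "gap at the top" behaviour described in the question.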

EDIT: I would like the leading value to be 3 if the value is 2,750,000, or 4 if the value is 41. So it's not a true base-10 (power of 10) result, but base 10 with the leading digit rounded.

Cumulative:

State    California  Arizona  Florida  New York    Texas  Illinois
11/4/20      950920   250633   821123    519890  1003342    443803
3/14/20         372       12       76       557       60        64
5/22/20       90281    15624    49451    360818    53817    105444

Daily:

State    California  Arizona  Florida  New York    Texas  Illinois
4/3/20       1226.0    173.0   1260.0   10675.0    771.0    1209.0
6/25/20      5088.0   3091.0   5004.0     814.0   5787.0     894.0
11/3/20      4990.0   1679.0   4637.0    2069.0   9721.0    6516.0

c_max and d_max are just floats/ints (the max value of the pandas data being plotted), collected into the maxes list, for example:

63817.0

2675262

Here's the output of a series of plots. You can see that the first chart's ticks go much higher than its actual max value (ignore the plot placement; it's just the best fit for now). This is the result of rounding a low number up, which I would like to alleviate. But the goal is to get the cleanest tick values I can while keeping the plots nice and tight.

[Image: 1 of a series of plots]

If you really want just one significant digit for your 10 steps, you can replicate your (no, not really Pythonic, I would say) string-converting expression with something that uses the base-10 logarithm, e.g.:

import math

def round10(n):
    # Smallest power of 10 that is >= n (n must be > 0)
    return 10**math.ceil(math.log10(n))

But as you have noticed yourself, this doesn't really produce useful results. For example, if the maximum value is 1001, the y ticks would go from 0 to 10000, so all the data would be squeezed into roughly the bottom tenth of the plot. The built-in autoscaling is more sophisticated and maximizes the usable area.
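A tighter option that still gives clean tick values is the common "nice number" approach: pick a step from the 1-2-5 sequence (times a power of 10) that is at least max / 10, then round the top up to a multiple of that step. This is a swapped-in technique, not what either poster used; nice_ticks is a hypothetical helper that assumes whole-number counts, so it never uses a step below 1. matplotlib's matplotlib.ticker.MaxNLocator (e.g. with nbins=10, integer=True) does something similar automatically.

```python
import math


def nice_ticks(data_max: float, max_ticks: int = 10) -> list[int]:
    """Integer ticks from 0 up to the first multiple of a 1-2-5 step that is >= data_max.

    Uses at most max_ticks intervals. Assumes whole-number data, so the step is never < 1.
    """
    if data_max <= 0:
        return [0, 1]  # arbitrary fallback for empty or all-zero data
    raw_step = data_max / max_ticks
    # Place value of the raw step, clamped to >= 1 because the data are counts.
    p = 10 ** max(0, math.floor(math.log10(raw_step)))
    # Smallest 1-2-5 multiple of p that still fits the data into max_ticks intervals.
    step = next(m * p for m in (1, 2, 5, 10) if m * p >= raw_step)
    top = math.ceil(data_max / step) * step
    return list(range(0, top + step, step))


for value in (41, 1001, 63817.0, 2675262):
    print(value, nice_ticks(value)[-1])
```

For the values in the question this gives a top of 45 for 41, 70,000 for 63817.0 and 3,000,000 for 2675262, and the max of 1001 gets a top of 1,200 instead of 10,000. The list could then be passed to plot.set_yticks() with ylim=(0, ticks[-1]).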

from math import floor, log10

def round_first(x):
    p = 10**floor(log10(x))  # place value of the leading digit, e.g. 1000 for 5123
    return round(x/p)*p
>>> round_first(5123)
5000
>>> round_first(5987)
6000
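One detail worth knowing about round_first: Python 3's built-in round() rounds exact halves to the nearest even number, so a value like 2500 becomes 2000 rather than 3000. If half-up behaviour is preferred, a minimal variant (round_first_half_up is a hypothetical name, and it assumes x >= 1) replaces round() with floor(... + 0.5):

```python
from math import floor, log10


def round_first_half_up(x):
    """Round x (assumed >= 1) to one significant figure, rounding .5 cases up."""
    p = 10 ** floor(log10(x))      # place value of the leading digit, e.g. 1000 for 2500
    return floor(x / p + 0.5) * p  # half-up, unlike round(), which rounds half to even


print(round(2500 / 1000) * 1000)   # 2000 (round(2.5) == 2)
print(round_first_half_up(2500))   # 3000
```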

Edit: If you care about performance, put all your data in NumPy arrays and use a vectorized approach. The code below is vectorized and also doesn't choke on zero or negative numbers.

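>>> import numpy as np
>>> def round_first(x):
...     xa = np.abs(x)
...     xs = np.sign(x)
...     nonzero = x != 0                         # log10(0) is undefined, so skip zeros
...     p = 10**np.floor(np.log10(xa[nonzero]))  # place value of each leading digit
...     out = np.zeros(x.shape)                  # zeros stay 0
...     out[nonzero] = np.round(xa[nonzero]/p)*p*xs[nonzero]  # round magnitude, restore sign
...     return out
...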
...
>>> x = np.arange(-1000,2001,67)                        
>>> x
array([-1000,  -933,  -866,  -799,  -732,  -665,  -598,  -531,  -464,
        -397,  -330,  -263,  -196,  -129,   -62,     5,    72,   139,
         206,   273,   340,   407,   474,   541,   608,   675,   742,
         809,   876,   943,  1010,  1077,  1144,  1211,  1278,  1345,
        1412,  1479,  1546,  1613,  1680,  1747,  1814,  1881,  1948])
>>> round_first(x)
array([-1000.,  -900.,  -900.,  -800.,  -700.,  -700.,  -600.,  -500.,
        -500.,  -400.,  -300.,  -300.,  -200.,  -100.,   -60.,     5.,
          70.,   100.,   200.,   300.,   300.,   400.,   500.,   500.,
         600.,   700.,   700.,   800.,   900.,   900.,  1000.,  1000.,
        1000.,  1000.,  1000.,  1000.,  1000.,  1000.,  2000.,  2000.,
        2000.,  2000.,  2000.,  2000.,  2000.])
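For axis limits, rounding to the nearest value can land below the data: round_first(63817.0) gives 60000, which would cut off the top of a 63817 curve. If the limit must never fall below the maximum, the same vectorized pattern works with np.ceil in place of np.round. A sketch assuming non-negative inputs such as case and death counts; ceil_first is a hypothetical name, and non-positive values simply map to 0:

```python
import numpy as np


def ceil_first(x):
    """Round each positive value up to one significant figure; non-positive values map to 0."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    pos = x > 0                                # log10 is only defined for positive values
    p = 10.0 ** np.floor(np.log10(x[pos]))     # place value of each leading digit
    out[pos] = np.ceil(x[pos] / p) * p         # round the leading digit up, so out >= x
    return out


maxes = [63817.0, 2675262, 41, 6]
print(ceil_first(maxes))
```

This returns 70000, 3000000, 50 and 6 for the sample maxima, matching the ceil() behaviour discussed next rather than round-to-nearest.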

Also, your question says round to nearest (you say 41 becomes 40 rather than 50), but your answer to your own question uses ceil(), which would make 41 go to 50.

from math import floor, ceil, log10

def round10_first(x):
    p = 10 ** floor(log10(x))  # place value of the leading digit
    return ceil(x / p) * p     # round the leading digit up, so the result is >= x

Thank you guys for your help. I actually combined your answers for my solution. I ran timeit on both and they are the same speed, but I will use the one built from yours, since it's more Pythonic.

%timeit -n 10000000 function1
%timeit -n 10000000 function2

16.7 ns ± 0.108 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
16.8 ns ± 0.13 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
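If function1 and function2 are function names, %timeit function1 (without parentheses) only evaluates the name, so figures around 17 ns most likely measure a variable lookup rather than the rounding itself. To time the call, write %timeit function1(2675262) in IPython, or use the standard-library timeit module as in this sketch. Here round_first is the log10 version from the answer above, and 2675262 is the sample maximum from the question:

```python
import timeit
from math import floor, log10


def round_first(x):
    p = 10 ** floor(log10(x))  # place value of the leading digit
    return round(x / p) * p


n = 100_000
# Passing a lambda makes timeit actually call the function on every iteration.
call_time = timeit.timeit(lambda: round_first(2675262), number=n) / n
# For contrast: evaluating only the name, which is what `%timeit function1` measures.
name_time = timeit.timeit("round_first", globals=globals(), number=n) / n
print(f"call: {call_time * 1e9:.1f} ns, name lookup only: {name_time * 1e9:.1f} ns")
```

Absolute numbers depend on the machine; the point is to compare calls, not name lookups.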

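Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.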

 
© 2020-2024 STACKOOM.COM