How to sum two arrays in Python?

I made a DataFrame like this:

import numpy as np
import pandas as pd

occurrence = np.array([4, 5, 4, 0, 1, 4, 3])
year = np.array([1851, 1852, 1853, 1854, 1855, 1856, 1857])
# Column "occur" holds the counts, column "year" the matching years
disaster = {"occur": pd.Series(occurrence), "year": pd.Series(year)}
df = pd.DataFrame(disaster)

Now I want to write a function so that, when I give it two years, it returns the sum of the occurrences for those two years. If I pass 1851 and 1852, it should show that the occurrence is 9.

I wrote a function like this, but it raises an error:

def dist(s1,s2):
    return (sum (year>=s1 and year< s2))

print(dist(1851, 1852))

Select the rows for both years with isin and sum the occur column:

print(df.loc[df['year'].isin((1851,1852))]["occur"].sum())

Or:

print(df.loc[df.year.isin((1851,1852))].occur.sum())

For a range of years, building the list of values with range seems more efficient than combining comparisons with &:

df.loc[df.year.isin(range(s1, s2+1))].occur.sum()
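As a minimal sketch of the range-based approach, the snippet below wraps it in a function and sets it next to Series.between, which includes both endpoints by default. The function names sum_range_isin and sum_range_between are made up for illustration; the data is the sample from the question.

```python
import numpy as np
import pandas as pd

occurrence = np.array([4, 5, 4, 0, 1, 4, 3])
year = np.array([1851, 1852, 1853, 1854, 1855, 1856, 1857])
df = pd.DataFrame({"occur": occurrence, "year": year})

def sum_range_isin(s1, s2):
    # range() excludes its stop value, so s2 + 1 keeps the end year in
    return df.loc[df.year.isin(range(s1, s2 + 1)), "occur"].sum()

def sum_range_between(s1, s2):
    # between() is inclusive on both ends by default
    return df.loc[df.year.between(s1, s2), "occur"].sum()

print(sum_range_isin(1851, 1853))     # 13
print(sum_range_between(1851, 1853))  # 13
```

Both functions select the same rows here; which one is faster may depend on the size of the data, so it is worth timing on the real dataset.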

If you specifically want a numpy-only approach, you'd do something similar to this:

import numpy as np

occurrence = np.array([4, 5, 4, 0, 1, 4, 3])
year = np.array([1851, 1852, 1853, 1854, 1855, 1856, 1857])

year1, year2 = 1851, 1852
# Boolean mask that is True only for the two requested years
mask = (year == year1) | (year == year2)
print(occurrence[mask].sum())

Note that if you wanted the sum of all occurrences between those two years (inclusive), you'd build the mask like this instead:

mask = (year >= year1) & (year <= year2)

With pandas, the same approach still works, but as others have noted, there are more efficient ways of building the boolean mask with the isin method if you're interested in just those two years (and not the interval between them).
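NumPy has a comparable membership test, np.isin, so the "just these years" case can also stay in plain NumPy. The following is a small sketch on the question's sample data; the variable name wanted and the choice of years are assumptions for illustration only.

```python
import numpy as np

occurrence = np.array([4, 5, 4, 0, 1, 4, 3])
year = np.array([1851, 1852, 1853, 1854, 1855, 1856, 1857])

# Hypothetical selection: any list of years, not necessarily adjacent
wanted = [1851, 1855]

# True where the year is one of the wanted values
mask = np.isin(year, wanted)
total = occurrence[mask].sum()
print(total)  # 5
```

This avoids chaining one == comparison per year with |, which gets unwieldy once more than a couple of years are involved.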

You need to use & instead of and. This means your function should be:

def dist(s1, s2):
    return df.occur[(df.year >= s1) & (df.year <= s2)].sum()

And then you have:

In [72]: dist(1851, 1852)
Out[72]: 9

Both 1851 <= df.year and df.year <= 1852 create a boolean Series. Python's and does not work with these objects the way we want: it calls bool on the first Series, which raises a ValueError because the truth value of a Series with more than one element is ambiguous. The & operator, on the other hand, performs an element-wise and, returning True wherever both Series are True.
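To make the difference concrete, here is a short sketch that triggers the error with and and then builds the same condition with &. The variable and_error is only there to capture the exception for inspection and is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "occur": [4, 5, 4, 0, 1, 4, 3],
    "year": [1851, 1852, 1853, 1854, 1855, 1856, 1857],
})

# `and` needs a single True/False from its left operand, so it calls
# bool() on the Series, and pandas refuses to guess what that means.
and_error = None
try:
    (df.year >= 1851) and (df.year <= 1852)
except ValueError as exc:
    and_error = exc
    print("and failed:", exc)

# `&` compares the two boolean Series element by element instead.
mask = (df.year >= 1851) & (df.year <= 1852)
print(mask.tolist())         # [True, True, False, False, False, False, False]
print(df.occur[mask].sum())  # 9
```

The parentheses around each comparison matter: & binds more tightly than >= and <=, so leaving them out changes how the expression is parsed.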

You might also find isin() useful for summing the values for a given list of years. For example:

>>> df.occur[df.year.isin([1851, 1852])].sum()
9

In [21]: import numpy as np

In [22]: import pandas as pd

In [23]: occurrence= np.array([4, 5, 4, 0, 1, 4, 3])

In [24]: year = np.array([1851,1852,1853,1854,1855,1856,1857])

In [25]: my_func = lambda *l: sum([x[0] for x in zip(occurrence, year) if x[1] in l])

In [26]: my_func(1851, 1852)
Out[26]: 9
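The same idea can be written as a named function, which may be easier to read than the lambda. This is only a sketch of one possible rewrite: the name sum_for_years is hypothetical, and converting the arguments to a set is an added choice, not something the original answer does.

```python
occurrence = [4, 5, 4, 0, 1, 4, 3]
year = [1851, 1852, 1853, 1854, 1855, 1856, 1857]

def sum_for_years(*years):
    # A set gives constant-time membership checks for each year tested
    wanted = set(years)
    # Pair each count with its year and keep only the wanted years
    return sum(occ for occ, yr in zip(occurrence, year) if yr in wanted)

print(sum_for_years(1851, 1852))  # 9
```

Because it uses only built-ins, this version works on plain lists as well as on the NumPy arrays from the session above.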
