
Python: Intersection of Two 2D Arrays

I have data in a .csv file called 'Max.csv':

Valid Date  MAX
1/1/1995    51
1/2/1995    45
1/3/1995    48
1/4/1995    45

Another CSV file, 'Min.csv', looks like:

Valid Date  MIN
1/2/1995    33
1/4/1995    31
1/5/1995    30
1/6/1995    39

I want to generate two dictionaries (or any other suggested data structure) so that I end up with two separate variables in Python, Max and Min, containing:

Valid Date  MAX
1/2/1995    45
1/4/1995    45

Valid Date  MIN
1/2/1995    33
1/4/1995    31

i.e., select the entries from Max and Min so that only the dates common to both are output.

I am thinking about using numpy.intersect1d, but that means I first have to compare the date columns of Max and Min, find the indices of the common dates, and then grab the second column of Max and of Min. This seems too complicated, and I feel there must be a smarter way to intersect the two curves Max and Min.
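For reference, the numpy.intersect1d route described above is shorter than it sounds, because NumPy 1.15 and later can return the positions of the common elements directly via return_indices=True. This is a minimal sketch, assuming the date and value columns are already loaded into NumPy arrays; the array names are illustrative and the values are copied from the example tables.

```python
import numpy as np

# Example data copied from the question's tables (dates kept as strings).
max_dates = np.array(["1/1/1995", "1/2/1995", "1/3/1995", "1/4/1995"])
max_values = np.array([51, 45, 48, 45])
min_dates = np.array(["1/2/1995", "1/4/1995", "1/5/1995", "1/6/1995"])
min_values = np.array([33, 31, 30, 39])

# return_indices=True also returns, for each common date, its position
# in the first input (max_dates) and in the second input (min_dates).
common, max_idx, min_idx = np.intersect1d(max_dates, min_dates, return_indices=True)

print(common)               # ['1/2/1995' '1/4/1995']
print(max_values[max_idx])  # [45 45]
print(min_values[min_idx])  # [33 31]
```

Note that intersect1d returns the common values sorted; for strings like these that is lexicographic order, not necessarily chronological order.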

You mention that:

I have to separately compare the Max and Min first on date column, find the index of common dates and then grab the second columns for Max and Min. This appears too complicated...

Indeed, this is fundamentally what you need to do, one way or another; but with the numpy_indexed package (disclaimer: I am its author), it isn't complicated in the slightest:

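import numpy_indexed as npi
# Assumes max_dates/max_values and min_dates/min_values are NumPy arrays
# holding the date and value columns of Max.csv and Min.csv.
common_dates = npi.intersection(min_dates, max_dates)
# Look up the position of each common date, then pick the matching values.
print(max_values[npi.indices(max_dates, common_dates)])
print(min_values[npi.indices(min_dates, common_dates)])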

Note that this solution is fully vectorized (it contains no Python-level loops), so for large inputs it should be much faster than the currently accepted answer.

Note 2: this assumes the values in each date column are unique; if they are not, replace 'npi.indices' with 'npi.in_', which returns a boolean mask of the matching rows instead of indices.
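If you would rather avoid an extra dependency, plain NumPy's np.isin produces the same kind of boolean mask. This is a substitution on my part, not part of the numpy_indexed answer. The sketch below uses hypothetical data with a repeated date in the MAX column to show that every matching row is kept.

```python
import numpy as np

# Hypothetical data: "1/2/1995" appears twice in the MAX column.
max_dates = np.array(["1/1/1995", "1/2/1995", "1/2/1995", "1/4/1995"])
max_values = np.array([51, 45, 47, 45])
min_dates = np.array(["1/2/1995", "1/4/1995", "1/5/1995"])
min_values = np.array([33, 31, 30])

# True wherever a row's date also appears in the other file's date column.
max_mask = np.isin(max_dates, min_dates)
min_mask = np.isin(min_dates, max_dates)

print(max_dates[max_mask], max_values[max_mask])  # ['1/2/1995' '1/2/1995' '1/4/1995'] [45 47 45]
print(min_dates[min_mask], min_values[min_mask])  # ['1/2/1995' '1/4/1995'] [33 31]
```

Unlike the index-based lookup, a mask keeps the rows in their original file order and does not require unique dates.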

The built-in set() should be enough:

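>>> # Named max_data/min_data so the built-in max() and min() are not shadowed.
>>> max_data = {"1/1/1995": "51", "1/2/1995": "45", "1/3/1995": "48", "1/4/1995": "45"}
>>> min_data = {"1/2/1995": "33", "1/4/1995": "31", "1/5/1995": "30", "1/6/1995": "39"}

>>> a = set(max_data)
>>> b = set(min_data)
>>> common = sorted(a.intersection(b))  # sorted so the output order is reproducible
>>> {x: max_data[x] for x in common}
{'1/2/1995': '45', '1/4/1995': '45'}
>>> {x: min_data[x] for x in common}
{'1/2/1995': '33', '1/4/1995': '31'}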
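The dictionaries above are typed in by hand. A minimal sketch of building them from the two files with the standard csv module follows, assuming the files are comma-separated with a single header row such as "Valid Date,MAX" (the question shows them whitespace-aligned, so the real delimiter is a guess). The helper name read_series is hypothetical, and io.StringIO stands in for the real files so the example is self-contained.

```python
import csv
import io


def read_series(f):
    """Map each date (first column) to its value (second column) as an int.

    Hypothetical helper; assumes a comma-separated file whose first row
    is a header such as "Valid Date,MAX".
    """
    reader = csv.reader(f)
    next(reader)  # skip the header row
    return {row[0]: int(row[1]) for row in reader if row}


# In-memory stand-ins for Max.csv and Min.csv; with the real files you would
# pass open("Max.csv", newline="") and open("Min.csv", newline="") instead.
max_csv = io.StringIO("Valid Date,MAX\n1/1/1995,51\n1/2/1995,45\n1/3/1995,48\n1/4/1995,45\n")
min_csv = io.StringIO("Valid Date,MIN\n1/2/1995,33\n1/4/1995,31\n1/5/1995,30\n1/6/1995,39\n")

max_data = read_series(max_csv)
min_data = read_series(min_csv)

# Keep only dates present in both; iterating over the dict rather than a set
# preserves the order in which the dates appear in each file.
common_max = {d: v for d, v in max_data.items() if d in min_data}
common_min = {d: v for d, v in min_data.items() if d in max_data}

print(common_max)  # {'1/2/1995': 45, '1/4/1995': 45}
print(common_min)  # {'1/2/1995': 33, '1/4/1995': 31}
```

Filtering with `in` against the other dict is an average O(1) lookup per date, so this stays linear in the file sizes while avoiding the need to sort the result.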

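Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.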

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM