Python: Intersection of Two 2D Arrays
I have data in a .csv file called 'Max.csv':
Valid Date MAX
1/1/1995 51
1/2/1995 45
1/3/1995 48
1/4/1995 45
Another csv called 'Min.csv' looks like:
Valid Date MIN
1/2/1995 33
1/4/1995 31
1/5/1995 30
1/6/1995 39
I want to generate two dictionaries, or any other suggested data structure, so that I have two separate variables Max and Min in Python, respectively:
Valid Date MAX
1/2/1995 45
1/4/1995 45
Valid Date MIN
1/2/1995 33
1/4/1995 31
i.e., select the elements from Max and Min so that only the entries whose dates appear in both are output.
I am thinking about using numpy.intersect1d, but that means I have to first compare Max and Min on the date column, find the indices of the common dates, and then grab the second columns of Max and Min. This appears too complicated, and I feel there are smarter ways to intersect the two curves Max and Min.
You mention that:
"I have to separately compare the Max and Min first on date column, find the index of common dates and then grab the second columns for Max and Min. This appears too complicated..."
Indeed, this is fundamentally what you need to do, one way or the other; but using the numpy_indexed package (disclaimer: I am its author), this isn't complicated in the slightest:
import numpy_indexed as npi
common_dates = npi.intersection(min_dates, max_dates)
print(max_values[npi.indices(max_dates, common_dates)])
print(min_values[npi.indices(min_dates, common_dates)])
Note that this solution is fully vectorized (it contains no loops at the Python level), and as such it should be considerably faster than the currently accepted answer on large inputs.
Note 2: this assumes the date columns are unique; if not, you should replace 'npi.indices' with 'npi.in_'.
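For comparison, the "find the common dates, then index the value columns" steps can also be sketched with plain NumPy. This is a minimal sketch, not the numpy_indexed approach above: it assumes NumPy 1.15 or later (for the return_indices argument of np.intersect1d) and that dates are unique within each file. The array names and the sample data, copied from the question, are illustrative.

```python
import numpy as np

# Sample data copied from the question; the array names are illustrative.
max_dates = np.array(["1/1/1995", "1/2/1995", "1/3/1995", "1/4/1995"])
max_values = np.array([51, 45, 48, 45])
min_dates = np.array(["1/2/1995", "1/4/1995", "1/5/1995", "1/6/1995"])
min_values = np.array([33, 31, 30, 39])

# return_indices=True also returns the position of each common element
# in both inputs (the first occurrence if an input has duplicates).
common_dates, max_idx, min_idx = np.intersect1d(
    max_dates, min_dates, return_indices=True
)

# Use the index arrays to pick the matching rows of the value columns.
common_max = max_values[max_idx]
common_min = min_values[min_idx]

print(common_dates)
print(common_max)
print(common_min)
```

Because the dates here are plain strings, np.intersect1d returns them in lexicographic order rather than chronological order; parsing them into a date type first would avoid that.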
The set() builtin should be enough, as follows:
>>> # Named Max and Min, as in the question, to avoid shadowing the max() and min() builtins
>>> Max = {"1/1/1995":"51", "1/2/1995":"45", "1/3/1995":"48", "1/4/1995":"45"}
>>> Min = {"1/2/1995":"33", "1/4/1995":"31", "1/5/1995":"30", "1/6/1995":"39"}
>>> a = set(Max)
>>> b = set(Min)
>>> {x:Max[x] for x in a.intersection(b)}
{'1/4/1995': '45', '1/2/1995': '45'}
>>> {x:Min[x] for x in a.intersection(b)}
{'1/2/1995': '33', '1/4/1995': '31'}
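The dictionaries above are typed in by hand. A minimal sketch of building them from the two files could look like the following. It assumes the files are comma-delimited with a single header row (the question does not show the delimiter), and it uses in-memory strings as stand-ins for 'Max.csv' and 'Min.csv'. The helper name load_by_date is hypothetical.

```python
import csv
import io

# In-memory stand-ins for the two files; a comma delimiter is assumed.
MAX_CSV = """Valid Date,MAX
1/1/1995,51
1/2/1995,45
1/3/1995,48
1/4/1995,45
"""
MIN_CSV = """Valid Date,MIN
1/2/1995,33
1/4/1995,31
1/5/1995,30
1/6/1995,39
"""


def load_by_date(fileobj):
    """Return {date: value} from a two-column CSV with one header row."""
    reader = csv.reader(fileobj)
    next(reader)  # skip the header row
    return {date: int(value) for date, value in reader}


max_by_date = load_by_date(io.StringIO(MAX_CSV))
min_by_date = load_by_date(io.StringIO(MIN_CSV))

# Dictionary key views support set operations directly.
common = max_by_date.keys() & min_by_date.keys()

# Filter each dictionary in its own order so the file order is kept.
common_max = {d: v for d, v in max_by_date.items() if d in common}
common_min = {d: v for d, v in min_by_date.items() if d in common}

print(common_max)
print(common_min)
```

With real files, io.StringIO(MAX_CSV) would be replaced by open('Max.csv', newline=''). Filtering each dictionary in its own order, rather than iterating over the set, keeps the rows in file order.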