Most efficient way with Pandas to check pairs of values from 2 series?

Let's say I have a series/dataframe A that looks like

A = [3,2,1,5,4,...

A could also be sorted, since the order doesn't matter to me. I want to create a new series that keeps track of possible pairs. That is, I want the result to look like

B = [3_1, 3_2, 3_4, ..., 2_1, 2_4, ..., 1_4, 1_5,...

That is, I want to exclude 2_3, since 3_2 already exists. I figure I could create each element in B using something like

for i in A:
    for j in A:
        s = A[i].astype(str) + '_' + A[j].astype(str)
        B.append(pd.Series([s]))

But I'm not sure how to make sure the (i,j) pairing doesn't already exist, such as making sure 2_3 doesn't get added, as I mentioned above.

What is the most efficient way to deal with this?

from itertools import combinations

import pandas as pd

s = pd.Series([1, 2, 3, 4])
# combinations(s, 2) yields each unordered pair of positions exactly once
s2 = pd.Series("_".join([str(a), str(b)]) for a, b in combinations(s, 2))

>>> s2
0    1_2
1    1_3
2    1_4
3    2_3
4    2_4
5    3_4
dtype: object
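
The example above starts from a sorted series with no repeated values. As a minimal sketch of how the same idea might handle unsorted input with repeats, the helper below (the name unique_pairs is hypothetical, not from the answer) deduplicates and sorts the values before calling combinations:

```python
from itertools import combinations


def unique_pairs(values):
    """Return 'a_b' strings for every unordered pair of distinct values."""
    # sorted(set(...)) drops repeated values and fixes the order,
    # so each pair is emitted once with the smaller value first.
    distinct = sorted(set(values))
    return [f"{a}_{b}" for a, b in combinations(distinct, 2)]


pairs = unique_pairs([3, 2, 1, 3])
print(pairs)  # ['1_2', '1_3', '2_3']
```

The returned list can be passed to pd.Series(...) if a series is needed as the final result.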

I don't think this really has much to do with pandas, except that the values originate (and possibly end up) in a series. Instead, I'd use itertools.

Say you have an iterable a of values. Then

import itertools

set((str(i) + '_' + str(j)) for (i, j) in itertools.product(a, a) if i < j)

will create a set of pairs where the integer before the _ is smaller than the one after it, so each unordered pair appears only once.


Example

>>> import itertools
>>> a = [1, 2, 3, 4, 6, 7]  # assumed input, inferred from the output; the original omits it
>>> set((str(i) + '_' + str(j)) for (i, j) in itertools.product(a, a) if i < j)
{'1_2',
 '1_3',
 '1_4',
 '1_6',
 '1_7',
 '2_3',
 '2_4',
 '2_6',
 '2_7',
 '3_4',
 '3_6',
 '3_7',
 '4_6',
 '4_7',
 '6_7'}
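
As a quick check of the filtered-product approach, the sketch below compares it with itertools.combinations on a made-up input (the list a here is hypothetical). Both should give the same set of pairs, although product(a, a) examines every ordered pair, including repeats, before filtering:

```python
import itertools

a = [4, 1, 3, 2, 3]  # hypothetical input: unsorted, with a repeated 3

# Filtered Cartesian product: len(a) ** 2 candidates, keep those with i < j.
via_product = {f"{i}_{j}" for i, j in itertools.product(a, a) if i < j}

# combinations over the sorted distinct values builds each pair directly.
via_combinations = {
    f"{i}_{j}" for i, j in itertools.combinations(sorted(set(a)), 2)
}

same = via_product == via_combinations
print(same)  # True
```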

This can be done via a list comprehension:

>>> a = [3, 2, 1, 5, 4]
>>> [(str(x)+'_'+str(y)) for x in a for y in a if y>x]
['3_5', '3_4', '2_3', '2_5', '2_4', '1_3', '1_2', '1_5', '1_4', '4_5']

Note that within each pair the smaller value always comes first because of the y>x condition, which is why we have '1_3' in our output instead of '3_1'.

While importing itertools and using combinations is a correct way to do this, I usually prefer not to import a library if I only need one or two things from it that can also be easily accomplished via direct means.
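
A variation on the import-free approach, offered as a sketch rather than as part of the answer above: iterating over index positions visits each pair once instead of filtering all len(a)**2 combinations, and it keeps the original element order within each pair (so 3_2 rather than 2_3, as in the question). It assumes the values in a are distinct.

```python
a = [3, 2, 1, 5, 4]

# The inner loop starts at i + 1, so each unordered pair is produced once
# and no pair needs to be discarded afterwards.
pairs = [
    f"{a[i]}_{a[j]}"
    for i in range(len(a))
    for j in range(i + 1, len(a))
]
print(pairs)
# ['3_2', '3_1', '3_5', '3_4', '2_1', '2_5', '2_4', '1_5', '1_4', '5_4']
```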
