Eliminate Column Repetition in Pandas DataFrame
I have a data frame and I am trying to find all possible combinations of itself and a fraction of itself. The following data frame is a much scaled-down version of the one I am running. The first data frame (fruit1) is a fraction of the second data frame (fruit2).
FruitSubDF    FruitFullDF
apple         apple
cherry        cherry
banana        banana
              peach
              plum
Running the following code:
from itertools import product
df1 = pd.DataFrame(list(product(fruitDF.iloc[0:3,0], fruitDF.iloc[0:5,0])), columns=['fruit1', 'fruit2'])
gives this output:
Fruit1 Fruit2
0 apple banana
1 apple apple
2 apple cherry
3 apple peach
4 apple plum
5 cherry banana
6 cherry apple
7 cherry cherry
.
.
18 banana banana
19 banana peach
20 banana plum
My problem is that I want to remove rows that contain the same two fruits, regardless of which fruit is in which column, as shown below. In other words, I consider (apple, cherry) and (cherry, apple) to be the same, but I am unsure of an efficient way, other than iterrows, to weed out the unwanted rows, since most pandas functions I have found remove duplicates based on order.
Fruit1 Fruit2
0 apple banana
1 apple cherry
2 apple apple
3 apple peach
4 apple plum
5 cherry banana
6 cherry cherry
.
.
15 banana plum
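As a small illustration of the order sensitivity described above, here is a minimal sketch using a hypothetical two-row frame (not the asker's data). It suggests that drop_duplicates compares values column by column, so a swapped pair is not treated as a duplicate:

```python
import pandas as pd

# Hypothetical two-row frame: the same two fruits in swapped order.
df = pd.DataFrame({'Fruit1': ['apple', 'cherry'],
                   'Fruit2': ['cherry', 'apple']})

# drop_duplicates compares rows column by column, so both rows survive.
result = df.drop_duplicates()
print(len(result))
```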
First, I wrote a piece of code to replicate your DataFrame, adapted from a Stack Overflow answer.
import pandas as pd
Fruit1=['apple', 'cherry', 'banana']
Fruit2=['banana', 'apple', 'cherry']
index = pd.MultiIndex.from_product([Fruit1, Fruit2], names = ["Fruit1", "Fruit2"])
df = pd.DataFrame(index = index).reset_index()
Then, you can use lexicographical order to filter the DataFrame, keeping only the rows where Fruit1 sorts before or equal to Fruit2:
df[df['Fruit1']<=df['Fruit2']]
This gives the result you wanted to obtain.
EDIT: You edited your post, but this still seems to do the job.
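One caveat worth noting: the lexicographic filter keeps a pair only in its sorted orientation, so it relies on that orientation being present in the product. If the second column ever contained a value that is absent from the first column and sorts before some first-column value, that pair would be dropped entirely. A possible alternative, sketched below under the assumption that the data looks like the lists in the question (the names sub, full, key, and deduped are illustrative), is to sort the two values within each row and drop rows whose sorted pair has already been seen:

```python
from itertools import product

import numpy as np
import pandas as pd

# Assumed data mirroring the question: a subset and the full list.
sub = ['apple', 'cherry', 'banana']
full = ['apple', 'cherry', 'banana', 'peach', 'plum']

df = pd.DataFrame(list(product(sub, full)), columns=['Fruit1', 'Fruit2'])

# Sort the two values within each row so that (cherry, apple) and
# (apple, cherry) produce the same key.
key = pd.DataFrame(np.sort(df[['Fruit1', 'Fruit2']].to_numpy(), axis=1),
                   index=df.index)

# Keep only the first row seen for each unordered pair.
deduped = df[~key.duplicated()].reset_index(drop=True)
print(deduped)
```

This keeps the original orientation of the first occurrence of each pair and avoids iterating over rows in Python.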
You can use itertools to achieve it:
import itertools
import pandas as pd
fruits = ['banana', 'cherry', 'apple']
pd.DataFrame((itertools.permutations(fruits, 2)), columns=['fruit1', 'fruit2'])
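Note that itertools.permutations yields both (banana, cherry) and (cherry, banana) and omits self-pairs such as (apple, apple). If the goal is each unordered pair exactly once, with self-pairs included as in the desired output, itertools.combinations_with_replacement may be a closer fit. The sketch below assumes both columns are drawn from the same list, which is not quite the subset-versus-full setup in the question:

```python
import itertools

import pandas as pd

fruits = ['banana', 'cherry', 'apple']

# Each unordered pair appears once, including self-pairs like (apple, apple).
pairs = pd.DataFrame(
    list(itertools.combinations_with_replacement(fruits, 2)),
    columns=['fruit1', 'fruit2'],
)
print(pairs)
```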