I am using pandas and I am trying to figure out how to get the most common combinations of products that people use in my data file.
Suppose that each of the three columns AA, BB and CC represents a completely different product, where 0 means the buyer does not use that product and 1 means they do. Each row represents a different buyer.
For example, the most common combination in my example is products AA and CC, because three people use them, as you can see in rows 1, 4 and 5.
I would like the result to be something like 'The most common combination is the products AA and CC which are used by 3 people'.
I hope I have explained it better this time.
Below is an example of my DataFrame:
AA | BB | CC
_______________
1 | 0 | 1
0 | 0 | 1
0 | 1 | 0
1 | 0 | 1
1 | 0 | 1
Once you count the duplicate rows, you just need a bit of extra work to get the corresponding labels.
Here's how I would do it, though I'm not very familiar with pandas, so there's probably a better way. First, the DataFrame should be boolean, so that each row can be used as a mask to select column labels.
```python
import pandas as pd

df = pd.DataFrame({
    'AA': [1, 0, 0, 1, 1],
    'BB': [0, 0, 1, 0, 0],
    'CC': [1, 1, 0, 1, 1],
}).astype(bool)

# Count duplicate rows: group by every column and take the group sizes
counts = df.groupby(df.columns.tolist()).size()

# Get the most common rows (ties are all kept)
maxima = counts[counts == counts.max()]

for combination, count in maxima.items():  # Series.iteritems() was removed in pandas 2.0
    # Use the boolean combination as a mask to select the matching labels
    labels = df.columns[list(combination)]
    print(*labels, count)
```
Output:
```
AA CC 3
```
Partial results:
```
>>> counts
AA     BB     CC
False  False  True     1
       True   False    1
True   False  True     3
dtype: int64
>>> maxima
AA    BB     CC
True  False  True    3
dtype: int64
```
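If you are on pandas 1.1 or later, `DataFrame.value_counts()` counts identical rows directly and could stand in for the `groupby(...).size()` step. Below is a minimal sketch under that assumption; it reuses the same example data and also prints the sentence format asked for in the question.

```python
import pandas as pd

# Same example data as above: True = the buyer uses the product
df = pd.DataFrame({
    'AA': [1, 0, 0, 1, 1],
    'BB': [0, 0, 1, 0, 0],
    'CC': [1, 1, 0, 1, 1],
}).astype(bool)

# DataFrame.value_counts (pandas >= 1.1) counts identical rows,
# sorted from most to least frequent by default
counts = df.value_counts()

# Keep every combination that ties for the highest count
maxima = counts[counts == counts.max()]

for combination, count in maxima.items():
    # combination is a tuple of booleans, one per column, used as a label mask
    labels = df.columns[list(combination)]
    print(f"The most common combination is the products "
          f"{' and '.join(labels)} which are used by {count} people")
```

With the example data this should report AA and CC used by 3 people; if several combinations tie, one sentence is printed per combination.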
I had almost figured out the solution to my question before your response, but you were partially right, wjandrea, so thank you.
First, I went through the whole DataFrame row by row, looking for the 1 values, and collected the names of the products each buyer uses:
```python
# For each row, list the column names whose value is 1
combination = df.apply(lambda row: row[row == 1].index.tolist(), axis=1)
combination = pd.DataFrame(combination)
```
After that, I created a new column with the names of the products each buyer uses, separated by commas:
```python
# Join each buyer's product names into one string, e.g. 'AA , CC'
df['Products'] = [' , '.join(map(str, l)) for l in combination[0]]
```
Then I just used your code and got exactly what I wanted.
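For reference, the two steps above could also be written as a single `apply` that returns the joined product names directly, and the resulting column could then be counted with `Series.value_counts()` instead of grouping on every column. This is a sketch, assuming the `' , '` separator used above and that only one top combination is needed; `top_combination` and `top_count` are illustrative names, not part of the original code.

```python
import pandas as pd

# Example data: 1 = the buyer uses the product, 0 = does not
df = pd.DataFrame({
    'AA': [1, 0, 0, 1, 1],
    'BB': [0, 0, 1, 0, 0],
    'CC': [1, 1, 0, 1, 1],
})

# For each buyer (row), join the names of the products marked with 1
df['Products'] = df.apply(lambda row: ' , '.join(row[row == 1].index), axis=1)

# Count how many buyers share each exact combination of products
counts = df['Products'].value_counts()

# idxmax returns only the first label if several combinations tie
top_combination = counts.idxmax()
top_count = counts.max()
print(f"The most common combination is {top_combination}, used by {top_count} people")
```

A buyer with no products would get an empty string in `Products`, which `value_counts()` counts like any other combination.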