
Python: build a matrix of one column against itself, with values counted from a corresponding column

I have these data, say d1:

Fruits  Person
Mango   1
Banana  1
Orange  2
Mango   1
Banana  3
Orange  1
Mango   2
Banana  3
Orange  2
Mango   2

I want the output to be something like this:

                 Fruit2
Fruit1   Mango   Banana   Orange
Mango    2       0        2
Banana   0
Orange

That is, a matrix where each value is the number of distinct people who have taken both Fruit1 and Fruit2. Can somebody tell me a way to do this in Python? Thanks.

Without knowing what type your data set is, I'm assuming it's a list of tuples, based on the structure you presented.

So suppose fruit1 is a list of tuples, where each tuple contains the name of the fruit and the person's id. After sorting by (person, fruit), you can use itertools.groupby inside a list comprehension to count the number of times each fruit and person appear together, like so:

import itertools

fruit1 = [
    ('Mango', 1),
    ('Banana', 1),
    ('Orange', 2),
    ('Mango', 1),
    ('Banana', 3),
    ('Orange', 1),
    ('Mango', 2),
    ('Banana', 3),
    ('Orange', 2),
    ('Mango', 2),
]

# define sort order (person, fruit)
keyfunc = lambda t: (t[1], t[0])

# sort fruit1
fruit1.sort(key=keyfunc)

# create fruit2
fruit2 = [(len(list(val)), key) for (key, val) in itertools.groupby(fruit1, keyfunc)]

# output
[
    (1, (1, 'Banana')),
    (2, (1, 'Mango')),
    (1, (1, 'Orange')),
    (2, (2, 'Mango')),
    (2, (2, 'Orange')),
    (2, (3, 'Banana')),
]

As you can see, fruit2 is a list of tuples, just like fruit1, with the addition of the number of occurrences for each person/fruit pair. So Person 1 had 1 entry for Banana, 2 for Mango, and so on.
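If the sort step feels like overhead, the same counts can probably be obtained with collections.Counter, which does not need sorted input. This is only a sketch under the same list-of-tuples assumption; the names records and fruit2_alt are made up for illustration and are not part of the code above.

```python
from collections import Counter

# Same sample data as in the question, as (fruit, person) tuples.
records = [
    ('Mango', 1), ('Banana', 1), ('Orange', 2), ('Mango', 1), ('Banana', 3),
    ('Orange', 1), ('Mango', 2), ('Banana', 3), ('Orange', 2), ('Mango', 2),
]

# Count how often each (person, fruit) pair occurs; no sorting required.
counts = Counter((person, fruit) for fruit, person in records)

# Arrange the result like fruit2: (count, (person, fruit)), ordered by key.
fruit2_alt = [(n, key) for key, n in sorted(counts.items())]
print(fruit2_alt)
```

Sorting happens only at the end here, and only to make the result line up with the groupby version.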

It's not exactly a matrix, but it's hard to be more specific with the information provided.
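If the goal really is the fruit-by-fruit matrix from the question, one possible reading is: for every pair of fruits, count the distinct people who appear with both. The sketch below assumes that reading and the same list-of-tuples input; records, people_by_fruit and matrix are illustrative names only. Note that under this reading Mango/Banana comes out as 1 for the sample data (person 1 took both), so it may not match the partially filled table in the question.

```python
from collections import defaultdict

# Same sample data as in the question, as (fruit, person) tuples.
records = [
    ('Mango', 1), ('Banana', 1), ('Orange', 2), ('Mango', 1), ('Banana', 3),
    ('Orange', 1), ('Mango', 2), ('Banana', 3), ('Orange', 2), ('Mango', 2),
]

# Collect the set of distinct people seen with each fruit.
people_by_fruit = defaultdict(set)
for fruit, person in records:
    people_by_fruit[fruit].add(person)

# Fruits in first-seen order (dicts preserve insertion order).
fruits = list(people_by_fruit)

# matrix[a][b] = number of distinct people who took both fruit a and fruit b.
matrix = {
    a: {b: len(people_by_fruit[a] & people_by_fruit[b]) for b in fruits}
    for a in fruits
}

# Print as a simple text table.
print("Fruit1".ljust(8) + "".join(f.ljust(8) for f in fruits))
for a in fruits:
    print(a.ljust(8) + "".join(str(matrix[a][b]).ljust(8) for b in fruits))
```

The diagonal holds the number of distinct people per fruit, and the matrix is symmetric because set intersection is.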
