I have the following pandas dataframe:
Person  Item1      Item2   Item3   Item4
Adam    Apple      Eggs    Cookie
Alex    Chocolate  Orange  Eggs    Potato
Gina    Eggs       Apple   Orange  Milk
I want to convert it into this:
Item       Count  Person1  Person2  Person3
Apple      2      Adam     Gina
Eggs       3      Adam     Alex     Gina
Cookie     1      Adam
Chocolate  1      Alex
Orange     2      Alex     Gina
Potato     1      Alex
Milk       1      Gina
I have thoroughly searched for my query before posting, but I did not find any matches (maybe there is a better way to rephrase my question). I am sorry if this is a duplicate, but if it is, please direct me to where this question was previously answered.
Use melt to reshape the data into long format first:
df = df.melt('Person', value_name='Item')
print (df)
   Person variable       Item
0    Adam    Item1      Apple
1    Alex    Item1  Chocolate
2    Gina    Item1       Eggs
3    Adam    Item2       Eggs
4    Alex    Item2     Orange
5    Gina    Item2      Apple
6    Adam    Item3     Cookie
7    Alex    Item3       Eggs
8    Gina    Item3     Orange
9    Adam    Item4        NaN
10   Alex    Item4     Potato
11   Gina    Item4       Milk
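For anyone who wants to run the reshape step on its own, here is a minimal sketch. It assumes the question's frame is built with None for Adam's empty Item4 cell, and the name long_df is only illustrative. Dropping the missing row is optional, since groupby skips NaN keys by default anyway.

```python
import pandas as pd

# Rebuild the question's frame; None for Adam's empty Item4 is an assumption
df = pd.DataFrame({
    'Person': ['Adam', 'Alex', 'Gina'],
    'Item1': ['Apple', 'Chocolate', 'Eggs'],
    'Item2': ['Eggs', 'Orange', 'Apple'],
    'Item3': ['Cookie', 'Eggs', 'Orange'],
    'Item4': [None, 'Potato', 'Milk'],
})

# Wide -> long: one row per (Person, item slot) pair
long_df = df.melt('Person', value_name='Item')

# Optionally drop the missing cell so it cannot be mistaken for an item
long_df = long_df.dropna(subset=['Item'])
print(long_df)
```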
Then aggregate with a custom function that collects each group into a list, together with GroupBy.size for the count. Build a new DataFrame from those lists with the constructor and join it back to the count column:
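# collect each group's people into a list; naming the lambda sets the column name
f = lambda x: x.tolist()
f.__name__ = 'Person'
# one row per item: list of people plus group size (NaN items are skipped)
df1 = df.groupby('Item', sort=False)['Person'].agg([f, 'size'])
# expand the lists into columns Person0, Person1, ...
df2 = pd.DataFrame(df1.pop('Person').values.tolist(), index=df1.index).add_prefix('Person')
# join the expanded columns back to the counts
df3 = df1.join(df2).reset_index()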
print (df3)
        Item  size Person0 Person1 Person2
0      Apple     2    Adam    Gina    None
1  Chocolate     1    Alex    None    None
2       Eggs     3    Gina    Adam    Alex
3     Orange     2    Alex    Gina    None
4     Cookie     1    Adam    None    None
5     Potato     1    Alex    None    None
6       Milk     1    Gina    None    None
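If the exact layout from the question matters (a Count column, Person1 onward, people listed in their original row order), one possible variation is sketched below. It is not the code above: it swaps the list aggregation for cumcount plus pivot, and the helper column n and the names long_df, wide and result are made up for illustration. It also assumes the missing cell is None.

```python
import pandas as pd

# Rebuild the question's frame; None for Adam's empty Item4 is an assumption
df = pd.DataFrame({
    'Person': ['Adam', 'Alex', 'Gina'],
    'Item1': ['Apple', 'Chocolate', 'Eggs'],
    'Item2': ['Eggs', 'Orange', 'Apple'],
    'Item3': ['Cookie', 'Eggs', 'Orange'],
    'Item4': [None, 'Potato', 'Milk'],
})

long_df = df.melt('Person', value_name='Item').dropna(subset=['Item'])

# Stable sort by person so each person's items stay in their original order
long_df = long_df.sort_values('Person', kind='mergesort')

# Number the people within each item: 1, 2, 3, ...
long_df = long_df.assign(n=long_df.groupby('Item').cumcount() + 1)

# Remember the order in which items first appear, since pivot sorts the index
item_order = long_df['Item'].unique()

# One row per item, one column per person slot
wide = long_df.pivot(index='Item', columns='n', values='Person')
wide = wide.reindex(item_order).add_prefix('Person').rename_axis(columns=None)

# Count is the number of filled person slots in each row
wide.insert(0, 'Count', wide.notna().sum(axis=1))
result = wide.reset_index()
print(result)
```

The stable sort is what keeps Adam, Alex, Gina in that order for Eggs; without it the people appear in melt order, as in the output above.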
This isn't quite what you're looking for, but I'm not sure that "transposition" exists as a simple function. (By the way, transpose, following linear algebra, usually means swapping a dataframe's rows and columns, i.e. flipping it across its diagonal.)
# get items
items = []
for c in df.columns[1:]:
    items.extend(df[c].values)
# de-duplicate and drop missing values (None or NaN)
items = [i for i in set(items) if pd.notna(i)]
people = df.Person.values
counts = {}
for p in people:
    # 1 if the item appears anywhere in this person's row, else 0
    counts[p] = [1 if item in df[df['Person'] == p].values else 0 for item in items]
new = pd.DataFrame(counts, index=items)
new['Count'] = new.sum(axis=1)
Output:
| | Adam | Alex | Gina | Count |
|-----------|------|------|------|-------|
| Cookie | 1 | 0 | 0 | 1 |
| Chocolate | 0 | 1 | 0 | 1 |
| Potato | 0 | 1 | 0 | 1 |
| Eggs | 1 | 1 | 1 | 3 |
| Milk | 0 | 0 | 1 | 1 |
| Orange | 0 | 1 | 1 | 2 |
| Apple | 1 | 0 | 1 | 2 |
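The same kind of indicator table can probably be built without explicit loops. The sketch below uses pd.crosstab on the melted frame; it assumes the missing cell is None and that nobody lists the same item twice (otherwise the cells hold counts rather than 0/1). The names long_df and table are illustrative, and the rows come out sorted alphabetically rather than in set order.

```python
import pandas as pd

# Rebuild the question's frame; None for Adam's empty Item4 is an assumption
df = pd.DataFrame({
    'Person': ['Adam', 'Alex', 'Gina'],
    'Item1': ['Apple', 'Chocolate', 'Eggs'],
    'Item2': ['Eggs', 'Orange', 'Apple'],
    'Item3': ['Cookie', 'Eggs', 'Orange'],
    'Item4': [None, 'Potato', 'Milk'],
})

# Long format, with the missing cell removed
long_df = df.melt('Person', value_name='Item').dropna(subset=['Item'])

# Rows are items, columns are people, cells count occurrences
table = pd.crosstab(long_df['Item'], long_df['Person'])

# Row totals give the number of people who have each item
table['Count'] = table.sum(axis=1)
print(table)
```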
EDIT: as usual, jezrael has the correct answer, but I tweaked this to get the output you want. It might be a bit easier to understand for a beginner.
Given 'df' as your example:
item_counts = {}
for item in items:
    counts = {}
    count = 0
    for p in people:
        if item in df[df['Person'] == p].values:
            count += 1
            counts['Person' + str(count)] = p
    counts['count'] = count
    item_counts[item] = counts
new = pd.DataFrame.from_dict(item_counts, orient='index')
new = new[['count', 'Person1', 'Person2', 'Person3']]  # rearrange columns, optional
Output:
| | count | Person1 | Person2 | Person3 |
|-----------|-------|---------|---------|---------|
| Apple | 2 | Adam | Gina | NaN |
| Chocolate | 1 | Alex | NaN | NaN |
| Cookie | 1 | Adam | NaN | NaN |
| Eggs | 3 | Adam | Alex | Gina |
| Milk | 1 | Gina | NaN | NaN |
| Orange | 2 | Alex | Gina | NaN |
| Potato | 1 | Alex | NaN | NaN |