
How to 'split' a pandas df column based on a condition and pivot the df

I have a df of 4400 rows, which I created by reading an xlsx file.

To make my question clear, I created an example df.

This gives the following result (a simplified version of my real problem):

shop          amount
0   shop A      15
1   product 1   4
2   product 2   5
3   product 3   6
4   BBBB        19
5   product 1   7
6   product 2   9
7   product 3   3
8   CCCC        21
9   product 1   6
10  product 2   7
11  product 3   8
12  DDDD        18
13  product 1   4
14  product 2   3
15  product 3   11

As you can see, next to every shop name is the total number of the three products sold in that shop. Every shop has the same products, but every shop has a completely different name.

With 4400 rows and many shops, all with different names (but exactly the same products), I would like to pivot my df: shop names as the first column and all the products as column names, with each shop's amount per product in the right column.

As far as I can see, there is no way to distinguish between a shop name and a product name. However, the list of products beneath each shop name is exactly the same and in the same order.

I myself have no idea how to 'filter' all the shop names from the product names. Hopefully one of you has an idea for me. Many thanks again! Greetings, Jan

PS: I used this code to make the example df:

import pandas as pd

d = {'shop': ['shop A', 'product 1', 'product2','product 3','BBBB', 'product 1', 'product 2','product 3','CCCC', 'product 1', 'product 2', 'product 3','DDDD', 'product 1', 'product 2', 'product 3'], 'amount': [15,4,5,6,19,7,9,3,21,6,7,8, 18,4,3,1]}

df = pd.DataFrame(data=d)

df
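Because every shop name is followed by exactly the same products in the same order, the shop rows can also be found purely by position, without looking at the names at all. The following is a minimal sketch under that assumption: each block is one shop row followed by a fixed number of product rows (the hypothetical variable n_products, three here). It uses the example data with 'product2' corrected to 'product 2'; with the original typo, the consistency check below would raise, because the product names would no longer match across blocks.

```python
import pandas as pd

# Example data from the question, with 'product2' corrected to 'product 2'
d = {'shop': ['shop A', 'product 1', 'product 2', 'product 3',
              'BBBB', 'product 1', 'product 2', 'product 3',
              'CCCC', 'product 1', 'product 2', 'product 3',
              'DDDD', 'product 1', 'product 2', 'product 3'],
     'amount': [15, 4, 5, 6, 19, 7, 9, 3, 21, 6, 7, 8, 18, 4, 3, 1]}
df = pd.DataFrame(data=d)

n_products = 3          # assumption: every shop row is followed by exactly this many product rows
block = n_products + 1  # one shop row plus its product rows

if len(df) % block != 0:
    raise ValueError('row count is not a multiple of the block size')

# One row per shop: [shop name, product name 1, ..., product name n]
names = df['shop'].to_numpy().reshape(-1, block)
amounts = df['amount'].to_numpy().reshape(-1, block)

# Product names are taken from the first block and must repeat identically in every block
products = names[0, 1:]
if not (names[:, 1:] == products).all():
    raise ValueError('product names differ between shops')

# Position 0 of each block is the shop row (name and total); the rest are the products
wide = pd.DataFrame(amounts[:, 1:],
                    index=pd.Index(names[:, 0], name='shop'),
                    columns=pd.Index(products, name='product'))
print(wide)
```

This works for arbitrary shop names such as BBBB. The trade-off is that a single missing or extra row shifts every later block, which is why the sketch checks the row count and the product names before reshaping.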

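You have a typo in your data set: product2 should be product 2. After fixing that (and renaming the shops to shop B, shop C and shop D, because the code below identifies shop rows by the word 'shop'), you can do the following: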

import pandas as pd
import numpy as np

d = {'shop': ['shop A', 'product 1', 'product 2','product 3','shop B', 'product 1', 'product 2','product 3','shop C', 'product 1', 'product 2', 'product 3','shop D', 'product 1', 'product 2', 'product 3'], 'amount': [15,4,5,6,19,7,9,3,21,6,7,8, 18,4,3,1]}


df = pd.DataFrame(data=d)

# Create grouping column
df['g']  = np.where(df['shop'].str.contains('shop'), df['shop'], np.nan)
df = df.ffill()

# Get rows that have totals by shop
total_rows = df.groupby('g')['amount'].idxmax().values

# Drop total rows
df = df.loc[~df.index.isin(total_rows)]

# Rename columns
df.columns = ['product','amount','shop']

# Pivot
df.pivot_table(index='shop',columns='product',values='amount')

Output

product product 1   product 2   product 3
shop            
shop A          4           5           6
shop B          7           9           3
shop C          6           7           8
shop D          4           3           1
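The idxmax step relies on the shop total being the largest amount in its group, which holds when the total is the sum of non-negative product amounts (on a tie, idxmax returns the first occurrence, which is the shop row). A slightly more direct sketch, assuming the same renamed data, drops the rows whose name equals the forward-filled group label, since those are exactly the shop rows. It also uses pivot instead of pivot_table: pivot does no aggregation and raises if a shop/product pair appears twice, which doubles as a sanity check.

```python
import pandas as pd

# Same renamed example data as in the answer above
d = {'shop': ['shop A', 'product 1', 'product 2', 'product 3',
              'shop B', 'product 1', 'product 2', 'product 3',
              'shop C', 'product 1', 'product 2', 'product 3',
              'shop D', 'product 1', 'product 2', 'product 3'],
     'amount': [15, 4, 5, 6, 19, 7, 9, 3, 21, 6, 7, 8, 18, 4, 3, 1]}
df = pd.DataFrame(data=d)

# Grouping column: shop name on shop rows, NaN elsewhere, then forward-filled
df['g'] = df['shop'].where(df['shop'].str.contains('shop')).ffill()

# A shop row is exactly the row whose name equals its own group label
products = df[df['shop'] != df['g']]

# pivot (no aggregation) raises on duplicate shop/product pairs
wide = products.pivot(index='g', columns='shop', values='amount')
wide.index.name = 'shop'
wide.columns.name = 'product'
print(wide)
```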

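If you would rather not rename the shops, and assuming your shop names are unique while the product names repeat, you can instead treat every name that occurs exactly once as a shop name: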

import pandas as pd
import numpy as np

d = {'shop': ['shop A', 'product 1', 'product 2','product 3','BBBB', 'product 1', 'product 2','product 3','CCCC', 'product 1', 'product 2', 'product 3','DDDD', 'product 1', 'product 2', 'product 3'], 'amount': [15,4,5,6,19,7,9,3,21,6,7,8, 18,4,3,1]}

df = pd.DataFrame(data=d)

# Count how often each name occurs; names that occur exactly once are taken to be shop names
g = df.groupby('shop').size().reset_index()
# Create grouping column: shop name on shop rows, NaN on product rows
df['g'] = np.where(df['shop'].isin(g[g[0]==1]['shop'].values), df['shop'], np.nan)
# Forward-fill the shop name onto the product rows below it
df = df.ffill()

# Get rows that have totals by shop
total_rows = df.groupby('g')['amount'].idxmax().values

# Drop total rows
df = df.loc[~df.index.isin(total_rows)]

# Rename columns
df.columns = ['product','amount','shop']

# Pivot
df.pivot_table(index='shop',columns='product',values='amount')
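The groupby/size/reset_index step only builds a mask of names that occur once. A shorter sketch of the same heuristic, under the same assumption (every shop name is unique, every product name repeats), uses Series.duplicated(keep=False), which flags every occurrence of a repeated value. Like the answer's version, it breaks if the file contains only one shop (then the product names do not repeat either) or if two shops share a name.

```python
import pandas as pd

# Example data from the question, with 'product2' corrected to 'product 2'
d = {'shop': ['shop A', 'product 1', 'product 2', 'product 3',
              'BBBB', 'product 1', 'product 2', 'product 3',
              'CCCC', 'product 1', 'product 2', 'product 3',
              'DDDD', 'product 1', 'product 2', 'product 3'],
     'amount': [15, 4, 5, 6, 19, 7, 9, 3, 21, 6, 7, 8, 18, 4, 3, 1]}
df = pd.DataFrame(data=d)

# keep=False marks every occurrence of a repeated name, so the negation marks names seen once
is_shop = ~df['shop'].duplicated(keep=False)

# Shop name on shop rows, forward-filled onto the product rows below it
df['g'] = df['shop'].where(is_shop).ffill()

# Keep only product rows, then reshape; pivot sorts the shop names
wide = (df[~is_shop]
        .pivot(index='g', columns='shop', values='amount')
        .rename_axis(index='shop', columns='product'))
print(wide)
```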
