I have a data frame (main_data) with columns such as net sales, product, vendor, etc. I need to create sub-dataframes from this main_data table, one for each vendor. Let's say there are 5 unique vendors (vendor1, vendor2, vendor3, vendor4 and vendor5) in the vendor column. I would then need 5 different sub-dataframes, one per vendor. Each sub-dataframe should contain all columns of the main table, but only the rows for vendorX.
How would I do this using a for loop?
If you are using pandas, you can do:
df_v1 = main_data[main_data['vendor'] == 'vendor1']
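For a quick check, here is a self-contained version of that filter. The small main_data frame below is made up for illustration (including the 'net sales' values); with your real data only the filtering line is needed.

```python
import pandas as pd

# Hypothetical sample data standing in for the real main_data table
main_data = pd.DataFrame({
    'product': ['P1', 'P2', 'P3', 'P4'],
    'vendor': ['vendor1', 'vendor2', 'vendor1', 'vendor3'],
    'net sales': [100, 250, 75, 40],
})

# Keep only the rows whose vendor column equals 'vendor1'
df_v1 = main_data[main_data['vendor'] == 'vendor1']
print(df_v1)
```

The filtered frame keeps the original index labels (0 and 2 here); call df_v1.reset_index(drop=True) if you prefer a fresh 0-based index.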
Let's say your DataFrame is loaded from an Excel file and has 5 vendors (v1, v2, v3, v4, v5) in its vendor column.
Code:
import pandas as pd
# read the DataFrame from an Excel dump
df = pd.read_excel('stack.xlsx')
# unique vendor names in order of first appearance (assumes they are strings such as 'v1')
dfList = list(df['vendor'].unique())
dfNames = ["df" + row for row in dfList]
for i, row in enumerate(dfList):
    dfName = dfNames[i]
    dfNew = df[df['vendor'] == row]
    globals()[dfName] = dfNew
    print(globals()[dfName])
    print('------------------------------------------')
# the loop above creates 5 DataFrames named dfv1, dfv2, dfv3, dfv4 and dfv5, which you can now use directly
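Because stack.xlsx is not available here, the sketch below replaces pd.read_excel with a small, made-up DataFrame (the vendor names v1, v2, v3 are assumptions) and runs the same globals() technique, so you can see which names it creates.

```python
import pandas as pd

# Hypothetical in-memory data replacing pd.read_excel('stack.xlsx')
df = pd.DataFrame({
    'product': ['P1', 'P2', 'P3', 'P4', 'P5'],
    'vendor': ['v1', 'v2', 'v3', 'v1', 'v2'],
})

# Create one module-level variable per vendor: dfv1, dfv2, dfv3
for vendor in df['vendor'].unique():
    globals()['df' + vendor] = df[df['vendor'] == vendor]

print(dfv1)
```

Writing into globals() is generally discouraged: linters and readers cannot see where a name like dfv1 comes from, and code that refers to it fails if the data happens to contain no vendor v1. Keeping the frames in a dictionary keyed by vendor avoids both problems.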
Consider this:
import pandas as pd
data = {'product': ['P1', 'P2', 'P3', 'P4', 'P5', 'P6', 'P7'],
        'vendor': ['vendor1', 'vendor2', 'vendor2', 'vendor1', 'vendor3', 'vendor2', 'vendor3']}
main_data = pd.DataFrame(data)
print('Original dataframe:')
print(main_data)
print('-----')
# this will store key-value pairs of vendorX -> sub-dataframe for vendorX
by_vendor = dict()
for vendorX in main_data.vendor.unique():
    maskX = main_data['vendor'] == vendorX
    by_vendor[vendorX] = main_data[maskX]
for vendorX, sub_data in by_vendor.items():
    print('subdataframe for vendor', vendorX)
    print(sub_data)
    print('-----')
This is the output:
Original dataframe:
  product   vendor
0      P1  vendor1
1      P2  vendor2
2      P3  vendor2
3      P4  vendor1
4      P5  vendor3
5      P6  vendor2
6      P7  vendor3
-----
subdataframe for vendor vendor1
  product   vendor
0      P1  vendor1
3      P4  vendor1
-----
subdataframe for vendor vendor2
  product   vendor
1      P2  vendor2
2      P3  vendor2
5      P6  vendor2
-----
subdataframe for vendor vendor3
  product   vendor
4      P5  vendor3
6      P7  vendor3
-----
Note that this example has only three vendors; if main_data had more, the loop would simply produce more sub-dataframes. The code handles any number of unique vendors.
Here, the result is stored in a dictionary named by_vendor, which holds the sub_data dataframe for each vendorX. Each one can be accessed as by_vendor[vendorX] (by_vendor['vendor1'], by_vendor['vendor2'], etc.).
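To illustrate the lookup, here is a sketch that rebuilds by_vendor from the same sample data and adds a hypothetical 'net sales' column (the question mentions such a column, but these values are invented) to show that each dictionary value behaves like any other DataFrame.

```python
import pandas as pd

# Same sample data as above, plus a hypothetical 'net sales' column
main_data = pd.DataFrame({
    'product': ['P1', 'P2', 'P3', 'P4', 'P5', 'P6', 'P7'],
    'vendor': ['vendor1', 'vendor2', 'vendor2', 'vendor1',
               'vendor3', 'vendor2', 'vendor3'],
    'net sales': [10, 20, 30, 40, 50, 60, 70],
})

by_vendor = {}
for vendorX in main_data['vendor'].unique():
    by_vendor[vendorX] = main_data[main_data['vendor'] == vendorX]

# Look up one vendor's sub-dataframe by its key
vendor2_data = by_vendor['vendor2']
print(vendor2_data)

# Each value is an ordinary DataFrame, so column operations work as usual
totals = {vendor: sub['net sales'].sum() for vendor, sub in by_vendor.items()}
print(totals)
```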
The line for vendorX in main_data.vendor.unique(): iterates over all the unique entries in the vendor column. For each unique vendor vendorX, the loop body does the following:
maskX is a Series containing a True/False value for each row, depending on whether that row's vendor equals vendorX.
main_data[maskX] uses this mask with boolean indexing to select the sub_data dataframe for vendorX.
The assignment by_vendor[vendorX] = ... then stores that sub_data in the dictionary under the key vendorX.
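To make the masking step concrete, the sketch below evaluates maskX for vendor1 on the same sample data and then applies it with boolean indexing.

```python
import pandas as pd

main_data = pd.DataFrame({
    'product': ['P1', 'P2', 'P3', 'P4', 'P5', 'P6', 'P7'],
    'vendor': ['vendor1', 'vendor2', 'vendor2', 'vendor1',
               'vendor3', 'vendor2', 'vendor3'],
})

vendorX = 'vendor1'
# Boolean Series: True where the row's vendor equals vendorX
maskX = main_data['vendor'] == vendorX
print(maskX.tolist())  # [True, False, False, True, False, False, False]

# Boolean indexing keeps only the rows where maskX is True
sub_data = main_data[maskX]
print(sub_data)
```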
The two statements inside the loop can be combined into a single one: by_vendor[vendorX] = main_data[main_data['vendor'] == vendorX]
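Taking that one step further, the whole loop can be written as a dict comprehension. This is a sketch of an equivalent form, not part of the original answer; it builds the same by_vendor dictionary, with keys in order of first appearance because unique() preserves that order.

```python
import pandas as pd

main_data = pd.DataFrame({
    'product': ['P1', 'P2', 'P3', 'P4', 'P5', 'P6', 'P7'],
    'vendor': ['vendor1', 'vendor2', 'vendor2', 'vendor1',
               'vendor3', 'vendor2', 'vendor3'],
})

# One expression: map each unique vendor to its boolean-filtered rows
by_vendor = {
    vendorX: main_data[main_data['vendor'] == vendorX]
    for vendorX in main_data['vendor'].unique()
}
print(by_vendor['vendor3'])
```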
You could skip the by_vendor dictionary and use boolean indexing to fill five separately named variables by hand if you prefer, but I find the dictionary approach more elegant because it works for any number of vendors.
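As an alternative that avoids the explicit loop, pandas' groupby can do the splitting for you. This swaps in a different technique from the boolean-mask loop above: iterating over main_data.groupby('vendor') yields (vendor, sub-dataframe) pairs, which dict() collects into the same kind of mapping. A minimal sketch on the same sample data:

```python
import pandas as pd

main_data = pd.DataFrame({
    'product': ['P1', 'P2', 'P3', 'P4', 'P5', 'P6', 'P7'],
    'vendor': ['vendor1', 'vendor2', 'vendor2', 'vendor1',
               'vendor3', 'vendor2', 'vendor3'],
})

# groupby('vendor') yields (vendor, sub_dataframe) pairs; dict() collects them
by_vendor = dict(tuple(main_data.groupby('vendor')))

# Same rows and original index labels as the boolean-mask version
expected = main_data[main_data['vendor'] == 'vendor2']
print(by_vendor['vendor2'].equals(expected))
```

By default groupby sorts the group keys, so the dictionary is ordered alphabetically by vendor rather than by first appearance; pass sort=False to groupby if the original order matters.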