I am in the process of refining my code for a project where I am creating shipping lanes. What I currently have is a dataframe that is built from the index values of c_match. Cool, great, everything looks correct at first glance.
A shipping lane is a group of states with the same discount and the same minimum charge. My code returns states with the same discount. Most states that share a discount also share a minimum charge. However, the outliers are states with the same discount but different minimum charges.
The goal: create shipping lanes whose states share both the same minimum charge and the same discount percentage.
My idea: create a logical operation that concatenates the names of states that have identical rates and costs, and also returns those rates and costs. States that have the same rate but different costs still need to be accounted for.
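The idea above amounts to grouping on the pair (discount, minimum charge) rather than on the discount alone. A minimal plain-Python sketch of that grouping, using rows taken from the desired output below (AZ and CO share a discount with other states but have a different minimum charge, so they end up in their own lanes). Discounts are written as fractions here, which is an assumption about how the values are stored.

```python
from collections import defaultdict

# (state, discount, minimum charge) rows for one origin column, taken from the
# desired output; discounts stored as fractions (assumption).
rows = [
    ("AL", 0.508, 120.0),
    ("AR", 0.508, 120.0),
    ("AZ", 0.508, 155.0),  # same discount as AL/AR, different minimum charge
    ("CA", 0.624, 145.0),
    ("CO", 0.624, 155.0),  # same discount as CA, different minimum charge
    ("KY", 0.508, 120.0),
]

def group_by_rate_and_cost(rows):
    """Map each (discount, min charge) pair to the states that share it, in input order."""
    lanes = defaultdict(list)
    for state, rate, cost in rows:
        lanes[(rate, cost)].append(state)
    return dict(lanes)

lanes = group_by_rate_and_cost(rows)
for (rate, cost), states in lanes.items():
    print("_".join(states), f"{rate:.2%}", cost)
```

Because the key is the tuple, a state with a matching discount but a different minimum charge can never be folded into another state's lane.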
Desired Output:
Shipping Lane Rate Cost
20_21_RDWY_Purple_AL_AR_KY_LA_MS_SC_TN_PE 50.80% 120
20_21_RDWY_Purple_AZ 50.80% 155
20_21_RDWY_Purple_CA 62.40% 145
20_21_RDWY_Purple_CO_ND_WY_MB_NF_PQ 62.40% 155
20_21_RDWY_Purple_CT_DE_MN_NE 50.00% 145
20_21_RDWY_Purple_DC_IA_KS_MD_MI_OH_OK_WI 49.00% 125
20_21_RDWY_Purple_FL 48.30% 125
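The lane names in the desired output follow one pattern: a fixed prefix followed by the member states joined with underscores. A small sketch of that naming rule; lane_name and LANE_PREFIX are hypothetical names, not part of the original code.

```python
# Prefix copied from the desired output; the helper name is hypothetical.
LANE_PREFIX = "20_21_RDWY_Purple_"

def lane_name(states, prefix=LANE_PREFIX):
    """Join state codes with underscores and prepend the lane prefix."""
    return prefix + "_".join(states)

print(lane_name(["CT", "DE", "MN", "NE"]))  # 20_21_RDWY_Purple_CT_DE_MN_NE
```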
Current Code:
import itertools

import pandas as pd


def remove_dups(lst, output):
    # sort so identical position lists are adjacent, then keep one copy of each
    lst.sort()
    n_list = [key for key, _ in itertools.groupby(lst)]
    output.append(n_list)


def get_matches_discount(state):
    state_groups = []
    state_rates = []
    state_cost = []
    final_format = []
    match = []
    c_match = []
    for i, x in enumerate(df_d[state]):
        # checks within the column for identical values, then maps where the identical values are
        match1 = [j for j, y in enumerate(df_d[state].isin([x])) if y]
        match.append(match1)
    remove_dups(match, c_match)
    for group in c_match:
        for elements in group:
            # the first matching position stands in for the whole group
            r = elements[0]
            state_g = df_d.index[elements]
            state_groups.append(state_g)
            state_r = df_d[state].iloc[r]
            state_rates.append(state_r)
            print(state_rates)
            match_cost = df_m[state].iloc[r]
            state_cost.append(match_cost)
    for i in state_groups:
        delimiter = "_"
        join_str = delimiter.join(i)
        j_str = "20_21_RDWY_Purple_" + join_str
        final_format.append(j_str)
    master_frame = pd.DataFrame(
        {'Shipping Lane': final_format,
         'Rate': state_rates,
         'Cost': state_cost,
         }
    )
    print(master_frame)
    return master_frame
m_col_names = ['AL', 'AR', 'AZ', 'CA', 'CO', 'CT', 'DC', 'DE', 'FL', 'GA', 'IA', 'ID', 'IL', 'IN', 'KS', 'KY', 'LA',
'MA', 'MD', 'ME', 'MI', 'MN', 'MO', 'MS', 'MT', 'NC', 'ND', 'NE', 'NH', 'NJ', 'NM', 'NV', 'NY', 'OH',
'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX', 'UT', 'VA', 'VT', 'WA', 'WI', 'WV', 'WY', 'AB', 'BC',
'MB', 'NB', 'NF', 'NS', 'ON', 'PE', 'PQ', 'SK']
# calls the function in a loop to process one column at a time
# and collects the per-column frames outside of the function
frames = []
for state in m_col_names:
    # stores the function's result for this column
    temp_df = get_matches_discount(state)
    frames.append(temp_df)
# DataFrame.append was removed in pandas 2.0, so concatenate the collected frames instead
master_dataframe0 = pd.concat(frames)
print(master_dataframe0)
master_dataframe0.to_excel("shipping_lanes_revised00.xlsx")
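remove_dups only works because it sorts before calling itertools.groupby, which collapses adjacent equal items only. A small illustration with made-up position lists of the kind get_matches_discount collects (one list per row, so rows sharing a discount produce identical lists):

```python
import itertools

# Made-up position lists: rows 0, 1 and 3 share a discount, row 2 is alone.
match = [[0, 1, 3], [0, 1, 3], [2], [0, 1, 3]]

# Without sorting, groupby only merges neighbouring duplicates.
unsorted_unique = [key for key, _ in itertools.groupby(match)]
print(unsorted_unique)  # [[0, 1, 3], [2], [0, 1, 3]]

# Sorting first puts equal lists next to each other, so each appears once.
sorted_unique = [key for key, _ in itertools.groupby(sorted(match))]
print(sorted_unique)  # [[0, 1, 3], [2]]
```

Note that these position lists are built from the discount column alone, and the cost is then looked up at the first position of each group. States that share a discount but have a different minimum charge are therefore folded into that group and reported with the first state's cost, which is the outlier problem described above.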
Sample input:
Minimum Charge Table
This is dataframe df_m.
State AL AR AZ CA CO CT DC
AL 120.00 120.00 155.00 145.00 155.00 145.00 125.00
AR 120.00 120.00 155.00 155.00 145.00 155.00 145.00
AZ 155.00 155.00 120.00 120.00 125.00 185.00 185.00
CA 145.00 164.30 120.00 120.00 170.00 185.00 185.00
CO 155.00 145.00 125.00 145.00 120.00 155.00 155.00
CT 145.00 155.00 185.00 185.00 155.00 120.00 120.00
DC 125.00 155.00 185.00 185.00 155.00 120.00 185.00
DE 145.00 155.00 185.00 185.00 155.00 120.00 120.00
FL 125.00 145.00 145.00 185.00 145.00 155.00 145.00
GA 120.00 120.00 155.00 145.00 155.00 145.00 120.00
IA 125.00 125.00 155.00 145.00 125.00 155.00 145.00
ID 145.00 155.00 145.00 145.00 125.00 185.00 185.00
IL 120.00 120.00 155.00 145.00 145.00 125.00 125.00
IN 120.00 120.00 155.00 145.00 145.00 125.00 120.00
KS 125.00 120.00 155.00 155.00 120.00 155.00 145.00
KY 120.00 120.00 155.00 145.00 145.00 125.00 125.00
LA 120.00 120.00 155.00 145.00 155.00 155.00 155.00
MA 155.00 155.00 185.00 185.00 145.00 120.00 120.00
MD 125.00 145.00 185.00 185.00 155.00 120.00 120.00
ME 155.00 155.00 185.00 185.00 145.00 120.00 125.00
MI 125.00 125.00 145.00 145.00 155.00 125.00 120.00
MN 145.00 125.00 155.00 145.00 145.00 155.00 145.00
MO 120.00 120.00 155.00 155.00 125.00 145.00 145.00
MS 120.00 120.00 155.00 155.00 145.00 155.00 145.00
MT 145.00 155.00 155.00 155.00 125.00 185.00 185.00
NC 120.00 125.00 145.00 185.00 155.00 125.00 120.00
ND 155.00 155.00 145.00 145.00 155.00 155.00 155.00
NE 145.00 125.00 155.00 155.00 120.00 155.00 155.00
NH 155.00 155.00 185.00 185.00 145.00 120.00 120.00
NJ 145.00 155.00 185.00 185.00 155.00 120.00 120.00
NM 155.00 125.00 120.00 145.00 120.00 145.00 145.00
NV 145.00 155.00 120.00 120.00 145.00 185.00 185.00
NY 145.00 145.00 185.00 185.00 155.00 120.00 120.00
OH 125.00 125.00 145.00 145.00 155.00 120.00 120.00
OK 125.00 120.00 145.00 155.00 120.00 155.00 155.00
OR 185.00 145.00 155.00 125.00 155.00 185.00 185.00
PA 145.00 145.00 185.00 185.00 155.00 120.00 120.00
RI 155.00 155.00 185.00 185.00 145.00 120.00 120.00
SC 120.00 120.00 145.00 185.00 155.00 125.00 120.00
SD 155.00 145.00 155.00 155.00 120.00 155.00 145.00
TN 120.00 120.00 155.00 145.00 155.00 145.00 125.00
TX 125.00 120.00 145.00 155.00 125.00 145.00 155.00
UT 170.00 164.30 132.50 132.50 127.20 145.00 145.00
VA 120.00 145.00 145.00 185.00 155.00 120.00 120.00
Discount Table
This is dataframe df_d.
State AL AR AZ CA CO CT DC
AL 50.80% 44.10% 54.30% 73.10% 53.90% 50.00% 49.00%
AR 50.80% 50.80% 53.90% 65.70% 50.00% 53.90% 50.00%
AZ 56.70% 55.80% 50.80% 54.10% 49.60% 59.50% 64.40%
CA 62.40% 61.00% 54.30% 61.40% 43.00% 52.30% 54.30%
CO 54.30% 67.10% 49.00% 65.70% 50.80% 54.30% 54.30%
CT 50.00% 53.90% 64.40% 72.50% 54.30% 50.80% 50.80%
DC 49.00% 53.90% 64.40% 64.40% 54.30% 50.80% 64.40%
DE 50.00% 53.90% 64.40% 64.40% 54.30% 50.80% 50.80%
FL 48.30% 35.00% 55.50% 55.50% 55.10% 66.40% 62.30%
GA 67.90% 44.10% 71.00% 64.60% 56.00% 50.00% 44.10%
IA 49.00% 49.00% 54.30% 61.80% 49.00% 53.90% 50.00%
ID 61.80% 54.30% 50.00% 75.90% 49.00% 64.40% 64.40%
IL 44.10% 44.10% 54.30% 64.00% 50.00% 49.00% 49.00%
IN 44.10% 1.60% 11.70% 26.10% -0.70% 49.00% 44.10%
KS 49.00% 63.40% 61.00% 67.70% 72.50% 72.20% 50.00%
KY 50.80% 44.10% 54.30% 61.50% 50.00% 49.00% 49.00%
LA 50.80% 44.10% 54.30% 61.80% 53.90% 54.30% 53.90%
MA 63.50% 53.90% 67.70% 63.90% 53.00% 63.50% 44.10%
MD 49.00% 50.00% 64.40% 73.80% 54.30% 50.80% 50.80%
ME 53.90% 54.30% 64.40% 64.40% 61.80% 50.80% 49.00%
MI 49.00% 49.00% 61.80% 55.10% 53.90% 49.00% 44.10%
MN 50.00% 49.00% 54.30% 61.80% 50.00% 53.90% 50.00%
MO 44.10% 50.80% 53.90% 56.10% 49.00% 50.00% 50.00%
MS 50.80% 50.80% 54.30% 63.90% 50.00% 53.90% 50.00%
MT 61.80% 54.30% 53.90% 75.80% 49.00% 64.40% 64.40%
NC 44.10% 59.20% 53.50% 58.60% 57.90% 42.90% 69.60%
ND 54.30% 53.90% 61.80% 61.80% 54.30% 53.90% 53.90%
NE 50.00% 49.00% 54.30% 54.30% 44.10% 53.90% 53.90%
NH 53.90% 54.30% 64.40% 64.40% 61.80% 50.80% 44.10%
NJ 50.50% 51.50% 70.50% 66.20% 59.70% 67.10% 50.80%
NM 53.90% 49.00% 44.10% 68.20% 44.10% 61.80% 61.80%
NV 61.80% 54.30% 52.70% 73.50% 50.00% 64.40% 64.40%
NY 61.10% 69.00% 65.50% 68.90% 63.00% 68.40% 50.80%
OH 49.00% 49.00% 68.50% 71.50% 72.30% 60.70% 44.10%
OK 49.00% 50.80% 50.00% 54.30% 44.10% 54.30% 54.30%
OR 64.40% 61.80% 53.90% 64.00% 53.90% 64.40% 64.40%
PA 47.20% 57.00% 33.70% 51.90% 45.50% 50.80% 50.80%
RI 53.90% 54.30% 64.40% 64.40% 61.80% 50.80% 44.10%
SC 50.80% 44.10% 61.80% 58.70% 54.30% 49.00% 44.10%
SD 53.90% 50.00% 54.30% 54.30% 44.10% 54.30% 61.80%
TN 50.80% 50.80% 52.50% 62.60% 61.30% 53.30% 49.00%
TX 56.60% 46.00% 51.40% 58.30% 53.20% 63.10% 65.10%
UT 45.00% 60.60% 73.50% 73.50% 70.30% 44.40% 61.90%
VA 57.90% 50.00% 61.80% 72.10% 54.30% 44.10% 50.80%
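To experiment without the Excel files, a reproducible slice of both tables can be built directly. This sketch copies the AL and AR columns for the first five states, keeps State as a regular column (which the merge-based approach expects), and stores discounts as fractions; whether the real spreadsheet loads them as fractions or as "50.80%" strings is an assumption.

```python
import pandas as pd

# Hand-copied slice of the sample tables: AL and AR columns, first five rows.
# Discounts stored as fractions (assumption about how the Excel values load).
df_d = pd.DataFrame({
    "State": ["AL", "AR", "AZ", "CA", "CO"],
    "AL": [0.508, 0.508, 0.567, 0.624, 0.543],
    "AR": [0.441, 0.508, 0.558, 0.610, 0.671],
})
df_m = pd.DataFrame({
    "State": ["AL", "AR", "AZ", "CA", "CO"],
    "AL": [120.0, 120.0, 155.0, 145.0, 155.0],
    "AR": [120.0, 120.0, 155.0, 164.3, 145.0],
})
print(df_d)
print(df_m)
```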
Current Output:
Shipping Lane Rate Cost
0 20_21_RDWY_Purple_AL_AR_KY_LA_MS_SC_TN_PE 50.80% 120.0
1 20_21_RDWY_Purple_AZ 56.70% 155.0
2 20_21_RDWY_Purple_CA 62.40% 145.0
3 20_21_RDWY_Purple_CO_ND_WY_MB_NF_PQ 54.30% 155.0
4 20_21_RDWY_Purple_CT_DE_MN_NE 50.00% 145.0
5 20_21_RDWY_Purple_DC_IA_KS_MD_MI_OH_OK_WI 49.00% 125.0
6 20_21_RDWY_Purple_FL 48.30% 125.0
7 20_21_RDWY_Purple_GA 67.90% 120.0
8 20_21_RDWY_Purple_ID_MT_NV_AB_SK 61.80% 145.0
9 20_21_RDWY_Purple_IL_IN_MO_NC_WV 44.10% 120.0
10 20_21_RDWY_Purple_MA 63.50% 155.0
11 20_21_RDWY_Purple_ME_NH_NM_RI_SD_VT_NB_NS 53.90% 155.0
12 20_21_RDWY_Purple_NJ 50.50% 145.0
13 20_21_RDWY_Purple_NY 61.10% 145.0
14 20_21_RDWY_Purple_OR_WA_BC 64.40% 185.0
15 20_21_RDWY_Purple_PA 47.20% 145.0
16 20_21_RDWY_Purple_TX 56.60% 125.0
17 20_21_RDWY_Purple_UT 45.00% 170.0
18 20_21_RDWY_Purple_VA 57.90% 120.0
19 20_21_RDWY_Purple_ON 37.30% 145.0
0 20_21_RDWY_Purple_AL_GA_IL_KY_LA_SC 44.10% 120.0
1 20_21_RDWY_Purple_AR_MO_MS_OK_TN_NB_NF_NS_PE 50.80% 120.0
2 20_21_RDWY_Purple_AZ 55.80% 155.0
3 20_21_RDWY_Purple_CA 61.00% 164.3
4 20_21_RDWY_Purple_CO 67.10% 145.0
5 20_21_RDWY_Purple_CT_DC_DE_MA_ND_MB 53.90% 155.0
6 20_21_RDWY_Purple_FL 35.00% 145.0
7 20_21_RDWY_Purple_IA_MI_MN_NE_NM_OH_WI_WV 49.00% 125.0
8 20_21_RDWY_Purple_ID_ME_MT_NH_NV_RI_VT_PQ_SK 54.30% 155.0
9 20_21_RDWY_Purple_IN 1.60% 120.0
10 20_21_RDWY_Purple_KS 63.40% 120.0
11 20_21_RDWY_Purple_MD_SD_VA_WY 50.00% 145.0
12 20_21_RDWY_Purple_NC 59.20% 125.0
13 20_21_RDWY_Purple_NJ 51.50% 155.0
14 20_21_RDWY_Purple_NY 69.00% 145.0
15 20_21_RDWY_Purple_OR_WA_AB 61.80% 145.0
16 20_21_RDWY_Purple_PA 57.00% 145.0
17 20_21_RDWY_Purple_TX 46.00% 120.0
18 20_21_RDWY_Purple_UT 60.60% 164.3
19 20_21_RDWY_Purple_BC 64.40% 185.0
20 20_21_RDWY_Purple_ON 32.10% 145.0
0 20_21_RDWY_Purple_AL_CA_IA_IL_KY_LA_MN_MS_NE_SD_WA_AB_BC 54.30% 155.0
1 20_21_RDWY_Purple_AR_MO_MT_OR 53.90% 155.0
2 20_21_RDWY_Purple_AZ_NB_NF_NS_PE 50.80% 120.0
3 20_21_RDWY_Purple_CO 49.00% 125.0
4 20_21_RDWY_Purple_CT_DC_DE_MD_ME_NH_RI_VT_ON_PQ_SK 64.40% 185.0
5 20_21_RDWY_Purple_FL 55.50% 145.0
6 20_21_RDWY_Purple_GA 71.00% 155.0
7 20_21_RDWY_Purple_ID_OK_WY 50.00% 145.0
8 20_21_RDWY_Purple_IN 11.70% 155.0
9 20_21_RDWY_Purple_KS 61.00% 155.0
10 20_21_RDWY_Purple_MA 67.70% 185.0
11 20_21_RDWY_Purple_MI_ND_SC_VA_WV_MB 61.80% 145.0
12 20_21_RDWY_Purple_NC 53.50% 145.0
13 20_21_RDWY_Purple_NJ 70.50% 185.0
14 20_21_RDWY_Purple_NM 44.10% 120.0
15 20_21_RDWY_Purple_NV 52.70% 120.0
16 20_21_RDWY_Purple_NY 65.50% 185.0
17 20_21_RDWY_Purple_OH 68.50% 145.0
18 20_21_RDWY_Purple_PA 33.70% 185.0
19 20_21_RDWY_Purple_TN 52.50% 155.0
20 20_21_RDWY_Purple_TX 51.40% 145.0
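In this output the index restarts at 0 for each origin column, because each per-column frame keeps its own index when the frames are stacked, and nothing records which origin a block belongs to. One way to handle both, sketched here with truncated slices of the AL and AR results, is to tag each frame with an Origin column and pass ignore_index=True to pd.concat.

```python
import pandas as pd

# Truncated per-origin results, each with its own 0-based index, like the frames
# built inside the loop (values copied from the AL and AR sample columns).
al_lanes = pd.DataFrame({"Shipping Lane": ["20_21_RDWY_Purple_AL_AR", "20_21_RDWY_Purple_AZ"],
                         "Rate": [0.508, 0.567],
                         "Cost": [120.0, 155.0]})
ar_lanes = pd.DataFrame({"Shipping Lane": ["20_21_RDWY_Purple_AL", "20_21_RDWY_Purple_AR"],
                         "Rate": [0.441, 0.508],
                         "Cost": [120.0, 120.0]})

# Record which origin column each block came from, then stack the blocks.
parts = [frame.assign(Origin=origin) for origin, frame in [("AL", al_lanes), ("AR", ar_lanes)]]
stacked = pd.concat(parts)                        # index reads 0, 1, 0, 1
renumbered = pd.concat(parts, ignore_index=True)  # index reads 0, 1, 2, 3
print(renumbered)
```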
You have states on the rows, but they are also on the columns. It looks like you were just showing example output for the AL column, though? You can merge the two dataframes on State and then .groupby Rate and Cost. Then return a joined string of the states with .apply(lambda x: '_'.join(x)); since you grouped by Rate and Cost, every state in a group has the same rate and cost:
master_dataframe0 = (pd.merge(df_d[['State', 'AL']], df_m[['State', 'AL']], how='inner', on='State')
                     .rename({'AL_x': 'Rate', 'AL_y': 'Cost'}, axis=1)
                     .groupby(['Rate', 'Cost'])['State'].apply(lambda x: '_'.join(x)).reset_index()
                     .sort_values('State'))
master_dataframe0 = master_dataframe0[['State', 'Rate', 'Cost']].assign(State='20_21_RDWY_Purple_' + master_dataframe0['State'])
master_dataframe0
Out[1]:
State Rate Cost
7 20_21_RDWY_Purple_AL_AR_KY_LA_MS_SC_TN 50.80% 120.0
11 20_21_RDWY_Purple_AZ 56.70% 155.0
15 20_21_RDWY_Purple_CA 62.40% 145.0
9 20_21_RDWY_Purple_CO_ND 54.30% 155.0
5 20_21_RDWY_Purple_CT_DE_MN_NE 50.00% 145.0
4 20_21_RDWY_Purple_DC_IA_KS_MD_MI_OH_OK 49.00% 125.0
3 20_21_RDWY_Purple_FL 48.30% 125.0
18 20_21_RDWY_Purple_GA 67.90% 120.0
14 20_21_RDWY_Purple_ID_MT_NV 61.80% 145.0
0 20_21_RDWY_Purple_IL_IN_MO_NC 44.10% 120.0
16 20_21_RDWY_Purple_MA 63.50% 155.0
8 20_21_RDWY_Purple_ME_NH_NM_RI_SD 53.90% 155.0
6 20_21_RDWY_Purple_NJ 50.50% 145.0
13 20_21_RDWY_Purple_NY 61.10% 145.0
17 20_21_RDWY_Purple_OR 64.40% 185.0
2 20_21_RDWY_Purple_PA 47.20% 145.0
10 20_21_RDWY_Purple_TX 56.60% 125.0
1 20_21_RDWY_Purple_UT 45.00% 170.0
12 20_21_RDWY_Purple_VA 57.90% 120.0
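The snippet above assumes df_d and df_m already exist with a State column. A self-contained version of the same merge, groupby and join steps, run on a hand-copied slice of the AL column (discounts as fractions, an assumption), shows the grouping on a small scale:

```python
import pandas as pd

# Hand-copied slice of the AL column; discounts as fractions (assumption).
df_d = pd.DataFrame({"State": ["AL", "AR", "AZ", "CT", "DE", "KY"],
                     "AL": [0.508, 0.508, 0.567, 0.500, 0.500, 0.508]})
df_m = pd.DataFrame({"State": ["AL", "AR", "AZ", "CT", "DE", "KY"],
                     "AL": [120.0, 120.0, 155.0, 145.0, 145.0, 120.0]})

lanes = (pd.merge(df_d[["State", "AL"]], df_m[["State", "AL"]], how="inner", on="State")
         # Both frames have an "AL" column, so merge adds the default _x / _y suffixes.
         .rename({"AL_x": "Rate", "AL_y": "Cost"}, axis=1)
         # Every state in a group shares the same (Rate, Cost) pair.
         .groupby(["Rate", "Cost"])["State"].apply(lambda x: "_".join(x))
         .reset_index()
         .sort_values("State", ignore_index=True))
lanes["State"] = "20_21_RDWY_Purple_" + lanes["State"]
print(lanes[["State", "Rate", "Cost"]])
```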
Using Erickson's help with .groupby and lambda functions, we arrive at the correct solution:
import pandas as pd

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)

df_d = pd.read_excel(path,
                     sheet_name=0,
                     header=0,
                     index_col=False,
                     keep_default_na=True)
df_m = pd.read_excel(path2,
                     sheet_name=0,
                     header=0,
                     index_col=False,
                     keep_default_na=True)
m_col_names = ['AL', 'AR', 'AZ', 'CA', 'CO', 'CT', 'DC', 'DE', 'FL', 'GA', 'IA', 'ID', 'IL', 'IN', 'KS', 'KY', 'LA',
'MA', 'MD', 'ME', 'MI', 'MN', 'MO', 'MS', 'MT', 'NC', 'ND', 'NE', 'NH', 'NJ', 'NM', 'NV', 'NY', 'OH',
'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX', 'UT', 'VA', 'VT', 'WA', 'WI', 'WV', 'WY', 'AB', 'BC',
'MB', 'NB', 'NF', 'NS', 'ON', 'PE', 'PQ', 'SK']
frames = []
for state in m_col_names:
    # pair each destination's discount (Rate) with its minimum charge (Cost) for this origin column
    master_dataframe0 = (pd.merge(df_d[['State', state]], df_m[['State', state]], how='inner', on='State')
                         .rename({state + '_x': 'Rate', state + '_y': 'Cost'}, axis=1)
                         # one lane per (Rate, Cost) pair; member states joined with underscores
                         .groupby(['Rate', 'Cost'])['State'].apply(lambda x: '_'.join(x)).reset_index()
                         .sort_values('State'))
    master_dataframe0['Origin'] = state
    master_dataframe0 = master_dataframe0[['State', 'Rate', 'Cost', 'Origin']].assign(
        State='20_21_RDWY_Purple_' + master_dataframe0['State'])
    frames.append(master_dataframe0)
# DataFrame.append was removed in pandas 2.0, so collect the frames and concatenate once
final_frame = pd.concat(frames)
print(final_frame)
final_frame.to_excel("w3llshipmeright.xlsx")
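The per-column loop can also be replaced by a single pass: reshape both tables to long form with DataFrame.melt, merge on (State, Origin), and group on (Origin, Rate, Cost). This is an alternative sketch, not the solution above; build_lanes is a hypothetical name, it assumes both sheets contain only a State column plus one column per origin, and it orders rows alphabetically by origin rather than in m_col_names order.

```python
import pandas as pd

def build_lanes(df_d, df_m, prefix="20_21_RDWY_Purple_"):
    """Group destination states into lanes for every origin column in one pass.

    Assumes both frames hold a 'State' column plus one column per origin.
    """
    # Reshape each wide table into long (State, Origin, value) rows.
    rates = df_d.melt(id_vars="State", var_name="Origin", value_name="Rate")
    costs = df_m.melt(id_vars="State", var_name="Origin", value_name="Cost")
    long = rates.merge(costs, on=["State", "Origin"], how="inner")
    # One lane per (origin, rate, cost) combination; member states joined with underscores.
    lanes = (long.groupby(["Origin", "Rate", "Cost"])["State"]
             .apply(lambda s: "_".join(s))
             .reset_index())
    lanes["State"] = prefix + lanes["State"]
    return lanes.sort_values(["Origin", "State"], ignore_index=True)[["State", "Rate", "Cost", "Origin"]]

# Small slice of the sample tables (discounts as fractions, an assumption).
df_d = pd.DataFrame({"State": ["AL", "AR", "AZ"], "AL": [0.508, 0.508, 0.567], "AR": [0.441, 0.508, 0.558]})
df_m = pd.DataFrame({"State": ["AL", "AR", "AZ"], "AL": [120.0, 120.0, 155.0], "AR": [120.0, 120.0, 155.0]})
result = build_lanes(df_d, df_m)
print(result)
```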