简体   繁体   中英

adding a list to a dataframe whilst concating by length fo DF

Unable to find a solution for this so apologies if this seems simple.

I have a df as follows :

ID, Week
5, 1
6, 1
7, 1

I have a list of reason codes as follows

['Work', 'Holiday', 'Sick', 'Jury'] 

What I'm trying to do is add this to my current dataframe but multiply this by each unique ID

so i would have something like (i'll only use 1 unique ID for brevity).,

ID, Week, Reason
5,  1,    'Work'
5,  1,    'Holiday'
5,  1,    'Sick',
5,  1,    'Jury'

I have tried all manner of concats, with varying axis's but I'm none-the-wiser how to approach this.

any help would be greatly appreciated.

You can use itertools for this:

import itertools
df_new=(pd.DataFrame(list(itertools.product(df.ID,df.Week,reason_codes)),
    columns=df.columns.tolist()+['Reason']).drop_duplicates().reset_index(drop=True))
print(df_new)

    ID  Week   Reason
0    5     1     Work
1    5     1  Holiday
2    5     1     Sick
3    5     1     Jury
4    6     1     Work
5    6     1  Holiday
6    6     1     Sick
7    6     1     Jury
8    7     1     Work
9    7     1  Holiday
10   7     1     Sick
11   7     1     Jury

Let us do unnesting

df['Reason']=[l]*len(df)
unnesting(df,['Reason'])
Out[1014]: 
    Reason  ID  Week
0     Work   5     1
0  Holiday   5     1
0     Sick   5     1
0     Jury   5     1
1     Work   6     1
1  Holiday   6     1
1     Sick   6     1
1     Jury   6     1
2     Work   7     1
2  Holiday   7     1
2     Sick   7     1
2     Jury   7     1

# attached self-define function
def unnesting(df, explode):
    idx=df.index.repeat(df[explode[0]].str.len())
    df1=pd.concat([pd.DataFrame({x:np.concatenate(df[x].values)} )for x in explode],axis=1)
    df1.index=idx
    return df1.join(df.drop(explode,1),how='left')

Or we do cross join

df.merge(pd.DataFrame({'Reason':l}).assign(Week=1))
Out[1020]: 
    ID  Week   Reason
0    5     1     Work
1    5     1  Holiday
2    5     1     Sick
3    5     1     Jury
4    6     1     Work
5    6     1  Holiday
6    6     1     Sick
7    6     1     Jury
8    7     1     Work
9    7     1  Holiday
10   7     1     Sick
11   7     1     Jury

let codes be the list of reasons

codes = ['Work', 'Holiday', 'Sick', 'Jury']

Assign the same codes to all the rows of the data frame. The syntax on the RHS just dupliates the list N times, where N is the length of the data frame

df['codes'] = [codes] * len(df)

Then stack the codes column, ie change from horizontal to vertical format. In this step it is also necessary to set the ID & Week columns as indexes & then reset them after the stack operation is done.

df.set_index(['ID','Week']).codes.apply(pd.Series).stack().reset_index(['ID', 'Week'])

Outputs:

   ID  Week        0
0   5     1     Work
1   5     1  Holiday
2   5     1     Sick
3   5     1     Jury
0   6     1     Work
1   6     1  Holiday
2   6     1     Sick
3   6     1     Jury
0   7     1     Work
1   7     1  Holiday
2   7     1     Sick
3   7     1     Jury

The only thing left is to assign an appropriate name to the newly created column, and if your code depends on the index value, fix it appropriately. In this version, the index values are carried over from the original dataframe & repeated four times

I think you just need to create a DataFrame from

df_codes = pd.DataFrame({'codes': ['Work', 'Holiday', 'Sick', 'Jury']})    

After that you just need to do cross join DataFrames without common columns as in this thread join-two-dataframes-with-no-common-columns-for-calculations

base on that thread, after create df_codes , you just need this command:

df_new = df.assign(a=1).merge(df_codes.assign(a=1)).drop('a', 1)

Output:

Out[148]:
    ID  Week    codes
0    5     1     Work
1    5     1  Holiday
2    5     1     Sick
3    5     1     Jury
4    6     1     Work
5    6     1  Holiday
6    6     1     Sick
7    6     1     Jury
8    7     1     Work
9    7     1  Holiday
10   7     1     Sick
11   7     1     Jury

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM