简体   繁体   中英

Pandas dataframe - How to concatenate multiple dataframes having same column

[edit]

I would like to see one column('CUS_ID')'s all 'DAY OF WEEK' first

and then 'CUS_ID's TIME_HOUR next

just concatenating returns all cus_id's day of week first

and then returns all cus_id's TIME_HOUR next....

--

How can I concatenate dataframes using only one column? there are 2 dataframes and I just want to concatenate them into 1 columns which have size of each columns' data as its value

I have a dataframe like this:

print(raw_data.head(10))
   CUS_ID     TIME_ID  TIME_HOUR  DAY
0    1176  2012083016      16h  THU
1    1176  2013030418      18h  MON
2    1192  2012091609      09h  SUN
3    1192  2012101310      10h  SAT
4     120  2012121410      10h  FRI
5     120  2012121915      15h  WED
6     120  2012121915      15h  WED
7    1269  2012070914      14h  MON
8    1269  2012071309      09h  FRI
9    1269  2013031414      14h  THU

I made new dataframe using 'groupby' like this:

df_day = raw_data.groupby(['CUS_ID','DAY']).size()
df_time = raw_data.groupby('CUS_ID').TIME_HOUR.value_counts()
df_concat = pd.concat([df_day, df_time]  ,axis=1, join='outer')
print(df_concat.head(30))


          DAY  TIME_HOUR
2 01h     NaN        6.0
  02h     NaN        3.0
  03h     NaN        2.0
  04h     NaN        1.0
  05h     NaN       15.0
  06h     NaN        4.0
  07h     NaN        7.0
  08h     NaN       22.0
  09h     NaN      214.0
  10h     NaN      515.0
  ...
  FRI   925.0        NaN
  MON   956.0        NaN
  SAT    61.0        NaN
  THU  1267.0        NaN
  TUE  1334.0        NaN
  WED   869.0        NaN
3 07h     NaN       50.0

What I like to get is like this:

2 01h     6.0
  02h     3.0
  03h     2.0
  04h     1.0
  05h     15.0
  ...
  FRI     925.0        
  MON     956.0        
  SAT     61.0        
  THU     1267.0        
  TUE     1334.0        
  WED     869.0   

Please help..

Not sure if good idea join 2 a bit different data information (hours and days), but if really need it:

You can use concat with reindex by values from list:

df_day = raw_data.groupby(['CUS_ID','DAY']).size()
df_time = raw_data.groupby('CUS_ID').TIME_HOUR.value_counts()


L1 = raw_data['TIME_HOUR'].sort_values().unique().tolist()
L2 = ['SUN','MON','TUE',  'WED', 'THU','FRI','SAT']
cats = L2 + L1
print (cats)
['SUN', 'MON', 'TUE', 'WED', 'THU', 'FRI', 'SAT', '09h', '10h', '14h', '15h', '16h', '18h']

df_concat = pd.concat([df_day, df_time]).sort_index().reindex(cats, level=1)
print(df_concat.head(30))
CUS_ID  DAY
120     WED    2
        FRI    1
        10h    1
        15h    2
1176    MON    1
        THU    1
        16h    1
        18h    1
1192    SUN    1
        SAT    1
        09h    1
        10h    1
1269    MON    1
        THU    1
        FRI    1
        09h    1
        14h    2
dtype: int64

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM