pandas data frame groupby column names

I have a data frame named variants_gene_list: [image not available]

I want to create a data frame that contains, for each unique gene, all the values in the "locus", "AF_afr" and "AF_nfe" columns as separate lists.

I have tried the following code:

variants_gene_list = data.groupby('gene').apply(lambda x: [list(x['locus']), list(x['AF_afr']), list(x['AF_nfe'])]).apply(pd.Series)

I got this data frame (currently, I have only one gene): [image not available]

Question -

  1. How do I access the locus / AF_afr lists in the new dataframe I created?
  2. The data frame I created has no column names. What am I missing? Thanks

Answer to the first question:

variants_gene_list[X].iloc[Y][Z]

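Replace X with the column label (0, 1 or 2 here, since the columns are unnamed), Y with the integer position of the row (.iloc is position-based; use .loc instead to select a row by gene name), and Z with the index of the element within the list.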
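
A minimal sketch of that access pattern. The sample data, gene names and values below are made up for illustration (the question's real frame is only shown as an image), and the columns are selected explicitly before apply(), which is an assumption added so the call behaves the same across pandas versions:

```python
import pandas as pd

# Hypothetical sample data; not taken from the question.
data = pd.DataFrame({
    "gene": ["GENE1", "GENE1", "GENE2"],
    "locus": ["chr1:100", "chr1:200", "chr2:300"],
    "AF_afr": [0.10, 0.20, 0.30],
    "AF_nfe": [0.01, 0.02, 0.03],
})

# Construction as in the question: one row per gene, unnamed columns 0, 1, 2.
variants_gene_list = (
    data.groupby("gene")[["locus", "AF_afr", "AF_nfe"]]
    .apply(lambda x: [list(x["locus"]), list(x["AF_afr"]), list(x["AF_nfe"])])
    .apply(pd.Series)
)

# Column 0 holds the locus lists; .iloc[0] picks the first row by position.
first_gene_loci = variants_gene_list[0].iloc[0]
# A further [0] takes the first element of that list.
first_locus = variants_gene_list[0].iloc[0][0]

# .loc selects by index label (the gene name) and column label instead.
afr_gene2 = variants_gene_list.loc["GENE2", 1]
```

Here column 1 corresponds to AF_afr only because of the order in which the lambda builds its list.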

Answer to the second question:

With this approach the column names are lost: the lambda returns a plain list, so .apply(pd.Series) labels the resulting columns 0, 1 and 2. One fix is to rename them afterwards.

variants_gene_list = variants_gene_list.rename(columns={0: 'locus', 1: 'AF_afr', 2:'AF_nfe'})
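
As an alternative that is not part of the original answer, aggregating the selected columns with list should keep the column names, so no rename is needed. This is a sketch on made-up sample data; the name by_gene is hypothetical:

```python
import pandas as pd

# Hypothetical sample data; not taken from the question.
data = pd.DataFrame({
    "gene": ["GENE1", "GENE1", "GENE2"],
    "locus": ["chr1:100", "chr1:200", "chr2:300"],
    "AF_afr": [0.10, 0.20, 0.30],
    "AF_nfe": [0.01, 0.02, 0.03],
})

# Collect each column into a list per gene; column names are preserved.
by_gene = data.groupby("gene")[["locus", "AF_afr", "AF_nfe"]].agg(list)

# Lists can now be accessed by gene name and column name.
loci_gene1 = by_gene.loc["GENE1", "locus"]
nfe_gene2 = by_gene.loc["GENE2", "AF_nfe"]
```

With named columns, the access from the first answer becomes by_gene['locus'].iloc[Y][Z].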


 