
Reshaping a single column into multiple columns using Python

I have an Excel file containing a single column (the number of rows is not fixed). Using Python 3, I want to:

  1. Import my Excel file/data into Python,
  2. Read/select the data column (the first column),
  3. Reshape this column into multiple columns with 10 rows in each column, and finally
  4. Write the output to a new Excel file.

I have tried the following code:

import pandas as pd
import numpy as np
df =  pd.read_excel('sample.xlsx')
first_column = pd.DataFrame(df.iloc[:,0])
arr = np.array(first_column)
newArr = arr.reshape(10, -1)

However, I am getting the following error:

newArr = arr.reshape(arr, (10, -1))
TypeError: only integer scalar arrays can be converted to a scalar index

Can someone help me achieve this in Python 3?

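The traceback quoted in the question appears to come from a call of the form arr.reshape(arr, (10, -1)), which passes the array itself where NumPy expects the shape. The code as posted, arr.reshape(10, -1), would only run when the number of rows is a multiple of 10, and even then it fills the result row by row, so each column would not hold 10 consecutive values. Below is a minimal NumPy sketch of a column-by-column reshape, assuming that missing cells in the last column may be filled with NaN; the helper name reshape_column is hypothetical:

```python
import numpy as np


def reshape_column(values, n_rows=10):
    """Hypothetical helper: reshape a single column into columns of n_rows
    consecutive values, padding the last column with NaN when the number of
    values is not a multiple of n_rows."""
    flat = np.asarray(values, dtype=float).ravel()  # (N, 1) -> (N,)
    n_cols = -(-flat.size // n_rows)                # ceiling division
    padded = np.full(n_rows * n_cols, np.nan)       # assumption: NaN padding is acceptable
    padded[:flat.size] = flat
    # order="F" fills column by column, so column j holds values j*n_rows .. j*n_rows + n_rows - 1
    return padded.reshape((n_rows, n_cols), order="F")


arr = np.arange(1, 24).reshape(-1, 1)  # 23 made-up values shaped like a single Excel column
result = reshape_column(arr, 10)
print(result.shape)  # (10, 3)
```

The array can then be wrapped in a DataFrame with pd.DataFrame(result) before writing it out.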

1. To read the file in Python, you can use pandas.

One way to read the Excel file in Python is to first save it as CSV (using the Save As option in Excel) and then read the CSV with pandas. Alternatively, pd.read_excel can read the .xlsx file directly, provided an engine such as openpyxl is installed.

 >>> import pandas as pd
 >>> df =  pd.read_csv('fazool.csv')
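One detail matters for a single-column file like the one in the question: if the file has no header row, read_csv treats the first value as the column name. A small sketch, with an in-memory string standing in for the CSV file (the sample values are made up for illustration):

```python
import io

import pandas as pd

# Made-up stand-in for a headerless single-column CSV file
csv_text = "0.5\n-0.25\n1.75\n"

with_header = pd.read_csv(io.StringIO(csv_text))             # first value becomes the header
no_header = pd.read_csv(io.StringIO(csv_text), header=None)  # every line is treated as data

print(with_header.shape)  # (2, 1)
print(no_header.shape)    # (3, 1)
```

pd.read_excel accepts the same header=None argument when reading the .xlsx file directly.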

Then, to print the first rows of the DataFrame:

 >>> df.head()
     kMEblue  kMEgreen kMEturquoise  kMEblack  kMEbrown    kMEred kMEyellow data$X count moduleColors
 0 -0.762233 -0.115623     0.836647 -0.418418 -0.688068 -0.078625  0.316798  VWA5A     1    turquoise
 1 -0.714720 -0.145856     0.802115 -0.420983 -0.670826 -0.039813  0.424616 EIF4G2     1    turquoise
 2 -0.785788 -0.259762     0.777330 -0.301520 -0.585565  0.021812  0.412960   CFL1     1    turquoise
 3 -0.736677 -0.296203     0.776179 -0.266430 -0.517727  0.109923  0.526707  NSUN2     1    turquoise
 4 -0.697293  0.030126     0.772833 -0.621229 -0.733419 -0.341270  0.088465  ANXA2     1    turquoise

2. To select the first column of the DataFrame:

  >>> first_column_df = pd.DataFrame(df.iloc[:,0])
  >>> first_column_df.head()
     kMEblue
  0 -0.762233
  1 -0.714720
  2 -0.785788
  3 -0.736677
  4 -0.697293

  >>> first_column_df.columns # shows the column name 
  Index(['kMEblue'], dtype='object')
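Wrapping df.iloc[:, 0] in pd.DataFrame works, but iloc can return either shape directly: a single integer position gives a Series, while a list of positions gives a one-column DataFrame. A short sketch with a made-up frame that reuses two of the column names above:

```python
import pandas as pd

# Made-up two-column frame for illustration only
df = pd.DataFrame({"kMEblue": [-0.76, -0.71, -0.79], "kMEgreen": [-0.12, -0.15, -0.26]})

as_series = df.iloc[:, 0]   # scalar position -> Series named 'kMEblue'
as_frame = df.iloc[:, [0]]  # list of positions -> one-column DataFrame

print(type(as_series).__name__, as_series.name)         # Series kMEblue
print(type(as_frame).__name__, list(as_frame.columns))  # DataFrame ['kMEblue']
```

The Series form is what the reshaping step actually needs, since it is converted to a 1-D NumPy array anyway.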

3. To reshape this column into multiple columns of ten rows each, you can use NumPy together with pandas:

  >>> import numpy as np
  >>> n = 10  # chunk size: rows per new column
  >>> values = first_column_df['kMEblue'].to_numpy()
  >>> chunks = np.split(values, range(n, len(values), n))  # split every n values
  >>> first_column_df_split = pd.concat(
  ...     [pd.Series(c, name='y' + str(i)) for i, c in enumerate(chunks)], axis=1)

  >>> first_column_df_split.head()
           y0        y1        y2        y3        y4        y5 ...      y478      y479      y480      y481      y482      y483
  0 -0.762233 -0.639253 -0.673571 -0.652639 -0.703227 -0.666183 ...  0.633533  0.628803  0.716792  0.783900  0.725757  0.791240
  1 -0.714720 -0.680753 -0.696416 -0.686810 -0.636661 -0.613642 ...  0.678854  0.807758  0.736286  0.627988  0.853333  0.887149
  2 -0.785788 -0.638530 -0.607706 -0.613452 -0.701420 -0.583315 ...  0.663671  0.649068  0.741015  0.847084  0.718821  0.786994
  3 -0.736677 -0.728837 -0.665220 -0.613386 -0.596789 -0.614878 ...  0.722638  0.587891  0.658215  0.668980  0.794392  0.835687
  4 -0.697293 -0.731756 -0.627547 -0.653920 -0.641218 -0.679153 ...  0.618696  0.740690  0.737382  0.679931  0.706449  0.919852

  [5 rows x 484 columns]
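Because np.split is given explicit split points, the last chunk is simply shorter when the row count is not a multiple of n, and pd.concat(..., axis=1) aligns the Series on their index, leaving NaN below the last value. A small check of the same np.split plus pd.concat technique on 23 made-up values:

```python
import numpy as np
import pandas as pd

n = 10
values = np.arange(1.0, 24.0)  # 23 made-up values: 1.0 .. 23.0
chunks = np.split(values, range(n, len(values), n))  # split points 10 and 20 -> sizes 10, 10, 3
split_df = pd.concat(
    [pd.Series(chunk, name="y" + str(i)) for i, chunk in enumerate(chunks)],
    axis=1,  # one Series per column, aligned on the 0..9 index
)

print(split_df.shape)  # (10, 3)
print(split_df["y2"].tolist())  # 21.0, 22.0, 23.0 followed by NaN padding
```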

4. To write the result to a file, you can use pandas DataFrame.to_csv(); Excel can open the resulting CSV file:

 >>> first_column_df_split.to_csv("first_column_split.csv")
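By default to_csv also writes the DataFrame index as an unnamed first column. If that column is not wanted, index=False drops it; DataFrame.to_excel accepts the same index argument and writes an .xlsx file directly, provided an Excel writer engine such as openpyxl is installed. A sketch that writes to an in-memory buffer so the output can be inspected (the frame and the .xlsx filename in the comment are made up):

```python
import io

import pandas as pd

# Made-up result using the same y0, y1, ... column naming scheme
split_df = pd.DataFrame({"y0": [1.0, 2.0], "y1": [3.0, None]})

buf = io.StringIO()
split_df.to_csv(buf, index=False)  # no index column; NaN is written as an empty field
print(buf.getvalue())

# Hypothetical .xlsx output instead (requires openpyxl):
# split_df.to_excel("first_column_split.xlsx", index=False)
```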

Adapted from here.

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM