Using pd.Series to split a CSV into multiple columns

Question

I am trying to read a ';'-separated txt file into a pandas data frame.

Price Indices - EURO Currency

Date    ;Blue-Chip;Blue-Chip;Broad    ; Broad   ;Ex UK    ;Ex Euro Zone;Blue-Chip; Broad
        ;  Europe ;Euro-Zone;Europe   ;Euro-Zone;         ;            ; Nordic  ; Nordic
        ;  SX5P   ;  SX5E   ;SXXP     ;SXXE     ; SXXF    ;    SXXA    ;    DK5F ; DKXF
31.12.1986;775.00 ;  900.82 ;   82.76 ;   98.58 ;   98.06 ;   69.06 ;  645.26  ;  65.56
01.01.1987;775.00 ;  900.82 ;   82.76 ;   98.58 ;   98.06 ;   69.06 ;  645.26  ;  65.56
02.01.1987;770.89 ;  891.78 ;   82.57 ;   97.80 ;   97.43 ;   69.37 ;  647.62  ;  65.81
05.01.1987;771.89 ;  898.33 ;   82.82 ;   98.60 ;   98.19 ;   69.16 ;  649.94  ;  65.82
06.01.1987;775.92 ;  902.32 ;   83.28 ;   99.19 ;   98.83 ;   69.50 ;  652.49  ;  66.06
07.01.1987;781.21 ;  899.15 ;   83.78 ;   98.96 ;   98.62 ;   70.59 ;  651.97  ;  66.20
08.01.1987;777.62 ;  887.37 ;   83.52 ;   97.87 ;   97.68 ;   71.01 ;  645.57  ;  65.62
09.01.1987;769.80 ;  868.31 ;   83.03 ;   96.31 ;   96.22 ;   71.40 ;  638.03  ;  65.14
12.01.1987;775.07 ;  879.41 ;   83.64 ;   97.54 ;   97.18 ;   71.50 ;  634.14  ;  65.03
13.01.1987;770.00 ;  872.74 ;   83.00 ;   96.78 ;   96.38 ;   70.97 ;  622.44  ;  63.87
14.01.1987;772.04 ;  876.39 ;   82.99 ;   97.14 ;   96.59 ;   70.66 ;  603.63  ;  62.46
15.01.1987;779.12 ;  884.37 ;   83.77 ;   98.10 ;   97.60 ;   71.28 ;  620.01  ;  63.89
16.01.1987;781.66 ;  883.78 ;   84.15 ;   98.11 ;   97.66 ;   71.95 ;  623.77  ;  64.65

The full dataset can be retrieved from the following URL:

https://www.stoxx.com/document/Indices/Current/HistoricalData/hbrbcpe.txt

I read the file into pandas using the following code.

data=pd.read_csv(txt,encoding='utf8')

I get a single-column data frame and now need to separate the columns. I was thinking I could drop the first three rows, split the column by ';', and then add the headers back on afterwards. I am trying to use the following function.

data1=pd.Series.str.split(data,pat=';',expand=True)

and this returns

TypeError: len() of unsized object

I tried n=9 as there should be 9 columns but this returns the same error message.

data1=pd.Series.str.split(data,pat=';',n=9, expand=True)

I've also tried this.

data1 = pd.read_csv(txt,index_col=0,parse_dates=True,sep=";",dayfirst=True)

but this returns the error

EmptyDataError: No columns to parse from file
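
A note on the TypeError above: `.str` is an accessor reached through a Series instance, so the call most likely fails because `pd.Series.str.split` is invoked on the class and handed a whole DataFrame. Below is a minimal sketch of the split the question describes, assuming each raw line has already been read into a single string column. The column name `line` and the shortened sample (date plus the first three indices) are illustrative choices, not part of the original file.

```python
import io
import pandas as pd

# Three data rows from the question, truncated to the first three indices.
raw = (
    "31.12.1986;775.00 ;  900.82 ;   82.76\n"
    "01.01.1987;775.00 ;  900.82 ;   82.76\n"
    "02.01.1987;770.89 ;  891.78 ;   82.57\n"
)

# With the default ',' separator every line lands in one string column.
data = pd.read_csv(io.StringIO(raw), header=None, names=["line"])

# Call .str.split on the Series itself, not on the pd.Series class.
parts = data["line"].str.split(";", expand=True)

# Remove the padding around each field before converting types.
parts = parts.apply(lambda col: col.str.strip())
parts.columns = ["Date", "SX5P", "SX5E", "SXXP"]

value_cols = ["SX5P", "SX5E", "SXXP"]
result = parts[value_cols].apply(pd.to_numeric)
result.index = pd.to_datetime(parts["Date"], format="%d.%m.%Y")
```

This works, but it re-implements by hand what `pd.read_csv` does once it is given the right separator, so passing `sep` at read time is usually the shorter route.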

Answer 1

Is that what you want?

import pandas as pd
import io
import requests

url = 'https://www.stoxx.com/document/Indices/Current/HistoricalData/hbrbcpe.txt'

r = requests.get(url)

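# Remove any ';' directly before a newline so it cannot produce an empty extra column.
df = pd.read_csv(io.StringIO(r.text.replace(';\n', '\n')),
                 sep=r'\s*;\s*',      # regex: ';' with optional surrounding whitespace
                 engine='python',     # regex separators require the Python engine
                 skiprows=1,          # skip the title line
                 header=[0, 1, 2],    # three header rows become a column MultiIndex
                 index_col=0,
                 parse_dates=True,
                 dayfirst=True)       # dates are written as DD.MM.YYYY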

Result:

In [266]: df.head()
Out[266]:
Date       Blue-Chip            Broad                        Ex UK       Ex Euro Zone Blue-Chip  Broad
              Europe Euro-Zone Europe Euro-Zone Unnamed: 5_level_1 Unnamed: 6_level_1    Nordic Nordic
                SX5P      SX5E   SXXP      SXXE               SXXF               SXXA      DK5F   DKXF
1986-12-31    775.00    900.82  82.76     98.58              98.06              69.06    645.26  65.56
1987-01-01    775.00    900.82  82.76     98.58              98.06              69.06    645.26  65.56
1987-01-02    770.89    891.78  82.57     97.80              97.43              69.37    647.62  65.81
1987-01-05    771.89    898.33  82.82     98.60              98.19              69.16    649.94  65.82
1987-01-06    775.92    902.32  83.28     99.19              98.83              69.50    652.49  66.06

In [267]: df.shape
Out[267]: (7673, 8)
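
The same options can be tried without a network connection. The sketch below is only a stand-in for the download: it feeds the title line, the three header rows and the first three data rows quoted in the question to `pd.read_csv`, so behaviour on the full file is not verified here. The `sample` string and the `flat` frame are illustrative names, and keeping only the ticker level of the header is one possible way to avoid the `Unnamed: ..._level_1` labels, not something the original answer does.

```python
import io
import pandas as pd

# Title line, three header rows and three data rows copied from the question.
sample = (
    "Price Indices - EURO Currency\n"
    "Date    ;Blue-Chip;Blue-Chip;Broad    ; Broad   ;Ex UK    ;Ex Euro Zone;Blue-Chip; Broad\n"
    "        ;  Europe ;Euro-Zone;Europe   ;Euro-Zone;         ;            ; Nordic  ; Nordic\n"
    "        ;  SX5P   ;  SX5E   ;SXXP     ;SXXE     ; SXXF    ;    SXXA    ;    DK5F ; DKXF\n"
    "31.12.1986;775.00 ;  900.82 ;   82.76 ;   98.58 ;   98.06 ;   69.06 ;  645.26  ;  65.56\n"
    "01.01.1987;775.00 ;  900.82 ;   82.76 ;   98.58 ;   98.06 ;   69.06 ;  645.26  ;  65.56\n"
    "02.01.1987;770.89 ;  891.78 ;   82.57 ;   97.80 ;   97.43 ;   69.37 ;  647.62  ;  65.81\n"
)

df = pd.read_csv(
    io.StringIO(sample),
    sep=r"\s*;\s*",     # regex separator, so the padding is dropped with the ';'
    engine="python",    # needed for regex separators
    skiprows=1,         # skip the title line
    header=[0, 1, 2],   # three header rows -> three-level column MultiIndex
    index_col=0,
    parse_dates=True,
    dayfirst=True,
)

# Optional: keep only the third header level (the index tickers) as column names.
flat = df.copy()
flat.columns = df.columns.get_level_values(2)
```

With flat column names a single index can be selected as `flat["SX5E"]` rather than by a three-part tuple.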

solution1
1 ACCEPTED 2017-02-09 22:26:00