I am trying to read a ';'-separated txt file into a pandas data frame. The file looks like this:
Price Indices - EURO Currency
Date ;Blue-Chip;Blue-Chip;Broad ; Broad ;Ex UK ;Ex Euro Zone;Blue-Chip; Broad
; Europe ;Euro-Zone;Europe ;Euro-Zone; ; ; Nordic ; Nordic
; SX5P ; SX5E ;SXXP ;SXXE ; SXXF ; SXXA ; DK5F ; DKXF
31.12.1986;775.00 ; 900.82 ; 82.76 ; 98.58 ; 98.06 ; 69.06 ; 645.26 ; 65.56
01.01.1987;775.00 ; 900.82 ; 82.76 ; 98.58 ; 98.06 ; 69.06 ; 645.26 ; 65.56
02.01.1987;770.89 ; 891.78 ; 82.57 ; 97.80 ; 97.43 ; 69.37 ; 647.62 ; 65.81
05.01.1987;771.89 ; 898.33 ; 82.82 ; 98.60 ; 98.19 ; 69.16 ; 649.94 ; 65.82
06.01.1987;775.92 ; 902.32 ; 83.28 ; 99.19 ; 98.83 ; 69.50 ; 652.49 ; 66.06
07.01.1987;781.21 ; 899.15 ; 83.78 ; 98.96 ; 98.62 ; 70.59 ; 651.97 ; 66.20
08.01.1987;777.62 ; 887.37 ; 83.52 ; 97.87 ; 97.68 ; 71.01 ; 645.57 ; 65.62
09.01.1987;769.80 ; 868.31 ; 83.03 ; 96.31 ; 96.22 ; 71.40 ; 638.03 ; 65.14
12.01.1987;775.07 ; 879.41 ; 83.64 ; 97.54 ; 97.18 ; 71.50 ; 634.14 ; 65.03
13.01.1987;770.00 ; 872.74 ; 83.00 ; 96.78 ; 96.38 ; 70.97 ; 622.44 ; 63.87
14.01.1987;772.04 ; 876.39 ; 82.99 ; 97.14 ; 96.59 ; 70.66 ; 603.63 ; 62.46
15.01.1987;779.12 ; 884.37 ; 83.77 ; 98.10 ; 97.60 ; 71.28 ; 620.01 ; 63.89
16.01.1987;781.66 ; 883.78 ; 84.15 ; 98.11 ; 97.66 ; 71.95 ; 623.77 ; 64.65
The full dataset can be retrieved from the following URL:
https://www.stoxx.com/document/Indices/Current/HistoricalData/hbrbcpe.txt
I read the file into pandas using the following code:
data=pd.read_csv(txt,encoding='utf8')
I get an n-by-1 data frame and now need to separate the columns. I was thinking I could drop the first three rows, split the column by ";", and then add the headers back on afterwards. I am trying to use the following function:
data1=pd.Series.str.split(data,pat=';',expand=True)
and this returns
TypeError: len() of unsized object
I tried n=9, as there should be 9 columns, but this returns the same error message:
data1=pd.Series.str.split(data,pat=';',n=9, expand=True)
I've also tried this:
data1 = pd.read_csv(txt,index_col=0,parse_dates=True,sep=";",dayfirst=True)
but this returns the error:
EmptyDataError: No columns to parse from file
Is that what you want?
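import io

import pandas as pd
import requests

url = 'https://www.stoxx.com/document/Indices/Current/HistoricalData/hbrbcpe.txt'
r = requests.get(url)

# Drop a ';' at the very end of a line so it does not create an extra empty column.
df = pd.read_csv(
    io.StringIO(r.text.replace(';\n', '\n')),
    sep=r'\s*;\s*',      # ';' with optional whitespace on either side
    engine='python',     # regex separators require the python engine
    skiprows=1,          # skip the title line
    header=[0, 1, 2],    # three header rows become MultiIndex columns
    index_col=0,         # first column holds the dates
    parse_dates=True,
    dayfirst=True,       # dates are written DD.MM.YYYY
)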
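To see why the separator is a regular expression rather than a plain ';', here is a small sketch using only the standard library. The `line` value is shortened from the sample in the question; the `"a;b;"` string is a made-up illustration of a line that ends with the delimiter, which is presumably what the `replace(';\n', '\n')` call guards against.

```python
import re

# Shortened copy of one data line from the question's sample.
line = "31.12.1986;775.00 ; 900.82 ; 82.76 "

# Splitting on a bare ';' keeps the padding around each value.
plain = line.split(";")

# Splitting on ';' plus any surrounding whitespace gives clean fields.
trimmed = re.split(r"\s*;\s*", line.strip())

print(plain)    # ['31.12.1986', '775.00 ', ' 900.82 ', ' 82.76 ']
print(trimmed)  # ['31.12.1986', '775.00', '900.82', '82.76']

# A delimiter at the end of a line yields one extra, empty field.
trailing = "a;b;".split(";")
print(trailing)  # ['a', 'b', '']
```

The same pattern passed as `sep` lets `read_csv` strip the padding while it splits, so the header labels and the numbers come out without stray spaces.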
Result:
In [266]: df.head()
Out[266]:
Date        Blue-Chip              Broad                          Ex UK        Ex Euro Zone  Blue-Chip   Broad
               Europe  Euro-Zone  Europe  Euro-Zone  Unnamed: 5_level_1  Unnamed: 6_level_1     Nordic  Nordic
                 SX5P       SX5E    SXXP       SXXE                SXXF                SXXA       DK5F    DKXF
1986-12-31     775.00     900.82   82.76      98.58               98.06               69.06     645.26   65.56
1987-01-01     775.00     900.82   82.76      98.58               98.06               69.06     645.26   65.56
1987-01-02     770.89     891.78   82.57      97.80               97.43               69.37     647.62   65.81
1987-01-05     771.89     898.33   82.82      98.60               98.19               69.16     649.94   65.82
1987-01-06     775.92     902.32   83.28      99.19               98.83               69.50     652.49   66.06
In [267]: df.shape
Out[267]: (7673, 8)
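If you would rather follow the plan from the question (drop the header rows, split on ';', then put names back), a minimal sketch could look like the one below. It runs on a few lines copied from the sample instead of the downloaded file, and the column names are typed in by hand from the ticker row, so treat both as assumptions. The key point is that `.str` is an accessor used on a Series instance (`some_series.str.split(...)`); it is not meant to be called as `pd.Series.str.split(data, ...)` with a DataFrame.

```python
import pandas as pd

# A few lines copied from the sample in the question (title and headers included).
sample = """Price Indices - EURO Currency
Date ;Blue-Chip;Blue-Chip;Broad ; Broad ;Ex UK ;Ex Euro Zone;Blue-Chip; Broad
 ; Europe ;Euro-Zone;Europe ;Euro-Zone; ; ; Nordic ; Nordic
 ; SX5P ; SX5E ;SXXP ;SXXE ; SXXF ; SXXA ; DK5F ; DKXF
31.12.1986;775.00 ; 900.82 ; 82.76 ; 98.58 ; 98.06 ; 69.06 ; 645.26 ; 65.56
01.01.1987;775.00 ; 900.82 ; 82.76 ; 98.58 ; 98.06 ; 69.06 ; 645.26 ; 65.56
02.01.1987;770.89 ; 891.78 ; 82.57 ; 97.80 ; 97.43 ; 69.37 ; 647.62 ; 65.81
"""

# One string per line, held in a single Series.
raw = pd.Series(sample.splitlines())

# Skip the title line and the three header rows, then split on ';'.
parts = raw.iloc[4:].str.split(";", expand=True)

# Remove the padding around each field.
parts = parts.apply(lambda col: col.str.strip())

# Hand-written names taken from the ticker row (an assumption of this sketch).
parts.columns = ["Date", "SX5P", "SX5E", "SXXP", "SXXE",
                 "SXXF", "SXXA", "DK5F", "DKXF"]

# Day-first dates become the index; the remaining columns become floats.
parts["Date"] = pd.to_datetime(parts["Date"], format="%d.%m.%Y")
prices = parts.set_index("Date").astype(float)

print(prices.shape)
```

This gives flat, single-level column names, which can be easier to work with than the three-level header, at the cost of discarding the descriptive labels in the first two header rows.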