I am trying to read data from a public API. It serves the same series in several formats (JSON, CSV, TXT, among others); you only change the last segment of the URL (/json, /csv, /txt, ...). The URLs look like this:
https://estadisticas.bcrp.gob.pe/estadisticas/series/api/PN01210PM/csv/
https://estadisticas.bcrp.gob.pe/estadisticas/series/api/PN01210PM/json/
...
My problem is that when I import the data into a pandas DataFrame, it is not read correctly. I have tried the following alternatives:
import pandas as pd
import requests
url = 'https://estadisticas.bcrp.gob.pe/estadisticas/series/api/PN01210PM/json/'
r = requests.get(url)
rjson = r.json()
df = pd.json_normalize(rjson)
df['periods']
I also tried reading the data in CSV format:
import pandas as pd
import requests
url = 'https://estadisticas.bcrp.gob.pe/estadisticas/series/api/PN01210PM/csv/'
collisions = pd.read_csv(url, sep='<br>')
collisions.head()
But I don't get good results: the DataFrame is not displayed correctly, because the 'periods' column holds all the values grouped together. In the output, all the data appears as columns.
Here is an example of how the data is displayed correctly:
What alternative do you recommend trying?
Thank you in advance for your time and help!
I look forward to your answers. Regards!
For CSV, you can use StringIO from the io package:
In [20]: import requests
In [21]: res = requests.get("https://estadisticas.bcrp.gob.pe/estadisticas/series/api/PN01210PM/csv/")
In [22]: import pandas as pd
In [23]: import io
In [24]: df = pd.read_csv(io.StringIO(res.text.strip().replace("<br>","\n")), engine='python')
In [25]: df
Out[25]:
Mes/Año Tipo de cambio - promedio del periodo (S/ por US$) - Bancario - Promedio
0 Jul.2018 3.276595
1 Ago.2018 3.288071
2 Sep.2018 3.311325
3 Oct.2018 3.333909
4 Nov.2018 3.374675
5 Dic.2018 3.364026
6 Ene.2019 3.343864
7 Feb.2019 3.321475
8 Mar.2019 3.304690
9 Abr.2019 3.303825
10 May.2019 3.332364
11 Jun.2019 3.325650
12 Jul.2019 3.290214
13 Ago.2019 3.377560
14 Sep.2019 3.357357
15 Oct.2019 3.359762
16 Nov.2019 3.371700
17 Dic.2019 3.355190
18 Ene.2020 3.327364
19 Feb.2020 3.390350
20 Mar.2020 3.491364
21 Abr.2020 3.397500
22 May.2020 3.421150
23 Jun.2020 3.470167
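Judging from the replace() call above, the response appears to separate records with the literal text <br> rather than real newlines, which is why pandas otherwise sees a single long line. Here is a minimal offline sketch of the same idea. The sample string is hand-written (not real API output), and the column names `period` and `rate` are assumptions chosen for illustration:

```python
import io

import pandas as pd

# Hand-written sample imitating the response body (an assumption, not real API output).
sample = "Mes/Año,Tipo de cambio<br>Jul.2018,3.276595<br>Ago.2018,3.288071<br>"

# Turn each <br> into a real newline so read_csv sees one record per line.
text = sample.strip().replace("<br>", "\n")

# header=0 discards the original header row; names= supplies shorter labels.
df = pd.read_csv(io.StringIO(text), header=0, names=["period", "rate"])
print(df)
```

Renaming the columns is optional; it only avoids carrying the long original header through later code.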
Sorry, I couldn't find the link about reading JSON with multiple objects inside it. The point is that json.load() and json.loads() can't be used for this kind of format, so we have to use raw_decode() instead.
This code should work:
import json
import urllib.request as ur

import pandas as pd

decoder = json.JSONDecoder()
url = 'https://estadisticas.bcrp.gob.pe/estadisticas/series/api/PN01210PM/json/'

# Read the response and decode it into a list of JSON objects
data = []
with ur.urlopen(url) as json_file:
    x = json_file.read().decode().strip()  # decode the bytes into a str
    while True:
        try:
            # raw_decode returns the parsed object and the index where it ended
            j, n = decoder.raw_decode(x)
        except ValueError:
            break  # nothing left to decode
        data.append(j)
        x = x[n:].lstrip()  # drop the decoded part and any whitespace after it

# Build a list of dictionaries to convert into a DataFrame
clean_list = []
for period in data[0]['periods']:
    clean_list.append({
        "month_year": period['name'],
        "value": period['values'][0],
    })

df = pd.DataFrame(clean_list)
print(df)
Result:
month_year value
0 Jul.2018 3.27659523809524
1 Ago.2018 3.28807142857143
2 Sep.2018 3.311325
3 Oct.2018 3.33390909090909
4 Nov.2018 3.374675
5 Dic.2018 3.36402631578947
6 Ene.2019 3.34386363636364
7 Feb.2019 3.321475
8 Mar.2019 3.30469047619048
9 Abr.2019 3.303825
10 May.2019 3.33236363636364
11 Jun.2019 3.32565
12 Jul.2019 3.29021428571428
13 Ago.2019 3.37756
14 Sep.2019 3.35735714285714
15 Oct.2019 3.3597619047619
16 Nov.2019 3.3717
17 Dic.2019 3.35519047619048
18 Ene.2020 3.32736363636364
19 Feb.2020 3.39035
20 Mar.2020 3.49136363636364
21 Abr.2020 3.3975
22 May.2020 3.42115
23 Jun.2020 3.47016666666667
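The value column above appears to hold strings rather than numbers, which would explain the unrounded digits. Here is a minimal sketch of the raw_decode() loop followed by a numeric conversion, run on a hand-written string instead of the live endpoint. The sample payload and the helper name `decode_all` are assumptions for illustration; only the `periods`, `name` and `values` keys are taken from the code above:

```python
import json

import pandas as pd


def decode_all(text):
    """Decode every JSON value concatenated in `text` (hypothetical helper)."""
    decoder = json.JSONDecoder()
    values = []
    text = text.lstrip()
    while text:
        # raw_decode returns the parsed value and the index where it ended.
        value, end = decoder.raw_decode(text)
        values.append(value)
        text = text[end:].lstrip()
    return values


# Hand-written sample: two JSON objects back to back, which json.loads rejects.
sample = (
    '{"periods": [{"name": "Jul.2018", "values": ["3.27659523809524"]},'
    ' {"name": "Ago.2018", "values": ["3.28807142857143"]}]}'
    ' {"note": "second object"}'
)

data = decode_all(sample)
rows = [{"month_year": p["name"], "value": p["values"][0]} for p in data[0]["periods"]]
df = pd.DataFrame(rows)

# Convert the string values to floats so they can be used in calculations.
df["value"] = pd.to_numeric(df["value"])
print(df)
```

Unlike the loop above, this helper lets a decoding error propagate instead of silently stopping, which may be preferable when malformed input should not go unnoticed.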
If I find the link again, I'll edit or comment on my answer.