I am trying out web scraping in Python for the first time, so I am patching together code from various places.
I have run into two issues that I do not know how to solve:
1. My tbl list is written to test.csv entirely in the first cell, and it is not delimited either, even though I specified a delimiter in csv.writer().
2. The output in the CSV file has some encoding issues, even though I don't see any when I print it in my Python shell.
I am currently using Python 2.7
import urllib2
from bs4 import BeautifulSoup
import csv
import pandas as pd
site= "https://www.investing.com/currencies/usd-sgd-forward-rates"
hdr = {'User-Agent': 'Mozilla/5.0'}
req = urllib2.Request(site,headers=hdr)
page = urllib2.urlopen(req)
soup = BeautifulSoup(page)
px_table = str(soup.find('table', attrs={'id':'curr_table'}))
print type(px_table)
tbl = pd.read_html(px_table, encoding='utf-8')
with open('test.csv', 'w') as myFile:
    wr = csv.writer(myFile, delimiter=' ')
    wr.writerow(tbl)
output:
Unnamed: 0 Name Bid Ask High Low Chg. Time
0 NaN USDSGDÂ ONÂ FWD -0.85 0.15 -0.29 -1.19 0.75 9:40:00
1 NaN USDSGDÂ TNÂ FWD -0.50 -0.45 -0.35 -0.45 -0.08 9:43:00
2 NaN USDSGDÂ SNÂ FWD -0.30 -0.20 -0.29 -0.21 0.10 9:42:00
3 NaN USDSGDÂ SWÂ FWD -2.17 -1.69 -1.80 -1.80 -0.16 9:42:00
4 NaN USDSGDÂ 2WÂ FWD -5.32 -1.72 -3.58 -3.44 -1.22 9:43:00
5 NaN USDSGDÂ 3WÂ FWD -6.15 -4.35 -5.12 -5.17 -0.30 9:42:00
6 NaN USDSGDÂ 1MÂ FWD -8.53 -7.74 -8.00 -8.10 -0.17 9:42:00
7 NaN USDSGDÂ 2MÂ FWD -15.81 -14.81 -14.75 -15.15 -0.25 9:43:00
8 NaN USDSGDÂ 3MÂ FWD -25.00 -24.07 -23.53 -24.07 -0.40 9:42:00
9 NaN USDSGDÂ 4MÂ FWD -35.72 -27.72 -32.16 -32.37 -1.18 9:43:00
10 NaN USDSGDÂ 5MÂ FWD -46.53 -35.47 -40.00 -40.96 -2.41 9:42:00
11 NaN USDSGDÂ 6MÂ FWD -50.83 -48.67 -48.75 -50.00 0.94 9:42:00
12 NaN USDSGDÂ 7MÂ FWD -65.77 -53.06 -59.68 -58.69 -3.27 9:43:00
13 NaN USDSGDÂ 8MÂ FWD -79.41 -59.65 -66.98 -69.70 -6.61 9:42:00
14 NaN USDSGDÂ 9MÂ FWD -84.51 -73.85 -74.05 -79.19 -1.84 9:42:00
15 NaN USDSGDÂ 10MÂ FWD -102.16 -75.06 -85.01 -87.28 -9.66 9:43:00
16 NaN USDSGDÂ 11MÂ FWD -109.81 -84.92 -96.50 -96.31 -7.91 9:43:00
17 NaN USDSGDÂ 1YÂ FWD -107.88 -103.13 -104.47 -107.00 2.63 9:43:00
18 NaN USDSGDÂ 15MÂ FWD -140.08 -106.19 -132.00 -121.00 6.92 9:40:00
19 NaN USDSGDÂ 21MÂ FWD -200.00 -151.00 -185.50 -180.50 14.00 9:40:00
20 NaN USDSGDÂ 2YÂ FWD -196.50 -121.50 -162.40 -197.50 50.50 9:40:00
21 NaN USDSGDÂ 3YÂ FWD -355.00 -306.00 -347.00 -330.00 20.00 9:43:00
22 NaN USDSGDÂ 4YÂ FWD 145.00 211.00 0.00 0.00 1.00 31/07
23 NaN USDSGDÂ 5YÂ FWD 117.00 187.00 0.00 0.00 -4.00 31/07
24 NaN USDSGDÂ 7YÂ FWD 63.00 189.00 0.00 0.00 -1.00 31/07
25 NaN USDSGDÂ 10YÂ FWD -30.00 127.00 0.00 0.00 10.00 31/07
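You should use the pandas to_csv() function to write your table. pd.read_html() returns a list of data frames, so wr.writerow(tbl) writes each whole data frame as a single cell instead of writing its rows and columns. With to_csv() you can also specify a file encoding such as utf-8 for the file: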
import urllib2
from bs4 import BeautifulSoup
import pandas as pd
site = "https://www.investing.com/currencies/usd-sgd-forward-rates"
hdr = {'User-Agent': 'Mozilla/5.0'}
req = urllib2.Request(site, headers=hdr)
page = urllib2.urlopen(req)
soup = BeautifulSoup(page, "lxml")
px_table = str(soup.find('table', attrs={'id':'curr_table'}))
df_table = pd.read_html(px_table, encoding='utf-8')[0]
del df_table['Unnamed: 0']
df_table.to_csv('test.csv', encoding='utf-8', index=False)
This would give you a test.csv starting like:
Name,Bid,Ask,High,Low,Chg.,Time
USDSGD ON FWD,-1.35,0.65,-0.29,-1.19,0.25,12:10:00
USDSGD TN FWD,-0.54,-0.46,-0.35,-0.49,-0.12,11:11:00
USDSGD SN FWD,-0.43,-0.14,-0.29,-0.25,-0.03,12:11:00
USDSGD SW FWD,-1.99,-1.51,-1.8,-1.8,0.02,12:10:00
USDSGD 2W FWD,-5.63,-1.53,-3.58,-3.44,-1.53,12:11:00
This code also removes the unwanted Unnamed: 0 column, and disables the writing of an index column to the CSV file.
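To illustrate why the original csv.writer call filled only one cell, here is a minimal sketch using only the standard library. It assumes Python 3 (for io.StringIO with the csv module), and FakeTable is a hypothetical stand-in for a data frame, not part of pandas: writerow() treats each element of the list it receives as one cell and converts it with str(), so a list holding a single table-like object becomes a single quoted cell.

```python
import csv
import io


class FakeTable(object):
    """Hypothetical stand-in for a data frame: one object whose str() spans several lines."""

    def __str__(self):
        return "Name Bid\nUSDSGD ON FWD -0.85\nUSDSGD TN FWD -0.50"


def write_as_single_row(items):
    """Mimic wr.writerow(tbl): every element of `items` becomes one cell."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=' ').writerow(items)
    return buf.getvalue()


def write_rows(rows):
    """Write one CSV line per inner list, one cell per value."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


# A list containing one table-like object ends up as one quoted cell.
one_cell = write_as_single_row([FakeTable()])

# Passing the individual rows instead yields properly delimited output.
proper = write_rows([
    ["Name", "Bid"],
    ["USDSGD ON FWD", "-0.85"],
    ["USDSGD TN FWD", "-0.50"],
])
```

Reading one_cell back with csv.reader gives a single row with a single field, which matches the symptom described in the question. to_csv() avoids this because it writes the rows and columns of the data frame itself.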
Alternatively, you could remove the need for BeautifulSoup, as read_html() will return a list of data frames for all the tables that it is able to find:
import urllib2
import pandas as pd
site = "https://www.investing.com/currencies/usd-sgd-forward-rates"
hdr = {'User-Agent': 'Mozilla/5.0'}
req = urllib2.Request(site, headers=hdr)
page = urllib2.urlopen(req)
df_table = pd.read_html(page.read(), encoding='utf-8')[1]
df_table.drop(df_table.columns[[0]], axis=1, inplace=True)
df_table['Name'] = df_table['Name'].str.encode('ascii', errors='ignore')
df_table.to_csv('test.csv', encoding='ascii', index=False)
This approach also forces the Name column to be converted to ASCII.
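As a rough sketch of what is probably behind the stray "Â" characters: the pattern in the question's output is what a non-breaking space (U+00A0) looks like when its UTF-8 bytes are displayed as Latin-1. That the page uses non-breaking spaces in the Name column is an assumption inferred from that output, and the helper names below are hypothetical. The sketch also shows that errors='ignore' drops such characters entirely, whereas replacing them with ordinary spaces keeps the words separated.

```python
# -*- coding: utf-8 -*-

NBSP = u"\u00a0"  # non-breaking space, assumed to appear in the scraped names


def as_seen_when_misdecoded(text):
    """Encode as UTF-8, then decode as Latin-1, as a viewer guessing the wrong encoding would."""
    return text.encode("utf-8").decode("latin-1")


def normalize_spaces(text):
    """Replace non-breaking spaces with ordinary spaces."""
    return text.replace(NBSP, u" ")


name = u"USDSGD\u00a0ON\u00a0FWD"

# UTF-8 encodes U+00A0 as the bytes C2 A0; read as Latin-1, C2 shows up as "Â".
garbled = as_seen_when_misdecoded(name)

# errors="ignore" silently removes every non-ASCII character, spaces included.
ascii_ignored = name.encode("ascii", errors="ignore").decode("ascii")

# Normalizing first keeps the name readable and ASCII-safe.
cleaned = normalize_spaces(name)
```

If this assumption holds, applying a replacement such as df_table['Name'].str.replace(u'\xa0', u' ') before writing the file would be one way to keep the spaces. Whether that is preferable to dropping them depends on how the CSV is consumed.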