I'm having trouble concatenating three DataFrames with pandas. The rows of one of my DataFrames are not aligned with the rows of the other two (see code and output below):
import requests
import pandas as pd
from bs4 import BeautifulSoup
List = ['LU0526609390:EUR', 'IE00BHBX0Z19:EUR', 'LU1076093779:EUR', 'LU1116896363:EUR']
df = pd.DataFrame(List, columns=['List'])
urls = 'https://markets.ft.com/data/funds/tearsheet/summary?s=' + df['List']
dfs = []
results = pd.DataFrame()
for url in urls:
    print(url)
    r = requests.get(url).content
    soup = BeautifulSoup(r, 'html.parser')
    elemList = soup.find('title')
    df0 = pd.DataFrame(elemList, columns=['Fund Name'])
    df0["Fund Name"] = df0["Fund Name"].str.replace("summary - FT.com", "", regex=True)
    table1 = soup.find_all('table')[0]
    table2 = soup.find_all('table')[1]
    df1 = pd.read_html(str(table1), index_col=0)[0].T
    df2 = pd.read_html(str(table2), index_col=0)[0].T
    df = pd.concat([df0, df1, df2], axis=1)
    dfs.append(df)
pd.concat(dfs).to_csv(r'/Users/Test.csv', index=False)
My Output is the following:
It looks like the rows of my df0 DataFrame (column: 'Fund Name') are not aligned with the rows of my other DataFrames. I would be very grateful if someone could explain why this is happening. Thanks!
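pd.concat with axis=1 aligns rows on their index labels. df0 has the default index (0), which presumably does not match the index label of the transposed tables from read_html, so the fund name ends up on a row of its own. The idea is to concatenate only df1 and df2, then add Fund Name as the first column with DataFrame.insert: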
dfs = []
results = pd.DataFrame()
for url in urls:
    print(url)
    r = requests.get(url).content
    soup = BeautifulSoup(r, 'html.parser')
    elemList = soup.find('title')
    table1 = soup.find_all('table')[0]
    table2 = soup.find_all('table')[1]
    df1 = pd.read_html(str(table1), index_col=0)[0].T
    df2 = pd.read_html(str(table2), index_col=0)[0].T
    # print (df2)
    df = pd.concat([df1, df2], axis=1)
    df.insert(0, 'Fund Name', elemList)
    df["Fund Name"] = df["Fund Name"].str.replace("summary - FT.com", "", regex=True)
    dfs.append(df)
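To check the alignment behaviour without hitting the website, here is a minimal sketch with made-up frames. The column names and values are hypothetical, and the index label 1 on df1 and df2 is an assumption about what read_html(..., index_col=0)[0].T yields for a two-column table; the point is only that the labels differ from df0's default 0.

```python
import pandas as pd

# Hypothetical stand-ins for the scraped pieces (all values are made up).
df0 = pd.DataFrame({"Fund Name": ["Example Fund"]})        # default index label: 0
df1 = pd.DataFrame({"Price (EUR)": ["10.00"]}, index=[1])  # assumed index label: 1
df2 = pd.DataFrame({"Fund type": ["SICAV"]}, index=[1])    # assumed index label: 1

# concat with axis=1 aligns on index labels, so labels 0 and 1 become
# two separate rows and the gaps are filled with NaN.
misaligned = pd.concat([df0, df1, df2], axis=1)
print(misaligned)

# Inserting a scalar broadcasts it to every existing row, so the index
# labels of df1/df2 no longer matter.
aligned = pd.concat([df1, df2], axis=1)
aligned.insert(0, "Fund Name", "Example Fund")
print(aligned)
```

An alternative with the same effect would be to call reset_index(drop=True) on df1 and df2 before the original three-way concat, so that all three frames share the index label 0.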