
How to concatenate three pandas DataFrames without mismatched rows

I'm having some issues concatenating three DataFrames with pandas. The rows of one of my DataFrames are not in line with those of the other two (see code and output below):

import requests
import pandas as pd
from bs4 import BeautifulSoup

List = ['LU0526609390:EUR', 'IE00BHBX0Z19:EUR', 'LU1076093779:EUR', 'LU1116896363:EUR']
df = pd.DataFrame(List, columns=['List'])
urls = 'https://markets.ft.com/data/funds/tearsheet/summary?s='+ df['List']

dfs =[]
results = pd.DataFrame()
for url in urls:
    print(url)
    r = requests.get(url).content
    soup = BeautifulSoup(r, 'html.parser')
    elemList = soup.find('title')
    df0 = pd.DataFrame(elemList, columns = ['Fund Name'])
    df0["Fund Name"] = df0["Fund Name"].str.replace("summary - FT.com", "", regex=True)
    table1 = soup.find_all('table')[0]
    table2 = soup.find_all('table')[1]
    df1 = pd.read_html(str(table1), index_col=0)[0].T
    df2 = pd.read_html(str(table2), index_col=0)[0].T
    df = pd.concat([df0, df1, df2], axis=1)
    dfs.append(df)

pd.concat(dfs).to_csv(r'/Users/Test.csv', index=False)    

My output is the following:

[image: screenshot of the output]

It looks like the rows of my df0 DataFrame (column: 'Fund Name') are not in line with the rows of my other DataFrames. I would be very grateful if someone could let me know why this is happening. Thanks!

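pd.concat with axis=1 aligns rows by index label, so a DataFrame whose index label differs from the others ends up on a row of its own. The idea is to concatenate only the two tables and then add Fund Name as the first column with DataFrame.insert: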

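dfs = []
for url in urls:
    print(url)
    r = requests.get(url).content
    soup = BeautifulSoup(r, 'html.parser')
    # Page title as plain text; it ends with "summary - FT.com"
    title = soup.find('title').get_text()

    tables = soup.find_all('table')
    # Each table holds label/value pairs; transpose so the labels become columns
    df1 = pd.read_html(str(tables[0]), index_col=0)[0].T
    df2 = pd.read_html(str(tables[1]), index_col=0)[0].T
    # Combine the two tables side by side
    df = pd.concat([df1, df2], axis=1)
    # Insert the fund name as the first column; a scalar fills every row
    df.insert(0, 'Fund Name', title.replace('summary - FT.com', '').strip())
    dfs.append(df)

# Stack the per-fund rows and write them out, as in the question
pd.concat(dfs).to_csv(r'/Users/Test.csv', index=False)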
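
To see why the rows drift apart without hitting the website, here is a minimal offline sketch. The three small DataFrames are hypothetical stand-ins for the scraped data (the values and the index label 1 are assumptions for illustration, not real FT output). It shows the mismatch, the insert-based fix, and an alternative that resets the index before concatenating:

```python
import pandas as pd

# Hypothetical stand-ins for the scraped frames (made-up values).
df0 = pd.DataFrame({"Fund Name": ["Example Fund"]})          # default index label: 0
df1 = pd.DataFrame({"Fund type": ["SICAV"]}, index=[1])      # index label: 1
df2 = pd.DataFrame({"Launch date": ["01 Jan 2020"]}, index=[1])

# concat(axis=1) aligns on index labels. Labels 0 and 1 differ,
# so the result has two rows padded with NaN instead of one row.
misaligned = pd.concat([df0, df1, df2], axis=1)
print(misaligned)

# Fix 1: combine the tables first, then insert the name as a column.
fixed = pd.concat([df1, df2], axis=1)
fixed.insert(0, "Fund Name", "Example Fund")
print(fixed)

# Fix 2: discard the index labels so every frame starts at 0.
frames = [f.reset_index(drop=True) for f in (df0, df1, df2)]
also_fixed = pd.concat(frames, axis=1)
print(also_fixed)
```

Either fix yields a single row per fund. The reset_index variant may be handy when the name already lives in its own DataFrame, while insert avoids building that extra DataFrame at all.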

