
I am unable to add two columns in a dataframe

I am trying to add two columns in a dataframe. I can't work out how to check their attributes. How can I find this out?

import pandas as pd
import requests
from bs4 import BeautifulSoup

URL = "https://www.soccerbase.com/teams/team.sd?team_id=536&comp_id=1&teamTabs=results"
# URL = "http://worldpopulationreview.com/countries/countries-by-gdp/"
r = requests.get(URL)

soup = BeautifulSoup(r.content, 'html.parser')

table = soup.find('table', {'class': 'soccerGrid'})


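def rowgetdatatext(tr, coltag='td', strip=False):  # td (data) or th (header)
    # Collect the text of every cell in one table row.
    # strip defaults to False, so cell text keeps its surrounding whitespace.
    cols = []
    for td in tr.find_all(coltag):
        cols.append(td.get_text(strip=strip))
    return cols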


def tabledatatext(table):
    rows = []
    trs = table.find_all('tr')
    headerow = rowgetdatatext(trs[0], 'th')
    if headerow:  # if there is a header row include first
        rows.append(headerow)
        trs = trs[1:]
    for tr in trs:  # for every table row
        rows.append(rowgetdatatext(tr, 'td'))  # data row
    return rows


d = tabledatatext(table)

pd.set_option('display.width', 400)
pd.set_option('display.max_columns', 20)
df = pd.DataFrame(d)

Frame = pd.DataFrame(df.values,
                     columns=["Competition", "Date", "Omit", "Home Team", "Score", "Away Team", "Omit",
                              "Omit", "Omit", "DateKeep"
                              ])
Frame = Frame.drop(columns=["Omit", "Date"])
Frame = Frame.drop([0, 1], axis=0)

Frame[['Home Score', 'Away Score']] = Frame['Score'].str.split('-', expand=True)
Frame = Frame.drop(columns="Score")
Frame = Frame[["Competition", "Home Team", "Home Score", "Away Team", "Away Score",
               "DateKeep"]]

Frame['Home Team'] = Frame['Home Team'].str[:-20]
Frame['Away Team'] = Frame['Away Team'].str[:-20]
Frame['DateKeep'] = Frame['DateKeep'].str[3:]
Frame['Competition'] = Frame['Competition'].str[:-18]

# Frame['Home Score'] = Frame['Home Score'].str.split()
# Frame['Away Score'] = Frame['Away Score'].str.split()

# pd.to_numeric(Frame['Away Score'], errors='coerce')
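# Frame.index is an attribute, not a method, so Frame.index(Frame) raises a TypeError.
# Frame.dtypes reports the data type of each column instead.
F2 = Frame.dtypes
print(Frame)
print(F2)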

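Sample output is:

      Competition          Home Team  Home Score         Away Team  Away Score          DateKeep
2  Premier League  Manchester United           4           Chelsea           0  2019-08-11 16:30
3  UEFA Super Cup          Liverpool           2           Chelsea           2  2019-08-14 20:00
4  Premier League            Chelsea           1         Leicester           1  2019-08-18 16:30
5  Premier League            Norwich           2           Chelsea           3  2019-08-24 12:30
6  Premier League            Chelsea           2  Sheffield United           2  2019-08-31 15:00
7  Premier League             Wolves           2           Chelsea           5  2019-09-14 15:00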

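If I try to add the Home Score and Away Score columns, it concatenates the two values instead of summing them. What am I missing? Thanks.

Edit: Added screenshots of the current output and the desired output.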
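
The concatenation can be reproduced without any scraping. The following is a minimal sketch, assuming a few made-up score strings in place of the scraped "Score" column (the name frame and the sample values are illustrative, not taken from the site). It shows that str.split produces text columns, which dtypes makes visible, and that "+" on text joins the characters.

```python
import pandas as pd

# Hypothetical sample rows standing in for the scraped "Score" column.
frame = pd.DataFrame({"Score": ["4-0", "2-2", "1-1"]})
frame[["Home Score", "Away Score"]] = frame["Score"].str.split("-", expand=True)

# The split columns hold strings, so dtypes reports a text dtype
# ("object" on most pandas versions) rather than int64.
print(frame.dtypes)

# "+" on two string columns joins the text: "4" + "0" gives "40".
joined = frame["Home Score"] + frame["Away Score"]
print(joined.tolist())
```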

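I would use the length of one of the score columns to gather a list of the values of interest for each column, restricting the other lists to that same size. Then zip those lists and convert them to a df. You can calculate the last two columns provided you have converted the score columns to integers first.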

import pandas as pd
import requests
from bs4 import BeautifulSoup as bs

headers = ['Competition', 'Home Team', 'Home Score', 'Away Team', 'Away Score', 'Date Keep',
           'Total Score (Home + Away Score)', 'Goal Difference (Home - Away Score)']
r = requests.get('https://www.soccerbase.com/teams/team.sd?team_id=536&comp_id=1&teamTabs=results')
soup = bs(r.content, 'lxml')  # requires the lxml package to be installed

# Convert the scores to int up front so they can be added and subtracted.
h_scores = [int(i.text) for i in soup.select('.score a em:first-child')]
a_scores = [int(i.text) for i in soup.select('.score a em + em')]
total_scores = [h + a for h, a in zip(h_scores, a_scores)]
diff_scores = [h - a for h, a in zip(h_scores, a_scores)]

# Cap the other lists at the number of scores so every column has the same length.
limit = len(a_scores)
comps = [i.text for i in soup.select('.tournament a', limit=limit)]
dates = [i.text for i in soup.select('.dateTime .hide', limit=limit)]
h_teams = [i.text for i in soup.select('.homeTeam a', limit=limit)]
a_teams = [i.text for i in soup.select('.awayTeam a', limit=limit)]

# Zip the column lists into rows and build the DataFrame.
rows = list(zip(comps, h_teams, h_scores, a_teams, a_scores, dates, total_scores, diff_scores))
df = pd.DataFrame(rows, columns=headers)
print(df)
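
If you would rather keep the DataFrame from the question, the same integer conversion can be done on the split columns. This is a rough sketch, assuming a small hand-written frame shaped like the question's Frame after the split (the sample rows and the column names "Total Score" and "Goal Difference" are illustrative). Note that pd.to_numeric returns a new object, so its result has to be assigned back, which the commented-out line in the question did not do.

```python
import pandas as pd

# Hypothetical rows shaped like the question's Frame after the split.
frame = pd.DataFrame({
    "Home Team": ["Manchester United", "Liverpool", "Chelsea"],
    "Home Score": ["4", "2", "1"],
    "Away Team": ["Chelsea", "Chelsea", "Leicester"],
    "Away Score": ["0", "2", "1"],
})

score_cols = ["Home Score", "Away Score"]
# Convert both columns and assign the result back to the frame.
# errors="coerce" turns any text that is not a number into NaN.
frame[score_cols] = frame[score_cols].apply(pd.to_numeric, errors="coerce")

# With numeric columns, "+" and "-" do arithmetic instead of joining text.
frame["Total Score"] = frame["Home Score"] + frame["Away Score"]
frame["Goal Difference"] = frame["Home Score"] - frame["Away Score"]
print(frame)
```

If the scraped strings carry stray whitespace, applying .str.strip() to the columns before the conversion may also be needed.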

