
Python / Pandas Dataframe: Automatically fill in missing rows

My goal is to ultimately create a scatter plot with the date on the x-axis and the won delegates (of each candidate) on the y-axis. I'm unsure of how to "fill in the blanks" when it comes to missing dates. I've attached a picture of the table I get.
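Once every row has a date, one way to get the data into scatter-plot shape is to reshape the candidate columns into long format and convert the scraped text into real dates and numbers. This is a minimal sketch, not the asker's code: it assumes the column names built in the code below, date strings like "March 1", and the year 2016 (taken from the election-2016 URL). The helper name `to_long_format` and all delegate numbers are made up for illustration.

```python
import pandas as pd


def to_long_format(df, year=2016):
    """Reshape '<Candidate> Won Delegates' columns into one row per (state, candidate).

    Hypothetical helper; assumes the Date column has already been filled in.
    """
    won_cols = [c for c in df.columns if c.endswith("Won Delegates")]
    long_df = df.melt(
        id_vars=["Date", "State or Territory"],
        value_vars=won_cols,
        var_name="Candidate",
        value_name="Won",
    )
    # "Trump Won Delegates" -> "Trump"
    long_df["Candidate"] = long_df["Candidate"].str.replace(" Won Delegates", "", regex=False)
    # "March 1" carries no year, so append one (assumed 2016) before parsing
    long_df["Date"] = pd.to_datetime(long_df["Date"] + f" {year}", format="%B %d %Y")
    # Scraped cells are text; anything non-numeric becomes NaN
    long_df["Won"] = pd.to_numeric(long_df["Won"], errors="coerce")
    return long_df


# Toy input shaped like the scraped table; the numbers are made up
example = pd.DataFrame({
    "Date": ["March 1", "March 1", "March 8"],
    "State or Territory": ["Alaska", "Arkansas", "State X"],
    "Trump Won Delegates": ["10", "20", "5"],
    "Cruz Won Delegates": ["12", "15", "8"],
})
long_example = to_long_format(example)
```

Each candidate's slice, for instance `long_example[long_example["Candidate"] == "Trump"]`, then supplies the x values (`Date`) and y values (`Won`) for one series of the scatter plot.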

For example, I'm trying to put March 1 as the date for Alaska, Arkansas, etc. so that the data can be plotted.
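If the blanks come from the table listing each date only once, on the first state of that date's group (an assumption, since the picture of the table is not reproduced here), then every blank belongs to the nearest date above it. That is a forward fill rather than a single constant. A minimal sketch; the helper name `forward_fill_dates` is hypothetical, and it treats both empty strings (what `text_content()` returns for an empty cell) and `None`/`NaN` as blanks:

```python
import pandas as pd


def forward_fill_dates(df, col="Date"):
    """Fill blank dates with the most recent non-blank date above them (hypothetical helper)."""
    dates = df[col].astype("object")
    # Blank means missing (None/NaN) or a string that is empty after stripping whitespace
    blank = dates.isna() | (dates.astype(str).str.strip() == "")
    # mask() turns blanks into NaN so that ffill() can carry the previous date downward
    return df.assign(**{col: dates.mask(blank).ffill()})


example = pd.DataFrame({
    "Date": ["March 1", "", None, "March 8", " "],
    "State or Territory": ["State W", "Alaska", "Arkansas", "State X", "State Y"],
})
filled = forward_fill_dates(example)
```

A blank in the very first row has no date above it and stays `NaN`.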

# CREATE DATAFRAME WITH DELEGATE WON/TARGET INFORMATION

import requests 
from lxml import html 
import pandas 

url = "http://projects.fivethirtyeight.com/election-2016/delegate-targets/"
response = requests.get(url)
doc = html.fromstring(response.text)

tables = doc.findall('.//table[@class="delegates desktop"]')
election = tables[0] 
election_rows = election.findall('.//tr')


def extractCells(row, isHeader=False):
    """Return the text of every header (th) or data (td) cell in a table row."""
    if isHeader:
        cells = row.findall('.//th')
    else:
        cells = row.findall('.//td')
    return [val.text_content() for val in cells]


def parse_options_data(table):

    rows = table.findall(".//tr")
    # rows[1] holds the column headers; the data rows follow it
    header = extractCells(rows[1], isHeader=True)
    data = [extractCells(row, isHeader=False) for row in rows[2:]]

    trumpdata = "Trump Won Delegates"
    cruzdata = "Cruz Won Delegates"
    kasichdata = "Kasich Won Delegates"

    data = pandas.DataFrame(data, columns=["Date", "State or Territory", "Total Delegates", trumpdata, cruzdata, kasichdata, "Rubio"])

    # Trailing digits of each candidate cell are the target delegates.
    # expand=False makes str.extract return a Series rather than a one-column DataFrame.
    data.insert(4, "Trump Target Delegates", data[trumpdata].str.extract(r'(\d{0,3}$)', expand=False))
    data.insert(6, "Cruz Target Delegates", data[cruzdata].str.extract(r'(\d{0,3}$)', expand=False))
    data.insert(8, "Kasich Target Delegates", data[kasichdata].str.extract(r'(\d{0,3}$)', expand=False))

    # Remove the unused Rubio column
    data = data.drop(columns='Rubio')

    # Leading digits of each candidate cell are the delegates actually won
    data[trumpdata] = data[trumpdata].str.extract(r'(^\d{0,3})', expand=False)
    data[cruzdata] = data[cruzdata].str.extract(r'(^\d{0,3})', expand=False)
    data[kasichdata] = data[kasichdata].str.extract(r'(^\d{0,3})', expand=False)

    return data


election_data = parse_options_data(election)
df = pandas.DataFrame(election_data)
df
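The code splits each candidate cell with two anchored patterns: `^\d{0,3}` keeps up to three leading digits (won delegates) and `\d{0,3}$` keeps up to three trailing digits (target delegates). The sketch below isolates that step. The cell text "24/44" is a hypothetical format, since the real cells are not shown, and `split_won_target` is an illustrative name. With `expand=False`, `str.extract` returns a Series for a single capture group, and `pd.to_numeric` turns the extracted text into numbers that can be plotted.

```python
import pandas as pd


def split_won_target(cells):
    """Split hypothetical 'won/target' strings into two numeric columns."""
    # Up to three digits anchored at the start of the string: delegates won
    won = cells.str.extract(r"(^\d{0,3})", expand=False)
    # Up to three digits anchored at the end of the string: target delegates
    target = cells.str.extract(r"(\d{0,3}$)", expand=False)
    return pd.DataFrame({
        "won": pd.to_numeric(won, errors="coerce"),
        "target": pd.to_numeric(target, errors="coerce"),
    })


result = split_won_target(pd.Series(["24/44", "0/5"]))
```

If a cell has no digits at one end, that pattern matches an empty string, which `errors="coerce"` turns into `NaN`.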

[Image: picture of my table]

You could do:

 df = df.fillna('March 1')
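Note that `fillna` with a single value fills every missing value in every column and returns a new frame, and it only touches values pandas treats as missing (`None`/`NaN`), not empty strings. Passing a dict limits the fill to the columns you name. A small sketch with made-up data:

```python
import numpy as np
import pandas as pd

# Made-up frame: one missing date, one empty-string date, one missing delegate count
df = pd.DataFrame({
    "Date": ["March 1", None, ""],
    "State or Territory": ["State W", "Alaska", "Arkansas"],
    "Trump Won Delegates": [10.0, np.nan, 20.0],
})

# Only the Date column is filled; the missing delegate count stays NaN
filled = df.fillna({"Date": "March 1"})
```

The empty string in the last row is left alone, so scraped blanks may need converting to `NaN` first, for example with `df["Date"].replace("", np.nan)`.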

I would advise you to go through the documentation:

http://pandas.pydata.org/pandas-docs/stable/10min.html
