Using Pandas, how would I add an existing dataframe row to another dataframe row with existing values?

For context, I have a dataset made up of the USA's states and territories. I have made a new data frame with only the 50 states (excluding territories); let's call it States_Only. This is complete. However, the first data set (let's call it USA_ALL) had both NY and NYC as independent rows, meaning that the values attributed to NY do not already include NYC's recorded data. Because they originated from the same data set, the columns match. All values are either NaN/NULL or integers.

For my States_Only data to be complete, the NYC values from USA_ALL need to be added to NY in the States_Only dataframe. How can I achieve this? For clarity, I do not want to append NYC, nor can I use groupby(), because nothing in the data ties these two rows together (such as an identifier); there is only the knowledge that NYC is within NY.

import requests
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os

if __name__ == '__main__':
    #data prep
    data_path = './assets/'
    out_path = './output'
    #scraping javascript map data via xml
    endpoint = "https://covid.cdc.gov/covid-data-tracker/COVIDData/getAjaxData"
    data = requests.get(endpoint, params={"id": "US_MAP_DATA"}).json()
    #convert to df and export raw data as csv
    df = pd.DataFrame(data["US_MAP_DATA"])
    #make sure the output folder exists before writing to it
    os.makedirs(out_path, exist_ok=True)
    path = os.path.join(out_path,'Raw_CDC_Data.csv')
    df.to_csv(path)

    #Remove last data point (Total USA)
    df.drop(df.tail(1).index,inplace=True)
    #Create DF of just 50 states
    state_abbr =["AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DC", "DE", "FL", "GA", 
          "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD", 
          "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ", 
          "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC", 
          "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY"]


    states = df[df['abbr'].isin(state_abbr)]
    # Add NYC from df to NY's existing values (sum of each column) to states

Here is an Excel spreadsheet showing the expected final values in the States_Only dataset. It is included because this data would be hard to format clearly on this forum: Expected Values

While this isn't super clean, it will do the trick:

import pandas as pd

import requests

endpoint = "https://covid.cdc.gov/covid-data-tracker/COVIDData/getAjaxData"
data = requests.get(endpoint, params={"id": "US_MAP_DATA"}).json()

df = pd.DataFrame(data["US_MAP_DATA"])

# drop last row
df = df[:-1]

ny_rows_mask = df["abbr"].isin(["NY", "NYC"])

ny_rows = df.loc[ny_rows_mask]

df = df.loc[~ny_rows_mask]

new_row = ny_rows.sum()
new_row["abbr"] = "NY"
new_row["id"] = 36
new_row["fips"] = 36
new_row["name"] = "New York"

# DataFrame.append was removed in pandas 2.0, so use pd.concat instead
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
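
If you would rather leave the existing NY row in States_Only where it is and add NYC's numbers onto it, something along these lines might work. This is only a sketch: the small frame and the tot_cases / tot_death column names are made up for illustration, and it assumes every column you want summed is numeric and that "NY" and "NYC" each appear exactly once.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CDC data; the real frame has many more columns.
usa_all = pd.DataFrame({
    "abbr": ["NJ", "NY", "NYC", "PR"],
    "name": ["New Jersey", "New York", "New York City", "Puerto Rico"],
    "tot_cases": [10, 20, 5, 7],
    "tot_death": [1.0, np.nan, 2.0, np.nan],
})

# Copy so that later assignments do not touch (or warn about) usa_all.
states_only = usa_all[usa_all["abbr"].isin(["NJ", "NY"])].copy()

# Only numeric columns are summed; labels such as abbr and name stay as they are.
num_cols = states_only.select_dtypes("number").columns

ny = states_only["abbr"] == "NY"
nyc_values = usa_all.loc[usa_all["abbr"] == "NYC", num_cols].iloc[0]
ny_values = states_only.loc[ny, num_cols].iloc[0]

# fill_value=0 treats a NaN on one side as 0; if both sides are NaN the result stays NaN.
total = ny_values.add(nyc_values, fill_value=0)

# to_numpy() avoids index alignment when writing the sums back into the NY row.
states_only.loc[ny, num_cols] = total.to_numpy()

print(states_only)
```

Using add with fill_value=0 rather than a plain + means a missing value in only one of the two rows does not wipe out the number in the other.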

As an aside, if you haven't already, you should examine some of the data types that Pandas infers from the CSV. The id column probably shouldn't be a number type, for example.
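
A quick way to check the inferred types and turn identifier columns into strings could look like the following. The tiny frame here is a hypothetical sample, not the real CDC data, and padding fips to two digits assumes it holds state-level FIPS codes.

```python
import pandas as pd

# Hypothetical sample with the same identifier columns as the CDC frame.
df = pd.DataFrame({
    "id": [1, 36],
    "fips": [1, 36],
    "abbr": ["AL", "NY"],
    "tot_cases": [10, 25],
})

# Show what pandas inferred for each column.
print(df.dtypes)

# Identifiers are labels, not quantities, so store them as strings.
df["id"] = df["id"].astype(str)
# State FIPS codes are two digits, so restore the leading zero.
df["fips"] = df["fips"].astype(str).str.zfill(2)

print(df)
```

Doing this before summing also keeps id and fips out of any numeric-only column selection.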
