
How can I add rows to a DataFrame from a list in pandas?

I have yearly information (COUNT) for countries stored in a DataFrame. However, some countries are missing in certain years.

If I have a complete list of countries, what is the best way to add the missing ones under the corresponding years and fill the missing COUNT values with 0?

            DATE    COUNTRY         COUNTRY_ID  COUNT
       0    1980    United States   840         42
      42    1980    Czech Republic  203         2
      95    1980    Hungary         348         1
      96    1980    Great Britain   826         1
      97    1980    South Africa    710         1
      98    1982    United States   840         42
     140    1982    Paraguay        600         2
       .
       .

One way to do this is to build every (DATE, COUNTRY) combination, reindex the DataFrame on those combinations, and finally fill in the missing values.

import pandas as pd

# Assume that we want all years in the range, not just the ones seen
years = range(df['DATE'].min(), df['DATE'].max() + 1)

# get all (DATE, COUNTRY) combinations
idx = pd.MultiIndex.from_product([years, df['COUNTRY'].unique()], names=['DATE', 'COUNTRY'])

# reindex by first putting DATE and COUNTRY into the index;
# combinations that were absent come back as rows of NaN
df1 = df.set_index(['DATE', 'COUNTRY']).reindex(idx).reset_index()

# Fill back in missing IDs (one COUNTRY -> COUNTRY_ID entry per country)
country_map = df.drop_duplicates('COUNTRY').set_index('COUNTRY')['COUNTRY_ID']
df1['COUNTRY_ID'] = df1['COUNTRY'].map(country_map)

# fill in 0 for COUNT and convert back to int
df1['COUNT'] = df1['COUNT'].fillna(0).astype(int)

    DATE         COUNTRY  COUNTRY_ID  COUNT
0   1980   United States         840     42
1   1980  Czech Republic         203      2
2   1980         Hungary         348      1
3   1980   Great Britain         826      1
4   1980    South Africa         710      1
5   1980        Paraguay         600      0
6   1981   United States         840      0
7   1981  Czech Republic         203      0
8   1981         Hungary         348      0
9   1981   Great Britain         826      0
10  1981    South Africa         710      0
11  1981        Paraguay         600      0
12  1982   United States         840     42
13  1982  Czech Republic         203      0
14  1982         Hungary         348      0
15  1982   Great Britain         826      0
16  1982    South Africa         710      0
17  1982        Paraguay         600      2
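
The question mentions a complete list of countries, whereas the code above only uses the countries that already appear in df. The same reindex idea can be sketched with an explicit list. The names `all_countries`, `id_by_country`, and `full`, the extra country "Canada" with ID 124, and the sample rows (only those shown in the question) are assumptions for illustration. A country that never appears in the data has no COUNTRY_ID to recover, so this sketch assumes a name-to-ID lookup is available.

```python
import pandas as pd

# Sample rows from the question (only the rows shown there).
df = pd.DataFrame({
    "DATE": [1980, 1980, 1980, 1980, 1980, 1982, 1982],
    "COUNTRY": ["United States", "Czech Republic", "Hungary",
                "Great Britain", "South Africa", "United States", "Paraguay"],
    "COUNTRY_ID": [840, 203, 348, 826, 710, 840, 600],
    "COUNT": [42, 2, 1, 1, 1, 42, 2],
})

# Hypothetical complete lookup: country name -> ID.
# "Canada" is an assumed extra entry that never appears in df.
id_by_country = {
    "United States": 840, "Czech Republic": 203, "Hungary": 348,
    "Great Britain": 826, "South Africa": 710, "Paraguay": 600,
    "Canada": 124,
}
all_countries = list(id_by_country)

# Only the years present in the data; swap in a range() to cover every year.
years = sorted(df["DATE"].unique())

idx = pd.MultiIndex.from_product([years, all_countries], names=["DATE", "COUNTRY"])

full = (
    df.set_index(["DATE", "COUNTRY"])["COUNT"]
      .reindex(idx, fill_value=0)   # absent (year, country) pairs get COUNT 0
      .reset_index()
)
full["COUNTRY_ID"] = full["COUNTRY"].map(id_by_country)
full = full[["DATE", "COUNTRY", "COUNTRY_ID", "COUNT"]]
```

Reindexing only the COUNT column with `fill_value=0` keeps it as an integer column, so no `fillna` or `astype` step is needed afterwards.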

Also consider a cross-join merge route (for those of us with a SQL mindset):

# CREATE DF OF DATES RANGE (WITH A CONSTANT KEY FOR THE CROSS JOIN)
dates = pd.DataFrame({'DATE': list(range(df['DATE'].min(), df['DATE'].max() + 1)),
                      'COUNT': 0, 'KEY': 1})

# CROSS JOIN MERGE (KEY IS ASSIGNED ON A COPY, SO df ITSELF IS NOT MODIFIED)
mdf = df.assign(KEY=1).merge(dates, on='KEY')

# REASSIGN COUNT: ZERO OUT ROWS WHOSE ORIGINAL YEAR DIFFERS FROM THE TARGET YEAR
mdf.loc[mdf['DATE_x'] != mdf['DATE_y'], 'COUNT_x'] = 0

# CLEAN UP DF (COLS AND ROWS)
# SORT BY COUNT DESCENDING FIRST SO drop_duplicates KEEPS THE REAL COUNT
# RATHER THAN A ZEROED COPY OF THE SAME (DATE, COUNTRY) PAIR
mdf = mdf[['DATE_y', 'COUNTRY', 'COUNTRY_ID', 'COUNT_x']]\
           .rename(columns={'DATE_y':'DATE', 'COUNT_x':'COUNT'})\
           .sort_values('COUNT', ascending=False)\
           .drop_duplicates(['DATE', 'COUNTRY', 'COUNTRY_ID'])\
           .sort_values(['DATE', 'COUNTRY'])\
           .reset_index(drop=True)

#     DATE         COUNTRY  COUNTRY_ID  COUNT
# 0   1980  Czech Republic         203      2
# 1   1980   Great Britain         826      1
# 2   1980         Hungary         348      1
# 3   1980        Paraguay         600      0
# 4   1980    South Africa         710      1
# 5   1980   United States         840     42
# 6   1981  Czech Republic         203      0
# 7   1981   Great Britain         826      0
# 8   1981         Hungary         348      0
# 9   1981        Paraguay         600      0
# 10  1981    South Africa         710      0
# 11  1981   United States         840      0
# 12  1982  Czech Republic         203      0
# 13  1982   Great Britain         826      0
# 14  1982         Hungary         348      0
# 15  1982        Paraguay         600      2
# 16  1982    South Africa         710      0
# 17  1982   United States         840     42
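
On pandas 1.2 or later, the dummy KEY column should not be necessary, because `merge` accepts `how='cross'`. A sketch of that variant follows, assuming only the sample rows shown in the question; the names `countries`, `grid`, and `result` are illustrative. It builds the full year-by-country grid first and then left-joins the observed counts onto it, so no zeroing or de-duplication step is required.

```python
import pandas as pd

# Sample rows from the question (only the rows shown there).
df = pd.DataFrame({
    "DATE": [1980, 1980, 1980, 1980, 1980, 1982, 1982],
    "COUNTRY": ["United States", "Czech Republic", "Hungary",
                "Great Britain", "South Africa", "United States", "Paraguay"],
    "COUNTRY_ID": [840, 203, 348, 826, 710, 840, 600],
    "COUNT": [42, 2, 1, 1, 1, 42, 2],
})

# One row per distinct country, and one row per year in the covered range.
countries = df[["COUNTRY", "COUNTRY_ID"]].drop_duplicates()
dates = pd.DataFrame({"DATE": list(range(df["DATE"].min(), df["DATE"].max() + 1))})

# Cartesian product of years and countries (how="cross" needs pandas >= 1.2).
grid = dates.merge(countries, how="cross")

# Left-join the observed counts onto the grid; unmatched pairs get NaN,
# which is then replaced by 0 and cast back to int.
result = grid.merge(df, on=["DATE", "COUNTRY", "COUNTRY_ID"], how="left")
result["COUNT"] = result["COUNT"].fillna(0).astype(int)
```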
