Appending rows to a DataFrame

Question

I am stuck on a simple task. I want to create an empty DataFrame and append rows to it based on a query of another dataset. I have tried the answers here but I am missing something; I'm a beginner Pythoner. Any help would be appreciated. I want to take the top 3 rows of each state and add them into a new dataframe for processing. I also tried append.

def test():

    #get the list of states
    states_df = census_df.STNAME.unique()
    population_df = pd.DataFrame()

    for st in states_df:
        temp_df = pd.DataFrame(census_df[census_df['STNAME'] == st].nlargest(3,'CENSUS2010POP'))
        pd.concat([temp_df, population_df], ignore_index = True)

    return 1

Answer 1

I think I know what course you're doing. I had a great time with that a year ago, keep it up!

The simplest/fastest way I've found to concatenate a bunch of sliced dataframes is to append each df to a list, then at the end just concatenate that list. See the working code below (it does what I interpret you meant).

I agree with David's suggestion on sorting: it is easy to use sort and then just slice the first 3. nlargest() would also work here, since calling it on a DataFrame returns a DataFrame with all the columns (only Series.nlargest returns a Series), so either way you keep the whole dataframe structure for concatenation.
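As a quick check of the two ways to take the top three rows, here is a minimal sketch. The toy data and the COUNTY column are hypothetical stand-ins for census_df, chosen with distinct population values so that tie ordering does not matter.

```python
import pandas as pd

# Hypothetical toy data standing in for census_df (one state, five rows).
df = pd.DataFrame({
    "STNAME": ["a", "a", "a", "a", "a"],
    "COUNTY": ["v", "w", "x", "y", "z"],
    "CENSUS2010POP": [50, 10, 40, 30, 20],
})

# Option 1: DataFrame.nlargest keeps every column of the selected rows.
top_nlargest = df.nlargest(3, "CENSUS2010POP")

# Option 2: sort descending, then slice the first three rows.
top_sorted = df.sort_values(by="CENSUS2010POP", ascending=False).iloc[:3]

print(type(top_nlargest).__name__)      # DataFrame
print(list(top_nlargest.columns))       # all three columns are still there
print(top_nlargest.equals(top_sorted))  # True for this tie-free data
```

With ties in the population column the two options may order or pick the tied rows differently, so the equality above should not be relied on in general.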

Also, why is your function returning 1? Typo? I guess you want to return your desired output if you're putting it in a function, so I changed that too.

import pandas as pd
import numpy as np


#create fake data random numbers
data = np.random.randint(2,11,(40,3))
census_df = pd.DataFrame(index=range(40), columns=['Blah', 'Blah2','CENSUS2010POP'], data=data)
#create fake STNAME column
census_df['STNAME'] = list('aaaabbbbccccddddeeeeffffgggghhhhiiiijjjj')

#Function:
def test(census_df):
    states_list = census_df.STNAME.unique() #changed naming to _list as it's not a df.
    list_of_dfs = list() #more efficient to append each df to a list
    for st in states_list:
        temp_df = census_df[census_df['STNAME']==st]
        temp_df = temp_df.sort_values(by=['CENSUS2010POP'], ascending=False).iloc[:3]
        list_of_dfs.append(temp_df)
    population_df = pd.concat(list_of_dfs,ignore_index=True)
    return population_df

population_df = test(census_df)
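For comparison, the same top-3-per-state result can probably be had without an explicit loop. This is a different technique from the list-and-concat approach above: a sketch using sort_values followed by groupby(...).head(3), run on hypothetical toy data rather than the real census file.

```python
import pandas as pd

# Hypothetical toy data: two states, four rows each.
census_df = pd.DataFrame({
    "STNAME": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "CENSUS2010POP": [5, 9, 7, 3, 8, 2, 6, 4],
})

top3 = (
    census_df.sort_values(by="CENSUS2010POP", ascending=False)  # largest first
    .groupby("STNAME")
    .head(3)  # first three rows of each state, i.e. its three largest
    .sort_values(by=["STNAME", "CENSUS2010POP"], ascending=[True, False])
    .reset_index(drop=True)
)
print(top3)
```

The final sort_values and reset_index only tidy the output so that rows are grouped by state; they are optional if row order does not matter.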

Answer 2

Welcome to SO! Is your problem the appending or the top three rows?

For appending, the result has to be assigned back to your main DataFrame on every iteration. Older pandas versions had the df.append function for this; it was deprecated in pandas 1.4 and removed in 2.0, so pd.concat is used here. It could look something like:

#get the list of states
states_df = census_df.STNAME.unique()
population_df = pd.DataFrame()

for st in states_df:
    temp_df = census_df[census_df['STNAME'] == st].nlargest(3, 'CENSUS2010POP')
    population_df = pd.concat([population_df, temp_df], ignore_index=True) #add the temp df to your main df, ignoring the index
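The reassignment in the loop matters because pd.concat returns a new DataFrame and leaves its inputs untouched. A minimal sketch on hypothetical toy data (pd.concat is used since DataFrame.append is not available in pandas 2.0 and later; the frame starts with one row just to keep the example simple):

```python
import pandas as pd

# Hypothetical toy frames: one existing row, two rows to add.
population_df = pd.DataFrame({"STNAME": ["a"], "CENSUS2010POP": [9]})
temp_df = pd.DataFrame({"STNAME": ["b", "b"], "CENSUS2010POP": [8, 6]})

# Result discarded: population_df is unchanged.
pd.concat([population_df, temp_df], ignore_index=True)
n_before = len(population_df)
print(n_before)  # 1

# Result assigned back: population_df now holds all rows.
population_df = pd.concat([population_df, temp_df], ignore_index=True)
n_after = len(population_df)
print(n_after)  # 3
```

Calling pd.concat inside a loop copies the accumulated data each time, so for many groups collecting the pieces in a list and concatenating once is usually the cheaper pattern.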

For the top rows you could use df.sort_values(by=['column name'], ascending=False) and then select the top three rows:

temp_df = temp_df.sort_values(by=['CENSUS2010POP'], ascending=False)
population_df = pd.concat([population_df, temp_df[0:3]], ignore_index=True)

2 answers

Answer 1
1 2018-09-26 20:53:20

Answer 2
0 2018-09-26 19:23:21
