
How do I create a new pandas DataFrame with multiple columns of differing lengths?

I'm building a new DataFrame so that I can easily feed data into a Bokeh visualization code snippet. I think my problem is caused by the columns having different numbers of rows, but I'm not sure.

Below, I sort the dataset alphabetically by country name and create an alphabetical list of the individual countries. In new_data.tail(), Zimbabwe is listed last; there are 80336 rows in total, hence the sorting.

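    import pandas as pd

    df_ind_data = pd.DataFrame(ind_data)
    # Sort rows alphabetically by country and renumber them 0..n-1
    new_data = df_ind_data.sort_values(by=['country'])
    new_data = new_data.reset_index(drop=True)
    # Alphabetical list of the distinct country names
    country_list = list(ind_data['country'])
    new_country_set = sorted(set(country_list))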
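A minimal, self-contained sketch of the preparation steps above. The question's ind_data is not shown, so a tiny hypothetical long-format table (one row per observation, with the 'country' and 'trust' columns named in the question) stands in for it; the values are made up.

```python
import pandas as pd

# Hypothetical stand-in for ind_data: one row per observation
ind_data = pd.DataFrame({
    "country": ["Zimbabwe", "Albania", "Brazil", "Albania", "Zimbabwe", "Albania"],
    "trust": [0.4, 0.7, 0.5, 0.6, 0.3, 0.8],
})

# Sort rows alphabetically by country, then renumber them 0..n-1
new_data = ind_data.sort_values(by=["country"]).reset_index(drop=True)

# Alphabetical list of the distinct countries (same result as sorted(set(...)))
new_country_set = sorted(new_data["country"].unique())

print(new_country_set)                              # ['Albania', 'Brazil', 'Zimbabwe']
print(new_data.groupby("country").size().tolist())  # [3, 1, 2]
```

After sorting and resetting the index, each country's rows occupy their own contiguous, non-overlapping block of index labels, and the per-country row counts differ, which is the situation described in the question.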

My goal is to create a new DataFrame with 76 columns (one per country name), with each country's 'trust' data in the rows underneath its column.

    df = pd.DataFrame()
    for country in new_country_set:
        pink = new_data.loc[(new_data['country'] == country)]
        df[country] = pink.trust

[Output screenshot not included.]

As you can see, only the first column gets filled in; the data is not included in any of the other columns. I believe this is because the number of rows of 'trust' data varies by country: the first column has 1000 rows, while some countries have as many as 2500 data points and others as few as 500.
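A small sketch with hypothetical toy data suggests where the empty columns come from. Assigning a Series to a DataFrame column (df[country] = pink.trust) aligns on index labels, not on position. The first assignment into the empty df gives it the first country's index labels; every later country has different labels (after sort_values and reset_index each country occupies its own range), so nothing matches and those columns become NaN. This would happen even if every country had the same number of rows.

```python
import pandas as pd

# Hypothetical long-format data, already sorted and reset:
# Albania occupies index labels 0, 1, 2 and Brazil occupies 3, 4.
new_data = pd.DataFrame({
    "country": ["Albania"] * 3 + ["Brazil"] * 2,
    "trust": [0.7, 0.6, 0.8, 0.5, 0.4],
})

df = pd.DataFrame()
for country in sorted(new_data["country"].unique()):
    pink = new_data.loc[new_data["country"] == country]
    # Column assignment aligns pink.trust on df's index labels, not on position
    df[country] = pink.trust

print(df)
# The first assignment gives df the index [0, 1, 2]; Brazil's labels 3 and 4
# are not in that index, so its column is all NaN.
```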

I have tried a few different ways of specifying the number of rows in df, but to no avail.

The visualization code snippet I have uses this exact data structure for its template data, which is why I'm trying to put the data in a DataFrame. Besides, I can't figure out how to do it, so I want to learn how.

Yes, I could put it in a dictionary, but I want to put it in a DataFrame.

You should use combine_first when you add a new column, so that the DataFrame's index gets extended. Instead of

    df[country] = pink.trust

you should use

    df = df.combine_first(pink.trust.to_frame(country))

which ensures that the index of df is always the union of the indexes of all added columns.
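A sketch of the combine_first approach on hypothetical toy data. Two details are assumptions beyond the answer as written: the new column is built as a one-column DataFrame named after the country (DataFrame.combine_first expects a DataFrame, and a bare pink.trust Series would be named 'trust'), and each country's rows are renumbered with reset_index(drop=True) so that all columns start at row 0. Without the renumbering, combine_first still keeps every value, but with the sorted, reset index from the question each country would sit in its own disjoint block of rows.

```python
import pandas as pd

# Hypothetical long-format data with a different row count per country
new_data = pd.DataFrame({
    "country": ["Albania"] * 3 + ["Brazil"] * 2 + ["Chile"],
    "trust": [0.7, 0.6, 0.8, 0.5, 0.4, 0.9],
})

df = pd.DataFrame()
for country in sorted(new_data["country"].unique()):
    pink = new_data.loc[new_data["country"] == country]
    # Renumber this country's rows from 0 and name the column after the country
    column = pink["trust"].reset_index(drop=True).to_frame(country)
    # combine_first indexes the result by the union of both indexes,
    # padding shorter columns with NaN instead of dropping rows
    df = df.combine_first(column)

print(df)
```

The result has one column per country and as many rows as the longest country; shorter columns are padded with NaN.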

I think in this case pivot will work for you, especially since you already have a DataFrame: new_data.pivot(columns='country', values='trust'), or equivalently pd.pivot(new_data, columns='country', values='trust'). This function turns the values of a particular column into column names. See the pandas documentation for more details. I hope that helps.
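A sketch of the pivot suggestion using the question's column names and hypothetical toy data. One assumption goes beyond the answer: a per-country row counter built with groupby('country').cumcount() is passed as the pivot index, so each country's values start at row 0. If pivot is given only columns and values, it reuses the existing row labels; since every row of new_data belongs to exactly one country, each row of the result would then hold a single value and NaN everywhere else.

```python
import pandas as pd

# Hypothetical long-format data with a different row count per country
new_data = pd.DataFrame({
    "country": ["Albania"] * 3 + ["Brazil"] * 2 + ["Chile"],
    "trust": [0.7, 0.6, 0.8, 0.5, 0.4, 0.9],
})

# cumcount numbers each country's rows 0, 1, 2, ... in order of appearance;
# using it as the pivot index lines the countries up from row 0
wide = (
    new_data.assign(row=new_data.groupby("country").cumcount())
    .pivot(index="row", columns="country", values="trust")
)

print(wide)
```

This is the same wide layout as the combine_first loop, produced in a single reshaping step without iterating over countries.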

