
Add a new column to a dataframe in which each row adopts a different value based on the title of the dataframe it came from

I have a list of several dataframes that I concatenated into one big dataframe. Now I want to add a column to the big dataframe whose value depends on which of the original dataframes each row came from. Here is an example:

import pandas as pd

list_of_df = [march_01, march_02, march_03]
big_df = pd.concat(list_of_df, ignore_index=True)

# big_df['new_column'] = ...
# I want this column to take the value '01' for rows that originally belonged
# to the march_01 dataframe, the value '02' for rows that originally belonged
# to the march_02 dataframe, and so on.

One way:

import itertools as it

big_df["new_column"] = list(it.chain.from_iterable([f"{j}".zfill(2)]*len(df)
                                                   for j, df in enumerate(list_of_df, start=1)))

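This takes the length of each dataframe and repeats its zero-padded label ("01", "02", ...) that many times. chain.from_iterable then flattens the per-dataframe lists into a single list that lines up row by row with big_df.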
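A minimal, self-contained sketch of this approach, in case it helps to see it run end to end. The three small frames and the column name "value" are made up for illustration; only the labelling logic comes from the answer.

```python
import itertools as it

import pandas as pd

# Toy stand-ins for the real frames; the data is hypothetical.
march_01 = pd.DataFrame({"value": [10, 11]})
march_02 = pd.DataFrame({"value": [20]})
march_03 = pd.DataFrame({"value": [30, 31, 32]})

list_of_df = [march_01, march_02, march_03]
big_df = pd.concat(list_of_df, ignore_index=True)

# One zero-padded label per source frame, repeated once per row of that frame,
# then flattened into a single list in concatenation order.
labels = list(it.chain.from_iterable(
    [f"{j:02d}"] * len(df) for j, df in enumerate(list_of_df, start=1)
))
big_df["new_column"] = labels

print(big_df["new_column"].tolist())
# ['01', '01', '02', '03', '03', '03']
```

This relies on the rows of big_df appearing in the same order as list_of_df, which holds when big_df is built directly from that list with pd.concat.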

Another way:

import numpy as np

lengths = list(map(len, list_of_df))
starting_points = [0, *np.cumsum(lengths)[:-1]]
big_df.loc[starting_points, "new_column"] =  [f"{j}".zfill(2)
                                              for j, _ in enumerate(list_of_df, start=1)]
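# Assign the result back: ffill(inplace=True) on a selected column may not
# update big_df itself in recent pandas versions.
big_df["new_column"] = big_df["new_column"].ffill()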

This first finds the row at which each dataframe starts inside the big one, using the cumulative sum of the dataframes' lengths. The last length is discarded because it does not affect any starting point, and a 0 is prepended for the first dataframe. It then writes the "01", "02", ... labels at those starting rows and finally forward-fills the remaining NaNs.
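The same steps can be sketched on toy data to make the starting points visible. The frames and the "value" column are hypothetical, and the sketch assumes no dataframe in the list is empty (an empty one would share its starting row with the next dataframe).

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the real frames; the data is hypothetical.
march_01 = pd.DataFrame({"value": [10, 11]})
march_02 = pd.DataFrame({"value": [20]})
march_03 = pd.DataFrame({"value": [30, 31, 32]})

list_of_df = [march_01, march_02, march_03]
big_df = pd.concat(list_of_df, ignore_index=True)

lengths = [len(df) for df in list_of_df]          # [2, 1, 3]
# Cumulative lengths give the row where the *next* frame starts,
# so drop the last one and prepend 0 for the first frame.
starting_points = [0, *(int(s) for s in np.cumsum(lengths)[:-1])]
print(starting_points)
# [0, 2, 3]

# Write each label at its frame's first row; other rows stay NaN for now.
big_df.loc[starting_points, "new_column"] = [
    f"{j:02d}" for j in range(1, len(list_of_df) + 1)
]
# Forward-fill so every row inherits the label of its frame's first row.
big_df["new_column"] = big_df["new_column"].ffill()

print(big_df["new_column"].tolist())
# ['01', '01', '02', '03', '03', '03']
```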


 