
Splitting up a master DataFrame into multiple DataFrames in a loop - Pandas

I have a master DataFrame:

NHE_17.head()

Out[42]: 
                         Var name     1960     1961     1962     1963  
0  Total National Health Expenditures  27214.0  29138.0  31842.0  34595.0   
1                       Out of pocket  12949.0  13357.0  14255.0  15311.0   
2                    Health Insurance   7497.0   8236.0   8999.0   9892.0   
3            Private Health Insurance   5812.0   6468.0   7178.0   7952.0   
4                            Medicare      0.0      0.0      0.0      0.0   

I am trying to split this DataFrame into multiple DataFrames based on indices passed in a loop:

def slice(idx):
    df_temp= NHE_17.iloc[idx[0]:idx[1]]
    return df_temp

df_list_idx = [['df_1',[0,37]],['df_2',[280,310]]]

for df_name, idx in df_list_idx:
    df = slice(idx)
    df_name= df

Ideally, I want df_1 to hold NHE_17.iloc[0:37], df_2 to hold NHE_17.iloc[280:310], and so on.

But that is not happening. df_name ends up holding the DataFrame sliced with the last indices passed ([280:310] in this case), and the last line of the for loop does not create variables named df_1 or df_2 as I expected:

df_name= df 

This is not a pandas or DataFrame problem but a basic programming one. You are trying to assign a value to a string, as if the string were a variable name. That is:

'a' = 2 # example
'df_1' = df # what you are trying to do in essence. 

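Neither Python nor any other language that I know of will let you do this, because a string (e.g. 'df_1') is not a valid variable name. In your loop, df_name = df simply rebinds the name df_name to the latest slice; it never creates a variable called df_1 or df_2.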
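To see the rebinding in isolation, here is a minimal sketch that uses plain lists in place of DataFrames. The name pairs and the stand-in values are only for illustration:

```python
# Illustrative stand-ins: plain index lists instead of DataFrame slices.
pairs = [("df_1", [0, 37]), ("df_2", [280, 310])]

for df_name, idx in pairs:
    # Here df_name holds the string "df_1" or "df_2".
    df_name = idx  # rebinds the name df_name; no variable df_1 or df_2 is created

# Check whether a variable called df_1 exists at module level.
created = "df_1" in globals()

print(df_name)  # value from the last iteration only
print(created)
```

After the loop, df_name refers to whatever was assigned in the final iteration, and the string that used to be in it has simply been discarded.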

Instead, I think the best way to do this is by adding the slices to a list.

df_list_idx = [[0,37],[280,310]]
data = []
for idx in df_list_idx:
    df = slice(idx)
    data.append(df)

Now you can index into the data variable. If you have many more indices, you probably would not want to create that many separate variables anyway.

df_1 = data[0]
df_2 = data[1]

We can create a dictionary of DataFrames, dfs, with keys taken from the list ('df_1', 'df_2', ...).

Then it's just a loop that populates this dictionary:

import pandas as pd

df = pd.DataFrame({'a': range(500)})

df_list_idx = [['df_1',[0,3]],['df_2',[280,284]]]
dfs = {}
for x in df_list_idx:
    k = x[0] # e.g. 'df_1'
    v = x[1] # e.g. [0,3]
    dfs[k] = df.iloc[v[0]:v[1]]

print(dfs['df_1'])
print(dfs['df_2'])

Output:

   a
0  0
1  1
2  2
       a
280  280
281  281
282  282
283  283
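The same dictionary can also be built in a single expression. The following is a sketch, assuming every entry has the shape [name, [start, stop]] as in the list above; the names name, start, stop and part are chosen here for illustration:

```python
import pandas as pd

df = pd.DataFrame({"a": range(500)})
df_list_idx = [["df_1", [0, 3]], ["df_2", [280, 284]]]

# Unpack each [name, [start, stop]] entry directly in the comprehension.
dfs = {name: df.iloc[start:stop] for name, (start, stop) in df_list_idx}

# Iterate over the named slices instead of creating one variable per slice.
for name, part in dfs.items():
    print(name, part.shape)
```

Keeping the slices in a dictionary means the names stay data rather than variables, so adding another slice only requires another entry in df_list_idx.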


 