How is it possible to overwrite a dataframe in a list with a for loop in Python?

Hello, I want to modify the dataframes in a list with a for loop. My function works, and the dataframes are modified inside the function, but when I then look at the dataframes under their original names, I still see the old ones rather than the new ones. I conclude that I can't overwrite my old dataframes.

All the dataframes in the list have the following form:


index  customer_region  number_order  distance_between_seller_customer  date_last_order  mean_days_between_orders  mean_item_per_order  mean_volume_item_ordered
69     Southeast        1.0           1.850759                          736411.0         0.0
74     Southeast        1.0           0.250155                          736404.0         0.0
93     Northeast        1.0           20.223906                         736416.0         0.0
101    Southeast        1.0           0.989547                          736366.0         0.0

Preparation

all_dfs = [sample_1strim, sample_2strim, sample_3strim]

A function that standardizes the numeric columns, one-hot encodes the nominal columns, and returns a new dataframe that merges the two results:


import pandas as pd
from sklearn.preprocessing import StandardScaler

def get_df_name(df):
    # Return the name of the first global variable bound to this dataframe
    name = [x for x in globals() if globals()[x] is df][0]
    return name

def standartization_encodage(frame):
    X = frame.copy()
    categorical_columns = X.select_dtypes(['category', 'object']).columns
    numerical_columns = X.select_dtypes(['int64', 'float64']).columns
    # Standardize the numeric columns (zero mean, unit variance)
    X[numerical_columns] = StandardScaler().fit_transform(X[numerical_columns])
    # One-hot encode the nominal columns
    one_hot_encoded = pd.get_dummies(X[categorical_columns])
    # Join the two parts on the index they share
    X = pd.merge(X[numerical_columns], one_hot_encoded, left_index=True, right_index=True)
    X = X.set_index(frame.index)
    return X

# Replace each list entry with its transformed copy
for i, df in enumerate(all_dfs):
    all_dfs[i] = standartization_encodage(df)

Thanks in advance.

I would just use a function and apply it to all elements in the list of dataframes.

def transform_df(df):
    transformed_df = df.copy()
    # your code from the loop goes here, operating on transformed_df
    return transformed_df

all_dfs = [transform_df(df) for df in all_dfs]
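
To see what the comprehension does and does not change, here is a minimal sketch. The two small frames and the body of transform_df are placeholders chosen for illustration (assumptions), not the asker's data or function. The list ends up holding the new frames, while the original names keep pointing at the old ones until they are reassigned, for example by unpacking the list.

```python
import pandas as pd

# Hypothetical stand-ins for the asker's sample_1strim and sample_2strim
sample_1strim = pd.DataFrame({"number_order": [1.0, 2.0, 3.0]})
sample_2strim = pd.DataFrame({"number_order": [4.0, 5.0, 6.0]})
all_dfs = [sample_1strim, sample_2strim]

def transform_df(df):
    # Placeholder transformation: centre the column on its mean
    transformed_df = df.copy()
    transformed_df["number_order"] = df["number_order"] - df["number_order"].mean()
    return transformed_df

all_dfs = [transform_df(df) for df in all_dfs]

# The list now holds the new frames, but the old name is untouched
list_updated = all_dfs[0]["number_order"].tolist()
name_unchanged = sample_1strim["number_order"].tolist()

# Rebind the names explicitly so they refer to the new frames too
sample_1strim, sample_2strim = all_dfs
name_rebound = sample_1strim["number_order"].tolist()
```

Unpacking only works when the number of names matches the length of the list. If the names matter, a dict keyed by name may be easier to keep in sync than separate variables.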

I may be wrong, but I think you have misunderstood how variables and lists work in Python. You have

df1 = (some dataframe)
all_dfs = [df1]

This means that you have one dataframe, and that both the variable df1 and the first list entry hold references to it:

df1 -------> (ORIGINAL DATAFRAME)
               ^
               |
all_dfs ---> [ 0 ]

Then, you effectively do

df1_1 = all_dfs[0].copy()
df1_1 = (stuff that modifies df1_1)
all_dfs[0] = df1_1

This means that you have created a new dataframe, and changed the first entry of the list to refer to this new dataframe. However, the variable df1 still refers to the original dataframe!

df1 -------> (ORIGINAL DATAFRAME)

all_dfs ---> [ 0 ]
               |
               v
              (NEW DATAFRAME)
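
The two pictures can be checked with Python's `is` operator, which tests whether two references point at the same object. This is a small sketch with a made-up one-column frame (an assumption for illustration), following the same steps as above.

```python
import pandas as pd

# Made-up frame standing in for the original dataframe
df1 = pd.DataFrame({"a": [1, 2]})
all_dfs = [df1]

# First picture: the name and the list entry refer to the same object
same_before = all_dfs[0] is df1

# Copy, modify the copy, and store the copy in the list
df1_1 = all_dfs[0].copy()
df1_1["a"] = df1_1["a"] * 10
all_dfs[0] = df1_1

# Second picture: the list entry is a different object, df1 is unchanged
same_after = all_dfs[0] is df1
original_values = df1["a"].tolist()
new_values = all_dfs[0]["a"].tolist()
```

Here same_before is True and same_after is False: assigning to all_dfs[0] changes what the list slot refers to, and has no effect on the name df1.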

You cannot change a variable indirectly through a list as (I think) you are trying to do. You might be able to get the same effect if, instead of creating a new df, you modify the existing one in place (unfortunately, I am not familiar enough with pandas to know offhand whether that is possible).
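
For what it is worth, pandas does allow many in-place changes: assigning to a column of an existing dataframe mutates that object, so every name and list entry that refers to it sees the result. The sketch below assumes a hypothetical helper, scale_in_place, and a made-up frame. It standardizes the numeric columns by hand with the population standard deviation, rather than reproducing the asker's StandardScaler and get_dummies pipeline.

```python
import pandas as pd

def scale_in_place(df):
    """Hypothetical helper: standardize the numeric columns of df in place."""
    numeric_columns = df.select_dtypes("number").columns
    for column in numeric_columns:
        # Column assignment mutates the existing dataframe object
        df[column] = (df[column] - df[column].mean()) / df[column].std(ddof=0)

# Made-up frame with one numeric and one nominal column
df1 = pd.DataFrame({
    "number_order": [1.0, 2.0, 3.0],
    "customer_region": ["Southeast", "Northeast", "Southeast"],
})
all_dfs = [df1]

for df in all_dfs:
    scale_in_place(df)  # returns nothing; the frame itself is changed

still_same_object = all_dfs[0] is df1
scaled_mean = df1["number_order"].mean()
regions = df1["customer_region"].tolist()
```

This keeps df1 and all_dfs[0] pointing at the same, now modified, object. For a transformation that replaces the whole set of columns, as one-hot encoding does, building a new frame and rebinding the names is usually simpler than mutating in place.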
