How is it possible to overwrite dataframe in a list with a for loop in python?
Hello, I want to modify the dataframes in a list with a for loop. My function works well and the dataframes are modified inside the function, but when I then look at the dataframes under their original names, I still see the old ones rather than the new ones. I conclude that I can't overwrite my old dataframes.
All my dataframes in the list are of the form:
index  customer_region  number_order  distance_between_seller_customer  date_last_order  mean_days_between_orders  mean_item_per_order  mean_volume_item_ordered
69     Southeast        1.0           1.850759                          736411.0         0.0
74     Southeast        1.0           0.250155                          736404.0         0.0
93     Northeast        1.0           20.223906                         736416.0         0.0
101    Southeast        1.0           0.989547                          736366.0         0.0
Preparation
all_dfs = [sample_1strim, sample_2strim, sample_3strim]
Function to standardize the numerical columns, one-hot encode the nominal columns, and merge both results into a new dataframe that should replace the old one:
import pandas as pd
from sklearn.preprocessing import StandardScaler

def get_df_name(df):
    # Return the name of the first global variable that refers to df.
    name = [x for x in globals() if globals()[x] is df][0]
    return name

def standartization_encodage(frame):
    X = frame.copy()
    categorical_columns = X.select_dtypes(['category', 'object']).columns
    numerical_columns = X.select_dtypes(['int64', 'float64']).columns
    # Standardize the numerical columns (zero mean, unit variance).
    X[numerical_columns] = StandardScaler().fit_transform(X[numerical_columns])
    # One-hot encode the categorical columns.
    one_hot_encoded = pd.get_dummies(X[categorical_columns])
    # Join the scaled and encoded columns on their shared index.
    X = pd.merge(X[numerical_columns], one_hot_encoded, left_index=True, right_index=True)
    X = X.set_index(frame.index)
    return X

for i, df in enumerate(all_dfs):
    all_dfs[i] = standartization_encodage(df)
Thanks in advance.
I would just use a function and apply it to all elements in the list of dataframes.
def transform_df(df):
    # your code in the loop
    return transformed_df

all_dfs = [transform_df(df) for df in all_dfs]
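If the original variable names should also refer to the transformed dataframes, one option is to unpack the new list back into those names. The sketch below is only an illustration: the body of transform_df (doubling a column) and the two tiny sample frames are made up for the example and stand in for the real transformation and data.

```python
import pandas as pd

def transform_df(df):
    # Hypothetical stand-in for the real transformation: returns a NEW frame.
    out = df.copy()
    out["number_order"] = out["number_order"] * 2
    return out

# Made-up sample data, reusing the variable names from the question.
sample_1strim = pd.DataFrame({"number_order": [1.0, 2.0]})
sample_2strim = pd.DataFrame({"number_order": [3.0, 4.0]})
all_dfs = [sample_1strim, sample_2strim]

# Build a new list of transformed frames.
all_dfs = [transform_df(df) for df in all_dfs]

# Rebind the original names to the transformed frames.
sample_1strim, sample_2strim = all_dfs
```

The unpacking has to list the names in the same order as the list, and the number of names has to match the length of the list.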
I may be wrong, but I think you have misunderstood how variables and lists work in Python. You have
df1 = (some dataframe)
all_dfs = [df1]
This means that you have a dataframe, and that the variable df1, as well as the first list entry, hold references to that dataframe:
df1 -------> (ORIGINAL DATAFRAME)
               ^
               |
all_dfs ---> [ 0 ]
Then, you effectively do
df1_1 = all_dfs[0].copy()
df1_1 = (stuff that modifies df1_1)
all_dfs[0] = df1_1
This means that you have created a new dataframe, and changed the first entry of the list to refer to this new dataframe. However, the variable df1 still refers to the original dataframe!
df1 -------> (ORIGINAL DATAFRAME)
all_dfs ---> [ 0 ]
               |
               v
        (NEW DATAFRAME)
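The two pictures can be checked with Python's `is` operator, which tests whether two names refer to the same object. This is a minimal sketch with a made-up one-column dataframe; the column name x and the multiplication by 10 are placeholders for the real data and the real modification.

```python
import pandas as pd

df1 = pd.DataFrame({"x": [1, 2, 3]})
all_dfs = [df1]

# Before: the variable and the list entry refer to the same object.
same_before = all_dfs[0] is df1

# Work on a copy, then store the copy in the list.
df1_1 = all_dfs[0].copy()
df1_1["x"] = df1_1["x"] * 10
all_dfs[0] = df1_1

# After: the list entry refers to the new object, df1 still to the old one.
same_after = all_dfs[0] is df1

print(same_before, same_after)
print(df1["x"].tolist(), all_dfs[0]["x"].tolist())
```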
You cannot change a variable indirectly through a list as (I think) you are trying to do. You might be able to get the same effect if, instead of creating a new df, you can modify the existing one in place (unfortunately, I am not familiar enough with pandas to know off hand whether that is possible).
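For what it is worth, assigning to a column with df[col] = ... changes the existing dataframe object rather than creating a new one, so an in-place variant seems possible at least for the scaling step. The sketch below assumes that only existing numerical columns are rescaled; scale_in_place and the distance column are hypothetical names, and the standardization is done by hand with pandas (population standard deviation) instead of with StandardScaler.

```python
import pandas as pd

def scale_in_place(df, columns):
    # Hypothetical helper: mutates df itself and returns nothing.
    for col in columns:
        df[col] = (df[col] - df[col].mean()) / df[col].std(ddof=0)

# Made-up sample data.
df1 = pd.DataFrame({"distance": [1.0, 2.0, 3.0]})
all_dfs = [df1]

for df in all_dfs:
    scale_in_place(df, ["distance"])

# The list entry and the variable are still the same, now modified, object.
print(all_dfs[0] is df1)
print(df1)
```

The one-hot encoding step changes the set of columns, so doing that part in place would take more care; returning new dataframes and rebinding the names is probably the simpler route there.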