How to loop a function through columns in a data frame and add to new columns
How to loop through series to produce a data frame and add columns to it?
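I have a bunch of series that I want to stack into a data frame, and then add further series to that data frame using the same process.
I have already done this step by step in a Jupyter notebook, but when I try to wrap the same steps in a function and call it from a for loop, the program fails with an error.
Code: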
import pandas as pd
data = {'sum':[140.0, 45.0, 17907.0], 'mean':[35.00, 11.25, 4476.75],'count':[4, 4, 4]}
df = pd.DataFrame(data, index=['age', 'offspring', 'total_pop'])
data2 = {'sum':[14.0, 46.0, 14607.0], 'mean':[345.00, 121.25, 5476.75], 'count':[2, 2, 2]}
df2 = pd.DataFrame(data2, index=['age', 'offspring', 'total_pop'])
data3 = {'sum':[528.0, 15.0, 1407.0], 'mean':[700.00, 552.25, 4156.75], 'count':[3, 3, 3]}
df3 = pd.DataFrame(data3, index=['age', 'offspring', 'total_pop'])
def dosomething(df):
    stacked = df.stack()
    df = pd.Series(stacked)
    df.to_frame()
    dfd = pd.DataFrame(df)
    df = df.join(dfd)
    print(dfd)
total_df = [(df1), (df2), (df3,)]
for n in range(0, len(total_df)):
    total_df[n] = dosomething(total_df[n])
Expected:
                        1         2        3
age       sum      140.00     14.00   528.00
          mean      35.00    345.00   700.00
          count      4.00      2.00     3.00
offspring sum       45.00     46.00    15.00
          mean      11.25    121.25   552.25
          count      4.00      2.00     3.00
total_pop sum    17907.00  14607.00  1407.00
          mean    4476.75   5476.75  4156.75
          count      4.00      2.00     3.00
Actual error:
ValueError: columns overlap but no suffix specified: RangeIndex(start=0, stop=1, step=1)
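For context, this error is what `DataFrame.join` raises when both frames carry the same column label. A stacked Series turned into a frame without a name gets the default column label 0, so joining two of them clashes. The following is a minimal sketch of that situation and one way around it, not the asker's exact code; the names `first` and `second` are made up for illustration. Note also that the function in the question prints its result instead of returning it, so the loop would store None in the list even without the error.

```python
import pandas as pd

# Two small stacked Series (illustrative data taken from the 'age' row above).
first = pd.DataFrame({'sum': [140.0], 'mean': [35.0]}, index=['age']).stack()
second = pd.DataFrame({'sum': [14.0], 'mean': [345.0]}, index=['age']).stack()

# Without a name, to_frame() labels the single column 0 in both frames,
# so join() finds overlapping column labels and raises ValueError.
try:
    first.to_frame().join(second.to_frame())
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Giving each stacked Series its own column label avoids the clash.
joined = first.to_frame(name=1).join(second.to_frame(name=2))

print(error_message)
print(joined)
```

Passing `lsuffix`/`rsuffix` to `join` would also silence the error, but distinct column labels are closer to the expected output.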
Try concat:
dfs = [df,df2, df3]
pd.concat([df.stack() for df in dfs], axis=1)
Output:
                        0         1        2
age       sum      140.00     14.00   528.00
          mean      35.00    345.00   700.00
          count      4.00      2.00     3.00
offspring sum       45.00     46.00    15.00
          mean      11.25    121.25   552.25
          count      4.00      2.00     3.00
total_pop sum    17907.00  14607.00  1407.00
          mean    4476.75   5476.75  4156.75
          count      4.00      2.00     3.00
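The output above labels the columns 0, 1, 2, while the expected result uses 1, 2, 3. One possible way to close that gap, sketched here under the assumption that the labels should simply count from 1, is to pass `keys` to `pd.concat`; when Series are concatenated along `axis=1`, the keys become the column labels. The variable `labels` is introduced only for this example.

```python
import pandas as pd

idx = ['age', 'offspring', 'total_pop']
df = pd.DataFrame({'sum': [140.0, 45.0, 17907.0], 'mean': [35.00, 11.25, 4476.75], 'count': [4, 4, 4]}, index=idx)
df2 = pd.DataFrame({'sum': [14.0, 46.0, 14607.0], 'mean': [345.00, 121.25, 5476.75], 'count': [2, 2, 2]}, index=idx)
df3 = pd.DataFrame({'sum': [528.0, 15.0, 1407.0], 'mean': [700.00, 552.25, 4156.75], 'count': [3, 3, 3]}, index=idx)

dfs = [df, df2, df3]

# Column labels 1..n instead of the default 0..n-1 (an assumption about the desired labels).
labels = list(range(1, len(dfs) + 1))

# Stack each frame into a Series, then place the Series side by side.
result = pd.concat([d.stack() for d in dfs], axis=1, keys=labels)
print(result)
```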
If you don't need a loop, this will work:
# stack dataframes
df = pd.DataFrame(df.stack(), columns=[1])
df2 = pd.DataFrame(df2.stack(), columns=[2])
df3 = pd.DataFrame(df3.stack(), columns=[3])
#join on index
final_df = df.join(df2).join(df3)
If you do need a loop, you can do it like this:
# stack first df
final_df = pd.DataFrame(df.stack(), columns=[1])
# loop through other dfs
for n, next_df in {2: df2, 3: df3}.items():
    next_df = pd.DataFrame(next_df.stack(), columns=[n])
    final_df = final_df.join(next_df)
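Since the question asks for a function driven by a for loop, the loop above could be wrapped along the following lines. This is only a sketch: `stack_and_join` is a hypothetical helper name, and it assumes all input frames share the same index and columns, as they do in the question.

```python
import pandas as pd

idx = ['age', 'offspring', 'total_pop']
df = pd.DataFrame({'sum': [140.0, 45.0, 17907.0], 'mean': [35.00, 11.25, 4476.75], 'count': [4, 4, 4]}, index=idx)
df2 = pd.DataFrame({'sum': [14.0, 46.0, 14607.0], 'mean': [345.00, 121.25, 5476.75], 'count': [2, 2, 2]}, index=idx)
df3 = pd.DataFrame({'sum': [528.0, 15.0, 1407.0], 'mean': [700.00, 552.25, 4156.75], 'count': [3, 3, 3]}, index=idx)


def stack_and_join(frames):
    """Stack each frame and join the results as columns labelled 1, 2, ..."""
    final_df = None
    for n, frame in enumerate(frames, start=1):
        # Name the stacked column n so the joined frames never share a label.
        stacked = frame.stack().to_frame(name=n)
        final_df = stacked if final_df is None else final_df.join(stacked)
    # Return the result; printing alone would leave the caller with None.
    return final_df


result = stack_and_join([df, df2, df3])
print(result)
```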
You can use pd.concat with the keys option and then perform a single stack on the final df, instead of one stack per df:
df_list = [df, df2, df3]
pd.concat(df_list, keys=range(len(df_list)), axis=1).stack()
Out[127]:
                        0         1        2
age       sum      140.00     14.00   528.00
          mean      35.00    345.00   700.00
          count      4.00      2.00     3.00
offspring sum       45.00     46.00    15.00
          mean      11.25    121.25   552.25
          count      4.00      2.00     3.00
total_pop sum    17907.00  14607.00  1407.00
          mean    4476.75   5476.75  4156.75
          count      4.00      2.00     3.00
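To see why a single stack is enough here, it may help to look at the intermediate frame. With `keys` and `axis=1`, `pd.concat` builds two-level columns of the form (key, statistic); `stack()` then moves the inner level into the row index and leaves the keys as columns. The sketch below splits the one-liner into two steps; `wide` and `stacked` are names chosen for this example.

```python
import pandas as pd

idx = ['age', 'offspring', 'total_pop']
df = pd.DataFrame({'sum': [140.0, 45.0, 17907.0], 'mean': [35.00, 11.25, 4476.75], 'count': [4, 4, 4]}, index=idx)
df2 = pd.DataFrame({'sum': [14.0, 46.0, 14607.0], 'mean': [345.00, 121.25, 5476.75], 'count': [2, 2, 2]}, index=idx)
df3 = pd.DataFrame({'sum': [528.0, 15.0, 1407.0], 'mean': [700.00, 552.25, 4156.75], 'count': [3, 3, 3]}, index=idx)

df_list = [df, df2, df3]

# Step 1: side-by-side frames with two-level columns, e.g. (0, 'sum'), (0, 'mean'), ...
wide = pd.concat(df_list, keys=list(range(len(df_list))), axis=1)
print(wide.columns.nlevels)

# Step 2: stack() moves the innermost column level (the statistic) into the row index.
stacked = wide.stack()
print(stacked)
```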