How to run a linear model for each unique ID and collect the results in a single dataframe, one row per ID?

Question

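How can I collect the regression intercept and coefficients for each unique ID in a dataframe into a single dataframe, where each row holds the UID, the intercept and the coefficients?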

Here is a snippet of my raw data. Future data may have more UIDs and more fields (independent variables).

UID	A1	A2	A3	A4	Rating
1	0.377489423	0.950311846	0.892135293	0.077054085	4
1	0.595570737	0.824334482	0.388634543	0.947936483	4
1	0.585703124	0.825486315	0.569809886	0.321117521	3
1	0.386968371	0.594556911	0.260187376	0.394238102	4
1	0.532731866	0.219741858	0.865710517	0.173044631	3
1	0.16565561	0.125096015	0.881841651	0.494690133	4
2	0.42418965	0.814894214	0.989426645	0.871014023	1
2	0.742604257	0.571780036	0.247811255	0.468820653	2
2	0.401989919	0.375134173	0.539599593	0.443260146	3
2	0.167910365	0.940073739	0.490081723	0.803074574	5
2	0.614160221	0.045817359	0.077645469	0.367456074	4
3	0.866397055	0.2932472	0.968410252	0.348542304	5
3	0.141680391	0.998446121	0.201506356	0.689863785	1
3	0.407182414	0.721650663	0.174277013	0.922810374	1
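To reproduce the examples in this thread, the snippet above can be read straight into pandas. This is a minimal sketch that assumes the header is `UID A1 A2 A3 A4 Rating` (the column names the code in this thread uses) and pastes the rows as whitespace-separated text; `RAW` is only an illustrative name.

```python
import io

import pandas as pd

# Sample data from the question (header assumed to be UID / A1..A4 / Rating)
RAW = """
UID A1 A2 A3 A4 Rating
1 0.377489423 0.950311846 0.892135293 0.077054085 4
1 0.595570737 0.824334482 0.388634543 0.947936483 4
1 0.585703124 0.825486315 0.569809886 0.321117521 3
1 0.386968371 0.594556911 0.260187376 0.394238102 4
1 0.532731866 0.219741858 0.865710517 0.173044631 3
1 0.16565561 0.125096015 0.881841651 0.494690133 4
2 0.42418965 0.814894214 0.989426645 0.871014023 1
2 0.742604257 0.571780036 0.247811255 0.468820653 2
2 0.401989919 0.375134173 0.539599593 0.443260146 3
2 0.167910365 0.940073739 0.490081723 0.803074574 5
2 0.614160221 0.045817359 0.077645469 0.367456074 4
3 0.866397055 0.2932472 0.968410252 0.348542304 5
3 0.141680391 0.998446121 0.201506356 0.689863785 1
3 0.407182414 0.721650663 0.174277013 0.922810374 1
"""

# sep=r"\s+" splits on any run of spaces or tabs, so a copy-pasted
# tab-separated snippet parses the same way; blank lines are skipped.
df = pd.read_csv(io.StringIO(RAW), sep=r"\s+")

# Rows per UID
print(df.groupby("UID").size())
```

The group sizes are worth noting: UID 3 has only three rows, fewer than the five parameters (four coefficients plus an intercept) each model estimates.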

Here is the code I wrote to loop over each unique UID, fit a linear model, and add each UID's intercept and coefficients to a list.

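from sklearn.linear_model import LinearRegression

# df holds the data shown above (columns UID, A1..A4, Rating)
ids = df.UID.unique()

op = []
for i in ids:
    df_i = df[df.UID == i]                    # rows for this UID only
    X = df_i.drop(['UID', 'Rating'], axis=1)  # predictors A1..An
    y = df_i['Rating']
    reg = LinearRegression().fit(X, y)
    reg.score(X, y)                           # R^2: computed but not stored
    const = reg.intercept_
    coef = reg.coef_
    op.append(const)                          # the intercept (a float)...
    op.append(coef)                           # ...then the coefficient array, as a separate item
op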
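The loop above appends the intercept and the coefficient array as two separate list items, so `op` ends up as `[intercept_1, coef_1, intercept_2, coef_2, ...]` with no UID attached. A minimal change, sketched here under the assumption that every column other than `UID` and `Rating` is a numeric predictor, is to append one dict per UID keyed by column name and build the DataFrame once after the loop (`rows` and `row` are illustrative names, not from the thread):

```python
import io

import pandas as pd
from sklearn.linear_model import LinearRegression

RAW = """
UID A1 A2 A3 A4 Rating
1 0.377489423 0.950311846 0.892135293 0.077054085 4
1 0.595570737 0.824334482 0.388634543 0.947936483 4
1 0.585703124 0.825486315 0.569809886 0.321117521 3
1 0.386968371 0.594556911 0.260187376 0.394238102 4
1 0.532731866 0.219741858 0.865710517 0.173044631 3
1 0.16565561 0.125096015 0.881841651 0.494690133 4
2 0.42418965 0.814894214 0.989426645 0.871014023 1
2 0.742604257 0.571780036 0.247811255 0.468820653 2
2 0.401989919 0.375134173 0.539599593 0.443260146 3
2 0.167910365 0.940073739 0.490081723 0.803074574 5
2 0.614160221 0.045817359 0.077645469 0.367456074 4
3 0.866397055 0.2932472 0.968410252 0.348542304 5
3 0.141680391 0.998446121 0.201506356 0.689863785 1
3 0.407182414 0.721650663 0.174277013 0.922810374 1
"""
df = pd.read_csv(io.StringIO(RAW), sep=r"\s+")

rows = []  # one dict per UID, instead of a flat list of mixed items
for i in df.UID.unique():
    df_i = df[df.UID == i]
    X = df_i.drop(columns=["UID", "Rating"])
    y = df_i["Rating"]
    reg = LinearRegression().fit(X, y)
    # UID and intercept first, then one entry per predictor, keyed by column name
    row = {"UID": i, "Intercept": reg.intercept_}
    row.update(zip(X.columns, reg.coef_))
    rows.append(row)

# Build the result once, after the loop: columns UID, Intercept, A1..An
op = pd.DataFrame(rows)
print(op)
```

Because the coefficient keys come from `X.columns`, extra predictors such as A5 or A6 in future data show up as extra columns without code changes.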

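I want my output to look like this (the data shown is dummy data), so that each row has the UID, the intercept and the linear-regression coefficients. This is where I'm stuck.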

UID	Intercept	A1	A2	A3	A4
1	3.2343	0.950311846	0.892135293	0.077054085	4.3454
2	2.123	0.824334482	0.388634543	0.947936483	2.3454
3	3.455	0.825486315	0.569809886	0.321117521	3.12343

Feel free to also comment on my initial approach to getting the regression models.
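On the initial approach: `df[df.UID == i]` rescans the whole frame once per UID, whereas `groupby` splits it once. One alternative, sketched here rather than taken from any answer in the thread, has each group return its parameters as a labelled `pandas.Series`, so pandas assembles the one-row-per-UID table itself. `fit_one`, `FEATURES` and the extra `n_obs` column are hypothetical additions; `n_obs` is included because UID 3 has only three rows but five parameters, so its fit is underdetermined and will typically reproduce those three ratings exactly, which makes its coefficients hard to interpret.

```python
import io

import pandas as pd
from sklearn.linear_model import LinearRegression

RAW = """
UID A1 A2 A3 A4 Rating
1 0.377489423 0.950311846 0.892135293 0.077054085 4
1 0.595570737 0.824334482 0.388634543 0.947936483 4
1 0.585703124 0.825486315 0.569809886 0.321117521 3
1 0.386968371 0.594556911 0.260187376 0.394238102 4
1 0.532731866 0.219741858 0.865710517 0.173044631 3
1 0.16565561 0.125096015 0.881841651 0.494690133 4
2 0.42418965 0.814894214 0.989426645 0.871014023 1
2 0.742604257 0.571780036 0.247811255 0.468820653 2
2 0.401989919 0.375134173 0.539599593 0.443260146 3
2 0.167910365 0.940073739 0.490081723 0.803074574 5
2 0.614160221 0.045817359 0.077645469 0.367456074 4
3 0.866397055 0.2932472 0.968410252 0.348542304 5
3 0.141680391 0.998446121 0.201506356 0.689863785 1
3 0.407182414 0.721650663 0.174277013 0.922810374 1
"""
df = pd.read_csv(io.StringIO(RAW), sep=r"\s+")

# Every column except the key and the target is treated as a predictor,
# so new A5, A6, ... columns are picked up automatically.
FEATURES = [c for c in df.columns if c not in ("UID", "Rating")]


def fit_one(group: pd.DataFrame) -> pd.Series:
    """Fit one OLS model and return its parameters as a labelled row."""
    reg = LinearRegression().fit(group[FEATURES], group["Rating"])
    params = pd.Series(reg.coef_, index=FEATURES)
    params["Intercept"] = reg.intercept_
    # Fewer rows than len(FEATURES) + 1 means the fit is underdetermined
    params["n_obs"] = len(group)
    return params


# Selecting the columns after groupby keeps UID out of `group`;
# the group keys become the index, which reset_index turns back into a column.
op = df.groupby("UID")[FEATURES + ["Rating"]].apply(fit_one).reset_index()
op["n_obs"] = op["n_obs"].astype(int)
op = op[["UID", "Intercept", *FEATURES, "n_obs"]]
print(op)
```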

Thanks

Answer 1

See the changes below:

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

ids = df.UID.unique()

rows = []  # one single-row DataFrame per UID (DataFrame.append was removed in pandas 2.0)

for i in ids:
    df_i = df[df.UID == i]
    X = df_i.drop(['UID', 'Rating'], axis=1)
    y = df_i['Rating']
    reg = LinearRegression().fit(X, y)
    const = reg.intercept_
    coef = reg.coef_
    uid = i
    # Pack [coef_1, ..., coef_n, intercept, uid] into a single 1 x (n + 2) row
    array = np.append(coef, const)
    array = np.append(array, uid)
    array = array.reshape(1, len(array))
    rows.append(pd.DataFrame(array))

# Stack the rows once; ignore_index replaces reset_index().drop('index', axis=1)
op = pd.concat(rows, ignore_index=True)

# Name the coefficient columns after the predictors instead of assuming A1..An
op.columns = list(X.columns) + ['Intercept', 'UID']
op['UID'] = op['UID'].astype(int)  # np.append turned the (integer) UID into a float
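Because this answer names the columns by position after stacking, it is worth checking that each row still matches its model. A quick sanity check, sketched here with hypothetical names (`models`, `table`, `features`), recomputes every UID's predictions as intercept + X · coefficients from the table and compares them with `LinearRegression.predict`; a shifted or mislabelled column would make the assertion fail.

```python
import io

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

RAW = """
UID A1 A2 A3 A4 Rating
1 0.377489423 0.950311846 0.892135293 0.077054085 4
1 0.595570737 0.824334482 0.388634543 0.947936483 4
1 0.585703124 0.825486315 0.569809886 0.321117521 3
1 0.386968371 0.594556911 0.260187376 0.394238102 4
1 0.532731866 0.219741858 0.865710517 0.173044631 3
1 0.16565561 0.125096015 0.881841651 0.494690133 4
2 0.42418965 0.814894214 0.989426645 0.871014023 1
2 0.742604257 0.571780036 0.247811255 0.468820653 2
2 0.401989919 0.375134173 0.539599593 0.443260146 3
2 0.167910365 0.940073739 0.490081723 0.803074574 5
2 0.614160221 0.045817359 0.077645469 0.367456074 4
3 0.866397055 0.2932472 0.968410252 0.348542304 5
3 0.141680391 0.998446121 0.201506356 0.689863785 1
3 0.407182414 0.721650663 0.174277013 0.922810374 1
"""
df = pd.read_csv(io.StringIO(RAW), sep=r"\s+")
features = [c for c in df.columns if c not in ("UID", "Rating")]

# Fit one model per UID and keep both the model and its parameter row
models, records = {}, []
for uid, df_i in df.groupby("UID"):
    reg = LinearRegression().fit(df_i[features], df_i["Rating"])
    models[uid] = reg
    records.append({"UID": uid, "Intercept": reg.intercept_, **dict(zip(features, reg.coef_))})

table = pd.DataFrame(records).set_index("UID")

# Recompute predictions from the stored row and compare with the model itself
for uid, reg in models.items():
    X_i = df.loc[df.UID == uid, features]
    coef = table.loc[uid, features].to_numpy(dtype=float)
    manual = table.loc[uid, "Intercept"] + X_i.to_numpy() @ coef
    assert np.allclose(manual, reg.predict(X_i)), f"row for UID {uid} does not match its model"

print("every row reproduces its model's predictions")
```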

Answer 2

Here is what I came up with. I just need to add the UID; I'm not sure how to add it to each row.

import pandas as pd
from sklearn.linear_model import LinearRegression

ids = df.UID.unique()

intercept = []
coefficients = []
UID = []
for i in ids:
    df_i = df[df.UID == i]
    X = df_i.drop(['UID', 'Rating'], axis=1)
    y = df_i['Rating']
    reg = LinearRegression().fit(X, y)
    const = reg.intercept_
    coef = reg.coef_
    UID.append(i)              # the UID of this group (a scalar, not an array)
    intercept.append(const)
    coefficients.append(coef)

intercept_new = pd.DataFrame(intercept)
coefficients_new = pd.DataFrame(coefficients)
UID_new = pd.DataFrame(UID)

# UID, Const, then the predictor names (assumes UID is the first column of df)
colNames = df.drop(['Rating'], axis=1).columns
colNames = colNames.insert(1, 'Const')

op = pd.concat([UID_new, intercept_new, coefficients_new], axis=1)
op.columns = colNames
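`pd.concat` of three unnamed frames followed by `op.columns = colNames` relies on `UID` being the first column of `df` and on the predictors staying in the same order. A variant of the same loop, sketched here, labels the coefficient columns from `X.columns` when the frame is built and then inserts `Intercept` and `UID` at the front, so no positional renaming is needed. The name `Intercept` (instead of `Const`) is an assumption chosen to follow the target layout in the question.

```python
import io

import pandas as pd
from sklearn.linear_model import LinearRegression

RAW = """
UID A1 A2 A3 A4 Rating
1 0.377489423 0.950311846 0.892135293 0.077054085 4
1 0.595570737 0.824334482 0.388634543 0.947936483 4
1 0.585703124 0.825486315 0.569809886 0.321117521 3
1 0.386968371 0.594556911 0.260187376 0.394238102 4
1 0.532731866 0.219741858 0.865710517 0.173044631 3
1 0.16565561 0.125096015 0.881841651 0.494690133 4
2 0.42418965 0.814894214 0.989426645 0.871014023 1
2 0.742604257 0.571780036 0.247811255 0.468820653 2
2 0.401989919 0.375134173 0.539599593 0.443260146 3
2 0.167910365 0.940073739 0.490081723 0.803074574 5
2 0.614160221 0.045817359 0.077645469 0.367456074 4
3 0.866397055 0.2932472 0.968410252 0.348542304 5
3 0.141680391 0.998446121 0.201506356 0.689863785 1
3 0.407182414 0.721650663 0.174277013 0.922810374 1
"""
df = pd.read_csv(io.StringIO(RAW), sep=r"\s+")

ids = df.UID.unique()
intercept, coefficients = [], []
for i in ids:
    df_i = df[df.UID == i]
    X = df_i.drop(columns=["UID", "Rating"])
    reg = LinearRegression().fit(X, df_i["Rating"])
    intercept.append(reg.intercept_)
    coefficients.append(reg.coef_)

# Label the coefficient columns from X itself instead of renaming positions afterwards
op = pd.DataFrame(coefficients, columns=X.columns)
op.insert(0, "Intercept", intercept)  # becomes the first column...
op.insert(0, "UID", ids)              # ...until UID is inserted in front of it
print(op)
```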


2 answers

Answer 1
0 2021-01-08 17:43:14

Answer 2
0 2021-01-08 19:13:44
