Copying (assembling) a column from smaller data frames into a bigger data frame with pandas
I have a data frame with measurements for several groups of participants, and I am doing some calculations for each group. I want to add a column to a big data frame (all participants) from secondary data frames (partial lists of participants).
When I merge several times (merging each new data frame into the existing one), it creates duplicate columns instead of a single column.
As the data frames are of different sizes, I cannot compare them directly.
I tried:
```python
# df1 - main bigger dataframe, df2 - smaller dataset containing one group of df1
for i in range(len(df1)):
    # checking indices to place the data at the correct participant:
    if df1.index[i] not in df2['index']:
        pass
    else:
        df1['rate'][i] = list(df2[rate][df2['index']==i])
```
It does not work properly, though. Could you help with the correct way to assemble the column?

Update: wherever the index of the main dataframe matches the "index" column of the calculated dataframe, copy the rate value from the calculated dataframe into the main dataframe.
Main dataframe (df1):

index | rate |
---|---|
1 | 0 |
2 | 0 |
3 | 0 |
4 | 0 |
5 | 0 |
6 | 0 |
Dataframe with calculated values (df2):

index | rate |
---|---|
1 | value |
4 | value |
6 | value |
Desired output:

index | rate |
---|---|
1 | value |
2 | 0 |
3 | 0 |
4 | value |
5 | 0 |
6 | value |
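For comparison, the loop from the question can be made to work by iterating over the smaller frame and writing each rate into df1 by label with `.loc`, which avoids the chained assignment in `df1['rate'][i] = ...`. This is a minimal sketch, assuming the participant IDs are df1's index and an ordinary column named `index` in df2, as in the tables above; the numbers 0.5, 0.7 and 0.9 are hypothetical stand-ins for "value".

```python
import pandas as pd

# Main dataframe: every participant (IDs 1-6 as the index), rate initialised to 0.
df1 = pd.DataFrame({"rate": [0.0] * 6}, index=[1, 2, 3, 4, 5, 6])

# Calculated values for a subset of participants; IDs live in an 'index' column.
# The rates are hypothetical placeholders for "value".
df2 = pd.DataFrame({"index": [1, 4, 6], "rate": [0.5, 0.7, 0.9]})

# Walk over df2's rows and copy each rate into the matching row of df1.
# .loc assigns by label, so the ID is matched rather than the row position.
for idx, rate in zip(df2["index"], df2["rate"]):
    if idx in df1.index:
        df1.loc[idx, "rate"] = rate

print(df1)
```

A loop is fine for small frames, but the vectorised join shown in the answer scales better.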
Try this: use `.join()` to merge the dataframes on their indices, and combine the two columns with `.combine_first()`:
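```python
# Join on the index; the overlapping 'rate' columns get suffixes
df = df1.join(df2, lsuffix="_df1", rsuffix="_df2")
# Take df2's rate where present, otherwise fall back to df1's rate
df["rate"] = df["rate_df2"].combine_first(df["rate_df1"])
```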
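A self-contained version of the answer's two lines, using the example tables from the question, might look like the sketch below. It assumes df2 is already indexed by participant ID; the rates 0.5, 0.7 and 0.9 are hypothetical placeholders for "value", and dropping the suffixed helper columns at the end is an optional extra step.

```python
import pandas as pd

# df1: all participants, indexed by participant ID, rate initialised to 0.
df1 = pd.DataFrame({"rate": [0, 0, 0, 0, 0, 0]}, index=[1, 2, 3, 4, 5, 6])

# df2: calculated rates for a subset, indexed by the same IDs.
# The numbers are hypothetical placeholders for "value".
df2 = pd.DataFrame({"rate": [0.5, 0.7, 0.9]}, index=[1, 4, 6])

# Left join on the index (join's default); rows 2, 3 and 5 get NaN in rate_df2.
df = df1.join(df2, lsuffix="_df1", rsuffix="_df2")

# Prefer the calculated rate; fill the NaN gaps from df1's original rate.
df["rate"] = df["rate_df2"].combine_first(df["rate_df1"])

# Optional: keep only the combined column.
df = df.drop(columns=["rate_df1", "rate_df2"])
print(df)
```

Because the left join keeps every row of df1, participants without a calculated value simply keep their original 0.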
EDIT: This assumes both dataframes use a matching index. If that is not the case for `df2` (for example, if the participant IDs are stored in its `index` column, as in the question), run this first:
```python
df2 = df2.set_index('index')
```
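To see why the `set_index` step matters, the sketch below joins the question's df2 both with and without it. It assumes the IDs are stored in a column literally named `index`, so df2's own row labels are 0, 1, 2; the rates are again hypothetical placeholders.

```python
import pandas as pd

df1 = pd.DataFrame({"rate": [0, 0, 0, 0, 0, 0]}, index=[1, 2, 3, 4, 5, 6])
# IDs stored in an ordinary column called 'index'; the row labels are 0, 1, 2.
df2 = pd.DataFrame({"index": [1, 4, 6], "rate": [0.5, 0.7, 0.9]})

# Without set_index, join() matches df1's IDs against df2's row labels 0, 1, 2,
# so participant 1 wrongly receives the rate of df2's second row (participant 4).
wrong = df1.join(df2, lsuffix="_df1", rsuffix="_df2")

# With the IDs moved into the index, each rate lands on the right participant.
right = df1.join(df2.set_index("index"), lsuffix="_df1", rsuffix="_df2")
right["rate"] = right["rate_df2"].combine_first(right["rate_df1"])

print(wrong)
print(right[["rate"]])
```

The misaligned join raises no error, which is why an index mismatch like this is easy to miss.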