Filling a column with values from another dataframe
I want to fill a column of df2 (~100,000 rows) with the values from the same column of df (~1,000,000 rows). df often contains the same row several times but with wrong data, so I always want to take the first value of my column 'C'.
import pandas as pd

df = pd.DataFrame([[100, 1, 2], [100, 3, 4], [100, 5, 6], [101, 7, 8], [101, 9, 10]],
                  columns=['A', 'B', 'C'])
df2 = pd.DataFrame([[100, 0], [101, 0]], columns=['A', 'C'])
for i in range(len(df2.index)):
    # My question (pseudocode):
    # df2.loc[i, 'C'] = first value of column 'C' in df where column 'A' is the same in both dataframes.
    # E.g. the first value for 100 would be 2, and the first value for 101 would be 8.
    pass
In the end, my output should be a table like this:
df2=pd.DataFrame([[100,2],[101,8]], columns=['A', 'C'])
You can try this:
df2['C'] = df.groupby('A')['C'].first().values
Which will give you:
     A  C
0  100  2
1  101  8
first() returns the first value of every group (by default, the first non-null value).
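To make the intermediate step visible, here is a small sketch using the question's sample df. The names first_c, first_index and first_values are only illustrative and do not appear in the original answer.

```python
import pandas as pd

df = pd.DataFrame([[100, 1, 2], [100, 3, 4], [100, 5, 6], [101, 7, 8], [101, 9, 10]],
                  columns=['A', 'B', 'C'])

# One value of 'C' per distinct 'A'; the result is a Series indexed by 'A'.
first_c = df.groupby('A')['C'].first()

first_index = first_c.index.tolist()    # group keys taken from column 'A'
first_values = first_c.values.tolist()  # plain values with the index dropped

print(first_index)   # [100, 101]
print(first_values)  # [2, 8]
```

Note that the Series carries the values of 'A' as its index, not the positions 0 and 1, while .values is just the bare array.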
Next, you want to assign these values to the df2 column. Unfortunately, you cannot assign the result directly like this:
df2['C'] = df.groupby('A')['C'].first()
because the above line will result in:
     A   C
0  100 NaN
1  101 NaN
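This happens because pandas aligns on the index when assigning a Series: the grouped result is indexed by the values of 'A' (100, 101), while df2 has the default index (0, 1), so no labels match. (You can read more about the cause here: Adding new column to pandas DataFrame results in NaN)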
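One caveat about the positional .values assignment: it only lines up when df2 has exactly one row per group, in the same sorted order as the groupby result. With dataframes of very different sizes that may not hold. A possible alternative, sketched here and not part of the original answer, is to look each key up with Series.map. The df2 below is a hypothetical variation of the question's data (keys out of order, one repeated, one missing from df), chosen to show the difference.

```python
import pandas as pd

df = pd.DataFrame([[100, 1, 2], [100, 3, 4], [100, 5, 6], [101, 7, 8], [101, 9, 10]],
                  columns=['A', 'B', 'C'])

# Hypothetical df2: keys out of order, 100 repeated, 102 absent from df.
df2 = pd.DataFrame([[101, 0], [100, 0], [102, 0], [100, 0]], columns=['A', 'C'])

# Series indexed by 'A', holding the first 'C' of each group.
first_c = df.groupby('A')['C'].first()

# Look up every key of df2['A'] in that Series; keys missing from df become NaN.
df2['C'] = df2['A'].map(first_c)

result = df2['C'].tolist()
print(df2)
```

Because the lookup goes by key rather than by position, the row order and row count of df2 do not need to match the groups in df. The column becomes float here only because of the NaN for the missing key.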