Fill a column with values from another DataFrame

Question

I want to fill column 'C' of df2 (~100,000 rows) with the values from the same column of df (~1,000,000 rows). df often contains the same row several times but with wrong data, so I always want to take the first value of my column 'C'.

import pandas as pd

df = pd.DataFrame([[100, 1, 2], [100, 3, 4], [100, 5, 6], [101, 7, 8], [101, 9, 10]],
                  columns=['A', 'B', 'C'])

df2 = pd.DataFrame([[100, 0], [101, 0]], columns=['A', 'C'])

for i in range(len(df2.index)):
    # My question: how do I set df2.loc[i, 'C'] to the first value of column 'C'
    # in df where column 'A' matches in both DataFrames? E.g. the first value
    # for 100 would be 2, and the first value for 101 would be 8.
    pass

In the end, my output should be a table like this:

df2=pd.DataFrame([[100,2],[101,8]], columns=['A', 'C'])

Answer 1

You can try this:

df2['C'] = df.groupby('A')['C'].first().values

Which will give you:

    A   C
0   100 2
1   101 8

first() returns the first non-null value of every group.
Then you want to assign the values to the df2 column. Unfortunately, you cannot assign the result directly like this:
df2['C'] = df.groupby('A')['C'].first()
because the above line will result in:

    A   C
0   100 NaN
1   101 NaN

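The grouped result is indexed by the values of 'A' (100, 101), while df2 is indexed 0, 1, so no labels line up during assignment; .values drops that index and assigns by position instead. (You can read about the cause here: Adding new column to pandas DataFrame results in NaN)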
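
The index mismatch described above can be reproduced with the data from the question. The sketch below also shows one possible alternative that is not part of the original answer: looking up each 'A' value with Series.map. The names first_c and misaligned are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([[100, 1, 2], [100, 3, 4], [100, 5, 6], [101, 7, 8], [101, 9, 10]],
                  columns=['A', 'B', 'C'])
df2 = pd.DataFrame([[100, 0], [101, 0]], columns=['A', 'C'])

# First 'C' per 'A'. The result is a Series whose index holds the 'A' values (100, 101).
first_c = df.groupby('A')['C'].first()

# Direct assignment aligns on index labels: df2 has labels 0 and 1,
# first_c has labels 100 and 101, so nothing matches and every row becomes NaN.
misaligned = df2.assign(C=first_c)
print(misaligned)

# Assumed alternative: look up each 'A' value of df2 in first_c.
# This matches on the key itself rather than on row position.
df2['C'] = df2['A'].map(first_c)
print(df2)
```

The .values approach from the answer assigns by position, so it relies on df2 having exactly one row per value of 'A', in the same order as the sorted group keys. The map variant should not depend on that ordering, and any 'A' value in df2 that is missing from df would come out as NaN rather than shifting the other rows.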
