
Filling a column with values from another dataframe

I want to fill column 'C' of df2 (~100,000 rows) with the values from the same column of df (~1,000,000 rows). df often contains the same row several times but with wrong data, so for each value of 'A' I always want to take the first value of column 'C'.

import pandas as pd

df = pd.DataFrame([[100, 1, 2], [100, 3, 4], [100, 5, 6], [101, 7, 8], [101, 9, 10]],
                  columns=['A', 'B', 'C'])

df2 = pd.DataFrame([[100, 0], [101, 0]], columns=['A', 'C'])

for i in range(len(df2.index)):
    # My question: df2.loc[i, 'C'] should become the first value of the 'C' column of df
    # where the 'A' column is the same in both dataframes.
    # E.g. the first value for 100 would be 2, and the first value for 101 would be 8.
    pass

In the end, my output should be a table like this:

df2=pd.DataFrame([[100,2],[101,8]], columns=['A', 'C'])

You can try this:

df2['C'] = df.groupby('A')['C'].first().values

Which will give you:

    A   C
0   100 2
1   101 8
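
One caveat: .values assigns by position, so the one-liner above relies on df2 listing every 'A' value of df exactly once and in sorted order, which is the order groupby produces. As a minimal sketch of a key-based variant, assuming that ordering is not guaranteed, each 'A' in df2 can be looked up with Series.map instead (the name first_c is only illustrative, and df2 is reordered here on purpose):

```python
import pandas as pd

df = pd.DataFrame([[100, 1, 2], [100, 3, 4], [100, 5, 6], [101, 7, 8], [101, 9, 10]],
                  columns=['A', 'B', 'C'])
# Assumption for illustration: df2 lists the keys in a different order than groupby sorts them
df2 = pd.DataFrame([[101, 0], [100, 0]], columns=['A', 'C'])

# Series indexed by 'A' that holds the first 'C' of each group
first_c = df.groupby('A')['C'].first()

# Look up every df2['A'] in that Series, so the row order of df2 does not matter
df2['C'] = df2['A'].map(first_c)
print(df2)
```

With map, any 'A' in df2 that does not occur in df ends up as NaN rather than silently receiving another group's value.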

first() returns the first non-null value of every group.
Then you want to assign the values to the df2 column. Unfortunately, you cannot assign the result directly like this:
df2['C'] = df.groupby('A')['C'].first()
because the above line will result in:

    A   C
0   100 NaN
1   101 NaN

(You can read about the cause here: Adding new column to pandas DataFrame results in NaN)
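
In short, the NaN values come from index alignment: when a Series is assigned to a column, pandas matches rows by index label, not by position. The following is a small sketch of that behavior on the example data (the names first_c, aligned and positional are only illustrative):

```python
import pandas as pd

df = pd.DataFrame([[100, 1, 2], [100, 3, 4], [100, 5, 6], [101, 7, 8], [101, 9, 10]],
                  columns=['A', 'B', 'C'])
df2 = pd.DataFrame([[100, 0], [101, 0]], columns=['A', 'C'])

first_c = df.groupby('A')['C'].first()
print(first_c.index.tolist())  # labels of the grouped result: the 'A' values
print(df2.index.tolist())      # labels of df2: the default integer index

# Assigning the Series aligns on labels; no label matches, so every row becomes NaN
aligned = df2.assign(C=first_c)

# Converting to a plain array drops the labels, so the assignment is positional
positional = df2.assign(C=first_c.to_numpy())
print(aligned)
print(positional)
```

Here assign is used only so that df2 itself stays unchanged; a direct df2['C'] = ... assignment aligns in the same way.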
