
Adding a variable with missing observations to an existing pandas df without losing observations in the larger df

I have two data frames. One is called ENFORE and has 139 observations:

citation    Enfore
0170/0952   1
0175/0686   1
0184/0521   1
0183/0726   1
0178/0595   0

The other data frame, called CITATIONS, has 668 observations. It also has the column citation, but not Enfore. All the citations in ENFORE are in the CITATIONS data frame.

I would like to add the column Enfore to the CITATIONS data frame and fill in the observations that are not in the ENFORE data frame with an 'X'.

I have tried several variations of this code (both merge and join):

enfore_merged = pd.merge(enfore , harrington_citations, on = 'citation')

I have not been able to create the data frame described above.

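You are almost there. pd.merge defaults to how='inner', which keeps only the citations present in both frames. Merge from the larger frame with how='left' so that every one of its rows is kept, then fill the unmatched rows: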

enfore_merged = harrington_citations.merge(enfore, how='left', on='citation')
enfore_merged['Enfore'] = enfore_merged['Enfore'].fillna('X')
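
To see the effect on a small scale, here is a minimal sketch with toy data. The two "9999/..." citations are made up for illustration, and the frames are far smaller than the real ones; only the names enfore, harrington_citations, citation and Enfore come from the question.

```python
import pandas as pd

# Toy stand-ins for the two frames in the question.
enfore = pd.DataFrame({
    "citation": ["0170/0952", "0178/0595"],
    "Enfore": [1, 0],
})
# The "9999/..." citations are hypothetical rows with no match in enfore.
harrington_citations = pd.DataFrame({
    "citation": ["0170/0952", "9999/0001", "0178/0595", "9999/0002"],
})

# how="left" keeps every row of the left (larger) frame;
# rows with no match in enfore get NaN in the Enfore column.
enfore_merged = harrington_citations.merge(enfore, how="left", on="citation")

# Replace the NaN placeholders with 'X'.
enfore_merged["Enfore"] = enfore_merged["Enfore"].fillna("X")

print(enfore_merged)
```

Note that the NaN introduced by the merge turns the matched values into floats (1.0 and 0.0), and after fillna('X') the column holds a mix of numbers and strings. If that matters downstream, converting the column explicitly afterwards is probably worth considering.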

Use Series.map:

harrington_citations['Enfore']=harrington_citations['citation'].map(enfore.set_index('citation')['Enfore']).fillna('X')

This creates a new Enfore column in the CITATIONS data frame (which I take to be harrington_citations). It maps that frame's citation column through a Series built from the ENFORE data frame, whose index is citation and whose values are those of the Enfore column. Citations with no match in ENFORE map to NaN, which fillna('X') then replaces.
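
A minimal sketch of the same idea on toy data follows. The "9999/..." citations and the name lookup are made up for illustration. The drop_duplicates call is a defensive assumption on my part, not something the question requires: as far as I know, map needs the lookup Series to have a unique index, so it only matters if a citation could appear more than once in ENFORE.

```python
import pandas as pd

# Toy stand-ins for the two frames in the question.
enfore = pd.DataFrame({
    "citation": ["0170/0952", "0178/0595"],
    "Enfore": [1, 0],
})
# The "9999/..." citations are hypothetical rows with no match in enfore.
harrington_citations = pd.DataFrame({
    "citation": ["0170/0952", "9999/0001", "0178/0595", "9999/0002"],
})

# Lookup Series: index = citation, values = Enfore.
# drop_duplicates is a precaution so the index is unique (an assumption).
lookup = enfore.drop_duplicates("citation").set_index("citation")["Enfore"]

# Map each citation to its Enfore value; unmatched citations become NaN,
# which fillna then replaces with 'X'.
harrington_citations["Enfore"] = (
    harrington_citations["citation"].map(lookup).fillna("X")
)

print(harrington_citations)
```

Unlike the merge approach, this adds the column to harrington_citations in place and never changes the number of rows, since map returns exactly one value per input row.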
