Adding a variable with missing observations to an existing pandas df without losing observations in the larger df
I have two dataframes. One called ENFORE with 139 observations:
citation Enfore
0170/0952 1
0175/0686 1
0184/0521 1
0183/0726 1
0178/0595 0
And another data frame called CITATIONS with 668 observations that also has the column citation, but not Enfore. All the citations in ENFORE are in the CITATIONS data frame.
I would like to add the column Enfore to the CITATIONS data frame and fill in observations that are not in the ENFORE data frame with an 'X'.
Using several variations of this code (merge and join):
enfore_merged = pd.merge(enfore , harrington_citations, on = 'citation')
I have not been able to create the data frame I describe above.
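You are almost there. pd.merge defaults to an inner join, which keeps only the citations found in both frames, so the larger frame loses rows. Use a left join from the larger frame and fill the gaps: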
enfore_merged = harrington_citations.merge(enfore, how='left', on='citation')
enfore_merged['Enfore'] = enfore_merged['Enfore'].fillna('X')
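To see the difference between the two joins on something small, here is a minimal sketch with toy stand-ins for both frames. The citation "9999/0001" is made up for illustration, and validate="m:1" is an optional extra (not part of the answer above) that makes the merge fail loudly if a citation appears more than once in enfore.

```python
import pandas as pd

# Toy stand-ins for the two frames. "9999/0001" is a hypothetical citation
# that exists only in the larger frame.
enfore = pd.DataFrame({
    "citation": ["0170/0952", "0175/0686", "0178/0595"],
    "Enfore": [1, 1, 0],
})
harrington_citations = pd.DataFrame({
    "citation": ["0170/0952", "0175/0686", "0178/0595", "9999/0001"],
})

# Default (inner) merge: only citations present in both frames survive.
inner = pd.merge(enfore, harrington_citations, on="citation")

# Left merge from the larger frame: every one of its rows is kept,
# and rows without a match get NaN in Enfore.
enfore_merged = harrington_citations.merge(
    enfore, how="left", on="citation", validate="m:1"
)
enfore_merged["Enfore"] = enfore_merged["Enfore"].fillna("X")

print(len(inner), len(enfore_merged))
print(enfore_merged)
```

Because the unmatched rows pass through NaN, the matched flags come back as floats (1.0, 0.0) next to the string 'X', so the column ends up with object dtype.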
Use Series.map:
harrington_citations['Enfore']=harrington_citations['citation'].map(enfore.set_index('citation')['Enfore']).fillna('X')
This creates a new Enfore column in the CITATIONS data frame (which I understand is harrington_citations) by mapping its citation column through a series built from the ENFORE data frame, whose index is citation and whose values are those of the Enfore column.
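Broken into steps, the one-liner might look like the sketch below. The toy data, the "9999/0001" citation and the intermediate name lookup are assumptions for illustration. The uniqueness check is my addition: as far as I know, map raises an error when the lookup index contains duplicates.

```python
import pandas as pd

# Toy stand-ins for the two frames. "9999/0001" is a hypothetical citation
# that exists only in the larger frame.
enfore = pd.DataFrame({
    "citation": ["0170/0952", "0175/0686", "0178/0595"],
    "Enfore": [1, 1, 0],
})
harrington_citations = pd.DataFrame({
    "citation": ["9999/0001", "0178/0595", "0170/0952"],
})

# Lookup series: the index is the citation, the values are the Enfore flags.
lookup = enfore.set_index("citation")["Enfore"]
assert lookup.index.is_unique  # each citation should appear once in enfore

# Each citation is looked up in the series; citations missing from the
# lookup become NaN and are then replaced with "X".
harrington_citations["Enfore"] = (
    harrington_citations["citation"].map(lookup).fillna("X")
)

print(harrington_citations)
```

Unlike the merge, this adds the column to harrington_citations in place and leaves its row order and index untouched.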