Pandas Dataframe - merge data when a column does not exist in one dataframe

There are two sets of CSV files. File type 1 has the following columns:

col1, node, id, col4...... col100, dest.
ABC, 1, 1000, XY, ..... ax, LA
XYZ, 3, 3000, TY, ......ty, NY
WAR, 2, 2000, MJ, ......rr, London

File type 2 has the following columns:

col101, node-name, col102, col103..... col200, dest
ark, 16, ty, tuu, ...., bfg, Mumbai
raid,25, by, why, ...., cgh, Nairobi

My requirement is the following: create a file that contains id, node/node-name and dest. The id column is not available in the second file type, so it must be recorded as 0 for each of the node-name entries. The resulting data would look like this:

1000, 1, LA
2000, 2, London
3000, 3, NY
0, 16, Mumbai
0, 25, Nairobi

This is the code that I have written so far:

# frames contains all the files and their data as filename and as a data frame.
for fileName, frame in frames.items():
  nodeinfo = frame.columns.values.tolist()[1]
  if nodeinfo == 'node-name':
    entry = 'node-name'
  else:
    entry = 'node'
  if entry in frame:
    frame1= frame[[entry, 'dest']]
    if 'id' in frame:
      idFrame = frame[['id', entry]]
    mergeFrame = pandas.merge(idFrame, frame1, how = 'right', on = entry)
    uniqFrame = mergeFrame.drop_duplicates([entry])

Of course, there is a problem with this logic: the merge may raise an error when idFrame does not exist because the file has no 'id' column. Honestly, I am a bit lost. Any help is appreciated.

Thank you, Anoop

You can add a dummy column to the file 2 DataFrame like this:

frame2['id'] = 0

This adds a new column id to frame2 with the value 0 in every row, so both frames now have an id column.
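If the same code has to run over every file, the assignment can be guarded so that an existing id column is never overwritten. A minimal sketch, assuming a small stand-in frame2 built by hand (the real one would come from the CSV file):

```python
import pandas as pd

# Hypothetical stand-in for a type 2 file, which has no 'id' column.
frame2 = pd.DataFrame({"node-name": [16, 25], "dest": ["Mumbai", "Nairobi"]})

# Add the placeholder only when the column is genuinely missing,
# so frames that already carry real ids are left untouched.
if "id" not in frame2.columns:
    frame2["id"] = 0

print(frame2)
```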

Next, rename the node-name column to node in frame2 so that the column names match in both frames:

frame2.rename(columns={'node-name':'node'}, inplace=True)

Then you can stack the two frames with concat:

pd.concat([frame1[['id','node','dest']], frame2[['id','node','dest']]])
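Putting the steps together for the frames dictionary from the question might look like the sketch below. The sample data, the file names and the helper normalize are assumptions made for illustration; only the columns relevant to the output are reproduced. Sorting by node and dropping duplicate nodes are optional extras that mirror the drop_duplicates call in the question and the ordering of the desired output.

```python
import pandas as pd

# Hypothetical sample data mirroring the two file types in the question.
frames = {
    "type1.csv": pd.DataFrame({
        "col1": ["ABC", "XYZ", "WAR"],
        "node": [1, 3, 2],
        "id": [1000, 3000, 2000],
        "dest": ["LA", "NY", "London"],
    }),
    "type2.csv": pd.DataFrame({
        "col101": ["ark", "raid"],
        "node-name": [16, 25],
        "dest": ["Mumbai", "Nairobi"],
    }),
}


def normalize(frame):
    """Return a new frame with exactly the columns id, node, dest."""
    # rename ignores the mapping when 'node-name' is absent.
    out = frame.rename(columns={"node-name": "node"})
    if "id" not in out.columns:
        out = out.assign(id=0)  # placeholder id for files without one
    return out[["id", "node", "dest"]]


combined = pd.concat([normalize(f) for f in frames.values()], ignore_index=True)
combined = (
    combined.sort_values("node")
    .drop_duplicates("node")
    .reset_index(drop=True)
)

print(combined.to_csv(index=False, header=False))
```

Because every frame is normalized before concatenation, no merge is needed and the missing idFrame case from the question never arises.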
