Deedle Equivalent to pandas.merge

I am looking to merge two Deedle (F#) frames based on a specific column in each frame, in a similar manner to pandas.DataFrame.merge. The perfect example of this would be a primary frame that contains columns of data plus a (city, state) column, along with an information frame that contains the following columns: (city, state); lat; long. If I want to add the lat and long columns to my primary frame, I would merge the two frames on the (city, state) column.

Here is an example:

    let primaryFrame =
            [(0, "Job Name", box "Job 1")
             (0, "City, State", box "Reno, NV")
             (1, "Job Name", box "Job 2")
             (1, "City, State", box "Portland, OR")
             (2, "Job Name", box "Job 3")
             (2, "City, State", box "Portland, OR")
             (3, "Job Name", box "Job 4")
             (3, "City, State", box "Sacramento, CA")] |> Frame.ofValues

    let infoFrame =
            [(0, "City, State", box "Reno, NV")
             (0, "Lat", box "Reno_NV_Lat")
             (0, "Long", box "Reno_NV_Long")
             (1, "City, State", box "Portland, OR")
             (1, "Lat", box "Portland_OR_Lat")
             (1, "Long", box "Portland_OR_Long")] |> Frame.ofValues

    // See the definition of merge_On below.
    let mergedFrame = primaryFrame
                      |> merge_On infoFrame "City, State" null

This would result in 'mergedFrame' looking like this:

    > mergedFrame.Format();;
    val it : string =
      "     Job Name City, State    Lat             Long
    0 -> Job 1    Reno, NV       Reno_NV_Lat     Reno_NV_Long
    1 -> Job 2    Portland, OR   Portland_OR_Lat Portland_OR_Long
    2 -> Job 3    Portland, OR   Portland_OR_Lat Portland_OR_Long
    3 -> Job 4    Sacramento, CA <missing>       <missing>

I have come up with a way of doing this (the 'merge_On' function used in the example above), but being a Sales Engineer who is new to F#, I imagine there is a more idiomatic or more efficient way. Below are my functions for doing this, along with a 'removeDuplicateRows' helper, which does what you would expect and was needed for the 'merge_On' function. If you want to suggest a better way of doing that as well, please do.

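    // Keeps only the first row for each distinct value found in 'column'.
    let removeDuplicateRows column (frame : Frame<'a, 'b>) =
        // GroupRowsBy yields composite (group value, original row key) keys
        let nonDupKeys =
            frame.GroupRowsBy(column).RowKeys
            |> Seq.distinctBy fst
            |> Seq.map snd
        frame.Rows.[nonDupKeys]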


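    // Adds the columns of 'infoFrame' to 'primaryFrame' by matching on 'mergeOnCol'.
    // Rows without a match get 'missingReplacement' in the added columns.
    let merge_On (infoFrame : Frame<'c, 'b>) mergeOnCol missingReplacement
                 (primaryFrame : Frame<'a, 'b>) =
        // Work on a copy so that AddColumn does not mutate the caller's frame
        let frame = primaryFrame.Clone()
        // Key the info frame by the merge column, one row per distinct value
        let infoFrame =
            infoFrame
            |> removeDuplicateRows mergeOnCol
            |> Frame.indexRows mergeOnCol
        let initialSeries = frame.GetColumn(mergeOnCol)
        let infoFrameRows = infoFrame.RowKeys
        for colKey in infoFrame.ColumnKeys do
            let newSeries =
                [ for v in initialSeries.ValuesAll do
                    if Seq.contains v infoFrameRows then
                        let infoRow = infoFrame.GetRow(v)
                        yield infoRow.[colKey]
                    else
                        yield box missingReplacement ]
            frame.AddColumn(colKey, newSeries)
        frame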

Thanks for your help!

UPDATE:

Switched Frame.indexRowsString to Frame.indexRows to handle cases where the values in 'mergeOnCol' are not strings.

Got rid of infoFrame.Clone() as suggested by Tomas.

The way Deedle joins frames (only on row/column keys) sadly means that it does not have a nice built-in function for joining frames over a non-key column.

As far as I can see, your approach looks very good. You do not need Clone on the infoFrame (because you are not mutating that frame), and I think you can replace infoFrame.GetRow with infoFrame.TryGetRow (and then you won't need to get the keys in advance), but other than that, your code looks fine!

I came up with an alternative and slightly shorter way of doing this, which looks as follows:

    // Index the info frame by city/state, so that we can do lookup
    let infoByCity = infoFrame |> Frame.indexRowsString "City, State"

    // Create a new frame with the same row indices as 'primaryFrame'
    // containing the additional information from infoFrame.
    let infoMatched =
      primaryFrame.Rows
      |> Series.map (fun k row ->
          // For every row, we get the "City, State" value of the row and then
          // find the corresponding row with additional information in infoFrame. Using
          // 'ValueOrDefault' will automatically give missing when the key does not exist
          infoByCity.Rows.TryGet(row.GetAs<string>("City, State")).ValueOrDefault)
      // Now turn the series of rows into a frame
      |> Frame.ofRows

    // Now we have two frames with matching keys, so we can join!
    primaryFrame.Join(infoMatched)

This is a bit shorter and maybe more self-explanatory, but I have not done any tests to check which is faster. Unless performance is a primary concern, I think going with the more readable version is a good default choice though!
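To make the semantics of the lookup-then-join idea explicit, here is a minimal Deedle-free sketch that uses only FSharp.Core. The record types `Job` and `Info` and the sample values are hypothetical stand-ins for frame rows, not part of Deedle; a `Map` plays the role of the info frame indexed by "City, State", and `None` plays the role of a missing value.

```fsharp
// Deedle-free sketch of the lookup-then-join idea, using only FSharp.Core.
// The record types and sample values are hypothetical stand-ins for frame rows.
type Job = { JobName : string; CityState : string }
type Info = { Lat : string; Long : string }

let jobs =
    [ { JobName = "Job 1"; CityState = "Reno, NV" }
      { JobName = "Job 2"; CityState = "Portland, OR" }
      { JobName = "Job 3"; CityState = "Portland, OR" }
      { JobName = "Job 4"; CityState = "Sacramento, CA" } ]

// Index the lookup data by the merge key (the analogue of indexing the info frame).
let infoByCity =
    Map.ofList
        [ "Reno, NV", { Lat = "Reno_NV_Lat"; Long = "Reno_NV_Long" }
          "Portland, OR", { Lat = "Portland_OR_Lat"; Long = "Portland_OR_Long" } ]

// For each primary row, try to find the matching info row. None stands in
// for a missing value, so unmatched rows are kept (a left join).
let merged =
    jobs |> List.map (fun job -> job, Map.tryFind job.CityState infoByCity)

for (job, info) in merged do
    match info with
    | Some i -> printfn "%s  %s  %s  %s" job.JobName job.CityState i.Lat i.Long
    | None -> printfn "%s  %s  <missing>  <missing>" job.JobName job.CityState
```

Every primary row survives the merge, and "Job 4" has no matching info row, mirroring the `<missing>` cells in the expected output in the question. Because the lookup goes through an indexed map rather than a linear scan of the keys, each match costs a single keyed lookup.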
