
How can I group by multiple properties using a Function in Palantir Foundry?

I'd like to aggregate across a few properties using a Function. For example, I have a Function that takes a start and end date as input, and a Schedule Object Type with "date", "shift_type", "department", and "hours worked" properties.

I'd like my output to be the sum of hours worked for each date/shift type/department combination.

Creating a data structure for the aggregation

In the current Functions aggregations API, you can only create 2D and 3D aggregations directly from an ObjectSet, via the groupBy and segmentBy functions.

If you want to aggregate on more than two properties (which would be a 4+D aggregation), you have two options:

  1. Convert the ObjectSet to a list of Objects (by calling .allAsync()), and then write TypeScript logic to convert that list into a data structure that aggregates over the object properties. Note that this may not perform well if you have a large number (thousands or more) of Objects in your Object Set.

  2. Add a column to the Object (and the backing dataset) which is a composite key of the columns you want to group on. In your example, this could look like date.2022-01-01.shift.1200.department.emergency_room. Then, in your Functions code, you could do a groupBy on this composite key. Next, you could convert this 2D aggregation into a multi-dimensional aggregation by splitting the composite key into its individual parts.
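Option 1 can be sketched roughly as below. This is only an illustration of the in-memory grouping step, not Foundry-specific code: the ScheduleRow and HoursBucket interfaces, the property names, and the sumHoursByCombo helper are all assumptions made up for the example. In a real Function the rows would come from the list returned by .allAsync().

```typescript
// Hypothetical shape of one Schedule object after it has been loaded into memory.
// The property names are assumptions based on the question.
interface ScheduleRow {
    date: string; // ISO date, e.g. "2022-01-01"
    shiftType: string;
    department: string;
    hoursWorked: number;
}

// One output row per date / shift type / department combination.
interface HoursBucket {
    date: string;
    shiftType: string;
    department: string;
    totalHours: number;
}

// Sum hoursWorked for every combination inside an inclusive [start, end] date range.
function sumHoursByCombo(rows: ScheduleRow[], start: string, end: string): HoursBucket[] {
    const buckets: Record<string, HoursBucket> = {};
    for (const row of rows) {
        // ISO dates in YYYY-MM-DD form compare correctly as plain strings.
        if (row.date < start || row.date > end) {
            continue;
        }
        // JSON.stringify of the tuple stays unambiguous even if a value contains a separator.
        const key = JSON.stringify([row.date, row.shiftType, row.department]);
        const existing = buckets[key];
        if (existing) {
            existing.totalHours += row.hoursWorked;
        } else {
            buckets[key] = {
                date: row.date,
                shiftType: row.shiftType,
                department: row.department,
                totalHours: row.hoursWorked,
            };
        }
    }
    // Object.keys keeps insertion order for these non-numeric string keys.
    return Object.keys(buckets).map(k => buckets[k]);
}
```

Because every object is loaded before grouping, the performance caveat from option 1 still applies to this approach.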

Displaying this aggregated data in frontend applications

Depending on where you want to use this aggregated data, there may be some additional steps required. Here are some examples:

If you have a Slate or custom application which calls the Function directly and handles the response on the frontend, then you can simply return the aggregation, as long as it conforms to the allowed Functions return types.

If you want to display this data in a table in Workshop (effectively as a Function-backed pivot table), then you will want to use an Object Table with Function-backed columns. You will need an Object which is at the desired level of granularity (for example, one whose primary key is the composite key from above). This could be a very simple Object where the only property is this key (and maybe the components of the key, if that is useful for filtering purposes).

I don't think you can do this natively in Functions; you would have to materialize the data into your Functions driver and code the logic manually. However, you could create a column at the dataset level, index it into the Ontology, and query that.

In your pipeline (PySpark example):

from pyspark.sql import functions as F
df = df.withColumn("shift_id", F.concat_ws("-", "date", "shift_type", "department"))

Then, in your Functions code, you can aggregate on the shift ID:

// Sum hours worked per composite shift ID (one bucket per shiftId value).
Objects.search()
    .employees()
    .groupBy(e => e.shiftId.topValues())
    .sum(e => e.hoursWorked)
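To get back to separate date, shift type, and department values, the shift ID keys need to be split again. Note that with "-" as the separator, an ISO date such as 2022-01-01 already contains hyphens, so a plain split("-") would be ambiguous. The sketch below works around that with a pattern. It assumes the key starts with a YYYY-MM-DD date and that the shift type itself contains no hyphen. KeyedBucket is a local stand-in for one key/value bucket of the aggregation result, and KeyedBucket, ShiftTotal, and splitShiftBuckets are hypothetical names, not part of the Functions API.

```typescript
// Local stand-in for one bucket of a 2D aggregation: a string key plus a numeric value.
// Assumption: defined here for illustration, not imported from the Functions API.
interface KeyedBucket {
    key: string;
    value: number;
}

interface ShiftTotal {
    date: string;
    shiftType: string;
    department: string;
    totalHours: number;
}

// Assumed key layout: "2022-01-01-day-emergency_room", i.e. an ISO date, then a
// shift type without hyphens, then the department (which may contain hyphens).
const SHIFT_ID_PATTERN = /^(\d{4}-\d{2}-\d{2})-([^-]+)-(.+)$/;

function splitShiftBuckets(buckets: KeyedBucket[]): ShiftTotal[] {
    const totals: ShiftTotal[] = [];
    for (const bucket of buckets) {
        const match = SHIFT_ID_PATTERN.exec(bucket.key);
        if (match === null) {
            // Skip keys that do not follow the assumed layout instead of guessing.
            continue;
        }
        totals.push({
            date: match[1],
            shiftType: match[2],
            department: match[3],
            totalHours: bucket.value,
        });
    }
    return totals;
}
```

If you control the pipeline, choosing a separator that cannot occur in any of the three values (for example "|") would make the split simpler and less fragile.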
