
Pig Algebraic with strings

I need to write an Algebraic UDF that collapses the individual items in a bag into a single string. Example: {(a),(b)} becomes ab.

First, can the Algebraic interface be used to return strings, or is it restricted to integers (Long)?

Second, is there a place where I can find examples of using Algebraic, apart from the COUNT example I see everywhere?

Just tell me whether Algebraic can be used to process strings, and let me know if there is a good place to look at existing UDF code (not code that exactly solves my problem).
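As far as I can tell, the `Algebraic` interface itself does not restrict the result type: its `getInitial()`, `getIntermed()` and `getFinal()` methods only return the class names of three `EvalFunc` stages, and the final stage can be declared to return a `String`. The sketch below is plain Java and does not use the Pig API at all; the class and method names (`ConcatStages`, `initial`, `intermed`, `finalStage`) are made up for illustration. It only shows why concatenation fits the three-stage shape: partial strings can be merged chunk by chunk and then merged again.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of an initial / intermediate / final decomposition.
// Hypothetical names; this is NOT the Pig API, just the shape of the idea.
final class ConcatStages {
    // Initial stage: turn one bag item into a partial result.
    static String initial(String item) {
        return item == null ? "" : item;
    }

    // Intermediate stage: merge the partial results of one chunk.
    static String intermed(List<String> partials) {
        StringBuilder sb = new StringBuilder();
        for (String p : partials) {
            sb.append(p);
        }
        return sb.toString();
    }

    // Final stage: merge the already-merged partials into the result.
    // For concatenation this is the same operation as intermed.
    static String finalStage(List<String> partials) {
        return intermed(partials);
    }

    public static void main(String[] args) {
        // Two chunks, as if the bag had been split across two map tasks.
        List<String> chunk1 = Arrays.asList(initial("a"), initial("b"));
        List<String> chunk2 = Arrays.asList(initial("c"));
        String result = finalStage(Arrays.asList(intermed(chunk1), intermed(chunk2)));
        System.out.println(result); // prints "abc"
    }
}
```

One caveat: a Pig bag is an unordered collection, so a real UDF built this way should not rely on the items being concatenated in any particular order.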

These are the things I have tried:

  1. Googling for any UDF code that works on bags. I am not finding anything apart from the COUNT example that is posted everywhere.
  2. Trying out different options in Pig. Apparently you cannot dereference individual items inside a bag, which is a bummer.

Finally, this is what I figured out:

  1. If your problem can be solved with JOIN as efficiently as with GROUP, use JOIN. GROUP creates bags, which are harder to deal with.

  2. You are not obligated to use Algebraic to deal with bags. Instead, you can just write an EVAL UDF. However, it is going to be much slower if your bag is large.
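To make the second point concrete, here is a minimal sketch of the single-pass logic such an EVAL UDF would carry. It is plain Java with no Pig dependency: the bag is modeled as a `List<List<Object>>` (one inner list per tuple), and `BagCollapse.collapse` is a hypothetical stand-in for the body of an `exec` method, not real Pig code. My understanding of the slowdown is that a non-algebraic UDF sees the whole bag at once on the reduce side, so Pig cannot pre-aggregate it with the combiner.

```java
import java.util.Arrays;
import java.util.List;

// Single-pass collapse of a "bag" into one string.
// Hypothetical stand-in for the body of an EVAL UDF; not the Pig API.
final class BagCollapse {
    // Concatenates the first field of every tuple in the bag.
    // Null bags yield null; null or empty tuples and null fields are skipped.
    static String collapse(List<List<Object>> bag) {
        if (bag == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (List<Object> tuple : bag) {
            if (tuple == null || tuple.isEmpty() || tuple.get(0) == null) {
                continue;
            }
            sb.append(tuple.get(0));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Models the bag {(a),(b)} from the question.
        List<List<Object>> bag = Arrays.asList(
                Arrays.<Object>asList("a"),
                Arrays.<Object>asList("b"));
        System.out.println(collapse(bag)); // prints "ab"
    }
}
```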
