
Apache Beam Select Top N rows from PCollection in Go
I have a PCollection from which I need to select the n largest rows. I am trying to build a Dataflow pipeline in Go and am stuck on this step.
package main

import (
	"context"
	"flag"
	"fmt"

	"github.com/apache/beam/sdks/v2/go/pkg/beam"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/log"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx"
)

type User struct {
	Name string
	Age  int
}

func printRow(ctx context.Context, list User) {
	fmt.Println(list)
}

func main() {
	flag.Parse()
	beam.Init()
	ctx := context.Background()
	p := beam.NewPipeline()
	s := p.Root()
	var userList = []User{
		{"Bob", 5},
		{"Adam", 8},
		{"John", 3},
		{"Ben", 1},
		{"Jose", 1},
		{"Bryan", 1},
		{"Kim", 1},
		{"Tim", 1},
	}
	initial := beam.CreateList(s, userList)
	pc2 := beam.ParDo(s, func(row User, emit func(User)) {
		emit(row)
	}, initial)
	beam.ParDo0(s, printRow, pc2)
	if err := beamx.Run(ctx, p); err != nil {
		log.Exitf(ctx, "Failed to execute job: %v", err)
	}
}
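In the code above, I need to select the top 5 rows by User.Age. I found that the top package has a function that does this, but its documentation says it returns a single-element PCollection. How is that different?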
package main

import (
	"context"
	"flag"
	"fmt"

	"github.com/apache/beam/sdks/v2/go/pkg/beam"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/log"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/transforms/top"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx"
)

func init() {
	beam.RegisterFunction(less)
}

type User struct {
	Name string
	Age  int
}

func printRow(ctx context.Context, list User) {
	fmt.Println(list)
}

func less(a, b User) bool {
	return a.Age < b.Age
}

func main() {
	flag.Parse()
	beam.Init()
	ctx := context.Background()
	p := beam.NewPipeline()
	s := p.Root()
	var userList = []User{
		{"Bob", 5},
		{"Adam", 8},
		{"John", 3},
		{"Ben", 1},
		{"Jose", 1},
		{"Bryan", 1},
		{"Kim", 1},
		{"Tim", 1},
	}
	initial := beam.CreateList(s, userList)
	best := top.Largest(s, initial, 5, less)
	pc2 := beam.ParDo(s, func(row User, emit func(User)) {
		emit(row)
	}, best)
	beam.ParDo0(s, printRow, pc2)
	if err := beamx.Run(ctx, p); err != nil {
		log.Exitf(ctx, "Failed to execute job: %v", err)
	}
}
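I added the function as shown above to select the top 5 rows, but I get the error: []main.User is not assignable to main.User
I need the PCollection in the same format as before, because I have further processing to do. I suspect the cause is that top.Largest returns a single-element PCollection. Any ideas on how to convert it back?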
The best PCollection has elements of type []User, not User.
So try...
pc2 := beam.ParDo(s, func(rows []User, emit func(User)) {
	for _, row := range rows {
		emit(row)
	}
}, best)
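To illustrate why the element type changes, here is a minimal sketch in plain Go with no Beam dependency. The helper largestN is a hypothetical stand-in for top.Largest, and flatten mirrors the ParDo above. Both names are assumptions made for illustration and are not part of the Beam API. Sorting in descending order is a choice made for this sketch, not a claim about the order Beam produces.

```go
package main

import (
	"fmt"
	"sort"
)

// User mirrors the struct from the question.
type User struct {
	Name string
	Age  int
}

// less is the same comparator that the question passes to top.Largest.
func less(a, b User) bool { return a.Age < b.Age }

// largestN is a hypothetical stand-in for top.Largest: it returns ONE value,
// a slice holding up to n of the largest users, rather than n separate values.
func largestN(users []User, n int) []User {
	sorted := append([]User(nil), users...) // copy so the input is not reordered
	// Descending order: i comes before j when j is "less" than i.
	sort.SliceStable(sorted, func(i, j int) bool { return less(sorted[j], sorted[i]) })
	if n > len(sorted) {
		n = len(sorted)
	}
	return sorted[:n]
}

// flatten plays the role of the ParDo in the answer: it receives the single
// []User element and emits each User individually.
func flatten(rows []User, emit func(User)) {
	for _, row := range rows {
		emit(row)
	}
}

func main() {
	users := []User{
		{"Bob", 5}, {"Adam", 8}, {"John", 3}, {"Ben", 1},
		{"Jose", 1}, {"Bryan", 1}, {"Kim", 1}, {"Tim", 1},
	}
	best := largestN(users, 5) // one slice, analogous to the single []User element

	var out []User // analogous to the flattened PCollection of User
	flatten(best, func(u User) { out = append(out, u) })

	for _, u := range out {
		fmt.Println(u)
	}
}
```

A DoFn that takes User cannot consume best directly because best carries a single []User value. Iterating over that slice and emitting each item restores the per-row shape needed for further processing.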