I'm working on a data processing pipeline where we read a lot of files from cloud storage. The files might be CSV files with a header row, which I need to remove so I don't get errors further down the pipeline.
If possible I would love to use:
TextIO.Read.from(filePattern)
together with something else since it automatically handles compression and such. Ideally it should look something like this:
TextIO.Read.from(filePattern, numberOfHeaderRows)
and that would simply skip the first numberOfHeaderRows lines of each file. What is the easiest way to achieve something like this in Java?
The simplest path is probably to use TextIO.Read.from(filePattern) and then apply a ParDo that filters out the header rows.
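Here is a minimal sketch of the per-line check such a ParDo could apply, written in plain Java so it runs without the SDK on the classpath. A ParDo sees each line on its own, with no file position and no guaranteed order, so this sketch assumes the header text is known in advance and drops lines by matching that text rather than by row number. The class HeaderFilter and its methods are hypothetical names used for illustration, not part of any SDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: holds the known header text and decides, line by line,
// whether a line is data or a header. Assumes every file shares the same header.
public class HeaderFilter {
    private final String headerLine;

    public HeaderFilter(String headerLine) {
        // Trim once so surrounding whitespace does not defeat the comparison.
        this.headerLine = headerLine.trim();
    }

    // Returns true when the line is a data row rather than the known header.
    public boolean isDataRow(String line) {
        return !line.trim().equals(headerLine);
    }

    // Local stand-in for the ParDo: applies the per-element check to each line
    // and keeps only the data rows.
    public List<String> filter(List<String> lines) {
        List<String> dataRows = new ArrayList<>();
        for (String line : lines) {
            if (isDataRow(line)) {
                dataRows.add(line);
            }
        }
        return dataRows;
    }
}
```

In a pipeline, the DoFn passed to the ParDo would call isDataRow on each element and emit the element only when it returns true. Note that a data row whose text is identical to the header would also be dropped, which is the trade-off of matching on content instead of position.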