How to parse a large YAML file in Java or Kotlin?
I have a large YAML file (~5MB) and I need to parse it using Kotlin/JVM.
I tried using the streaming API of Jackson 2.14.1, but it throws:
com.fasterxml.jackson.dataformat.yaml.JacksonYAMLParseException: The incoming YAML document exceeds the limit: 3145728 code points.
at [Source: (ZipInputStream); line: 122415, column: 9]
...
Caused by: org.yaml.snakeyaml.error.YAMLException: The incoming YAML document exceeds the limit: 3145728 code points.
My YAML file is a large dictionary with roughly 5k keys, and a small document is associated with each key. I stream the root keys and parse each associated document with the JsonParser.readValueAs() method. Since I am streaming, I expected the size of the dictionary not to matter, as long as each sub-document is small enough. But it does. I checked the document that fails to parse, at line 122415, and it is neither large (it is 1.5KB) nor ill-formed (according to https://www.yamllint.com/).
My code is:
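import com.fasterxml.jackson.core.JsonParseException
import com.fasterxml.jackson.core.JsonParser
import com.fasterxml.jackson.core.JsonToken
import com.fasterxml.jackson.databind.ObjectMapper
import java.io.InputStream
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow
import org.springframework.beans.factory.annotation.Qualifier
import org.springframework.stereotype.Service

@Service
class Parser(
    @Qualifier("yamlMapper") private val yamlMapper: ObjectMapper,
) {
    fun parse(input: InputStream): Flow<Item> = flow {
        val parser = yamlMapper.factory.createParser(input)
        parser.use {
            // The root of the file must be a mapping.
            parser.requireToken(JsonToken.START_OBJECT)
            var token = parser.nextToken()
            while (token != JsonToken.END_OBJECT) {
                if (token != JsonToken.FIELD_NAME) {
                    throw JsonParseException(parser, "Expected FIELD_NAME but was $token")
                }
                // Each root key maps to a small sub-document, bound to Item.
                parser.requireToken(JsonToken.START_OBJECT)
                emit(parser.readValueAs(Item::class.java))
                token = parser.nextToken()
            }
            // Nothing may follow the root mapping.
            parser.requireToken(null)
        }
    }
}

// Advances the parser by one token and fails if it is not the expected one.
fun JsonParser.requireToken(expected: JsonToken?) {
    val actual = nextToken()
    if (actual != expected) {
        throw JsonParseException(this, "Expected ${expected ?: "end of file"} but was $actual")
    }
}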
After digging through Jackson's documentation, it turns out the fix is quite easy. I needed to configure the YAMLFactory when creating the ObjectMapper:
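import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.dataformat.yaml.YAMLFactory
import org.springframework.boot.autoconfigure.SpringBootApplication
import org.springframework.context.annotation.Bean
import org.yaml.snakeyaml.LoaderOptions

@SpringBootApplication
class Main {
    @Bean
    fun yamlMapper(): ObjectMapper =
        ObjectMapper(YAMLFactory.builder()
            .loaderOptions(LoaderOptions().apply {
                // The limit is counted in code points: about 100MB of ASCII text.
                codePointLimit = 100 * 1024 * 1024
            })
            .build() // the builder itself is not a JsonFactory
        )
}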
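For anyone not using Spring, the same configuration can be tried in isolation. This is only a minimal sketch: it assumes jackson-dataformat-yaml 2.14 or later on the classpath, `yamlMapperWithLimit` is a hypothetical helper name, and the 16 * 1024 * 1024 limit and the generated input are arbitrary values chosen for illustration.

```kotlin
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.dataformat.yaml.YAMLFactory
import org.yaml.snakeyaml.LoaderOptions

// Hypothetical helper: builds a YAML-backed ObjectMapper with a custom code point limit.
fun yamlMapperWithLimit(limit: Int): ObjectMapper {
    val options = LoaderOptions().apply { codePointLimit = limit }
    return ObjectMapper(YAMLFactory.builder().loaderOptions(options).build())
}

fun main() {
    // Generate a synthetic mapping that is larger than the default 3145728 code points.
    val yaml = buildString {
        repeat(200_000) { append("key$it: {name: item$it}\n") }
    }
    val mapper = yamlMapperWithLimit(16 * 1024 * 1024)
    val root = mapper.readTree(yaml)
    println("Parsed ${root.size()} keys from ${yaml.length} chars")
}
```

Building the mapper with a plain `ObjectMapper(YAMLFactory())` instead should reproduce the exception from the question on the same input.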
See "Maximum input YAML document size (3 MB)" in the Jackson documentation.
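The 3145728 in the exception message is 3 * 1024 * 1024, and judging by the message it is counted over the whole incoming YAML document, which would explain why streaming the sub-documents one by one did not help. To check ahead of time whether a file is over that default, the code points can be counted directly. A rough sketch using only the Kotlin standard library follows; `countCodePoints` and `DEFAULT_CODE_POINT_LIMIT` are hypothetical names introduced here, and the input is assumed to be UTF-8.

```kotlin
import java.io.InputStream
import java.io.InputStreamReader

// The value from the exception message: 3 * 1024 * 1024 = 3145728.
const val DEFAULT_CODE_POINT_LIMIT = 3 * 1024 * 1024

// Counts Unicode code points in a UTF-8 stream without loading it all into memory.
fun countCodePoints(input: InputStream): Long {
    val reader = InputStreamReader(input, Charsets.UTF_8)
    val buffer = CharArray(8192)
    var count = 0L
    while (true) {
        val read = reader.read(buffer)
        if (read < 0) break
        for (i in 0 until read) {
            // A surrogate pair is one code point, so skip its second half.
            if (!buffer[i].isLowSurrogate()) count++
        }
    }
    return count
}

fun main() {
    // Assumption: in real use the stream would come from the YAML file itself.
    val sample = "key: {name: item}\n".repeat(1000)
    val n = sample.byteInputStream().use { countCodePoints(it) }
    println("$n code points, over default limit: ${n > DEFAULT_CODE_POINT_LIMIT}")
}
```

If the count comes out above the default, the limit has to be raised as shown in the answer, however small the individual sub-documents are.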