
What value to set for max_len in pad sequences?

Does the value of max_len in pad sequences for deep learning depend on the use case? For example, for a Twitter-related classification task, should it be set to 280 (the maximum number of characters in a tweet)?

Absolutely not. After you convert the texts into sequences with a tokenizer that has been fitted on the list of tweets, you can iterate over those sequences to find their lengths.
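As a minimal sketch of that step, the snippet below turns a few tweets into integer sequences and measures them. The sample tweets and the to_sequences helper are hypothetical; the helper is only a whitespace-based stand-in for a fitted tokenizer, not the tokenizer itself.

```python
# Hypothetical sample tweets; any list of strings works here.
tweets = [
    "I love this phone",
    "Worst service ever, never again!",
    "ok",
]


def to_sequences(texts):
    """Map each word to an integer id (a simple stand-in for a fitted tokenizer)."""
    vocab = {}
    sequences = []
    for text in texts:
        seq = []
        for word in text.lower().split():
            # Ids start at 1 so that 0 stays free for padding.
            seq.append(vocab.setdefault(word, len(vocab) + 1))
        sequences.append(seq)
    return sequences, vocab


sequences, vocab = to_sequences(tweets)

# Length of each sequence in tokens, not in characters.
lengths = [len(seq) for seq in sequences]
longest = max(lengths)

print(lengths, longest)
```

The second tweet is 32 characters long but only 5 tokens long, and it is the token count that matters when choosing max_len.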

The max_len parameter of the pad_sequences function (spelled maxlen in Keras) refers to the maximum length of a sequence, so it is not the length of a tweet in characters but the length of its token sequence.

Beyond that, you do not have to set it to the length of the longest tweet sequence; you can set it lower, in which case longer sequences are truncated. If you do, it is better to remove stopwords and filter out unwanted characters before fitting the tokenizer on the list of tweets.
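One way to pick a value below the maximum is to choose a length that covers most, but not all, of the sequences, then pad and truncate to it. The sketch below is an assumption about how this could be done, not the answerer's method: pick_max_len, pad, and the coverage parameter are hypothetical names, and pad is a plain-Python stand-in that pads and truncates at the front of each sequence, which is my understanding of the Keras pad_sequences defaults.

```python
import math


def pick_max_len(lengths, coverage=0.95):
    """Return the smallest length that covers at least `coverage` of the sequences."""
    ordered = sorted(lengths)
    index = math.ceil(coverage * len(ordered)) - 1
    return ordered[max(index, 0)]


def pad(sequences, max_len, value=0):
    """Left-pad with `value`; for long sequences keep only the last `max_len` tokens."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    padded = []
    for seq in sequences:
        kept = seq[-max_len:]
        padded.append([value] * (max_len - len(kept)) + kept)
    return padded


# Hypothetical integer sequences, as a tokenizer might produce.
example_sequences = [[1, 2, 3], [4, 5], [6], [7, 8, 9, 10, 11, 12, 13, 14]]
example_lengths = [len(seq) for seq in example_sequences]

max_len = pick_max_len(example_lengths, coverage=0.75)
padded = pad(example_sequences, max_len)

print(max_len)
print(padded)
```

Here the single 8-token outlier is cut down to its last 3 tokens instead of forcing every other sequence to be padded to length 8.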
