PHP: Splitting a large string by certain characters in as large chunks as possible

I am implementing the Google Translation API, which accepts at most 5000 characters per request, so I need to split larger documents into smaller pieces and send multiple API requests.

I therefore need to split my content into chunks that are as long as possible (but under 5000 characters) and, ideally, not cut in the middle of a sentence, since that would make the text harder for Google to translate.

I would therefore like to give my method an array of delimiters it should look for when splitting:

  • </div>
  • </p>
  • </section>
  • </blockquote>
  • </br>
  • . (dot followed by a space)

What would be a good approach to this?

Regex quantifiers are greedy by default.

.{0,4980}(\<\/div\>|\<\/p\>|\<\/section\>|\<\/blockquote\>|\<\/br\>|\.\s)

This should match the longest string (up to 4980 characters plus the delimiter) that ends with one of your delimiters.
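A minimal sketch of how that pattern could be applied in a loop, in PHP. The function name `splitAtDelimiters`, the `$limit` parameter, and the hard-cut fallback are assumptions for illustration and not part of the answer above. Instead of the fixed 4980, it reserves 13 characters for the longest delimiter (`</blockquote>`). It also adds the `s` flag, because `.` does not match newlines by default, and the `u` flag so that characters rather than bytes are counted (assuming UTF-8 input).

```php
<?php

/**
 * Split $text into chunks of at most $limit characters, preferring to cut
 * right after one of the delimiters listed in the question.
 * Hypothetical helper for illustration only.
 */
function splitAtDelimiters(string $text, int $limit = 5000): array
{
    // Delimiters from the question; the last one is a dot followed by whitespace.
    $delims = '<\/div>|<\/p>|<\/section>|<\/blockquote>|<\/br>|\.\s';

    // Reserve room for the longest delimiter, "</blockquote>" (13 characters),
    // so that prefix + delimiter never exceeds $limit.
    $prefix = $limit - 13;
    if ($prefix < 0) {
        throw new InvalidArgumentException('limit must be at least 13');
    }

    // s: "." also matches newlines; u: treat the subject as UTF-8 characters.
    $whole     = '/^.{1,' . $limit . '}\z/su';                    // remainder fits as-is
    $preferred = '/^.{0,' . $prefix . '}(?:' . $delims . ')/su';  // greedy cut after a delimiter
    $hardCut   = '/^.{1,' . $limit . '}/su';                      // fallback: no delimiter in range

    $chunks = [];
    while ($text !== '') {
        if (preg_match($whole, $text, $m) !== 1
            && preg_match($preferred, $text, $m) !== 1
            && preg_match($hardCut, $text, $m) !== 1) {
            // With the u flag, preg_match() fails on malformed UTF-8.
            throw new RuntimeException('input could not be matched (invalid UTF-8?)');
        }
        $chunks[] = $m[0];
        // Every pattern is anchored at the start, so drop the matched bytes.
        $text = substr($text, strlen($m[0]));
    }
    return $chunks;
}
```

Calling `splitAtDelimiters($html)` with the default limit returns an array of strings that can each be sent in one request; concatenating them gives back the original text. Because of the reserved 13 characters, a chunk can occasionally end one delimiter earlier than strictly necessary, which is the same trade-off as the 4980 in the pattern above.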
