
Java regular expression to catch encoded email attachment strings

I would like to write a regular expression that catches the encoded parts (for example base64-encoded attachment bodies) from an email MIME message string (eml). For example, this is part of an email:

<div dir=3D"ltr"><br clear=3D"all"><div><div dir=3D"ltr"><div style=3D"dire=
ction:rtl">-------------</div><div style=3D"direction:rtl">=D7=91=D7=91=D7=
=A8=D7=9B=D7=94,</div><div style=3D"direction:rtl">=D7=90=D7=91=D7=99=D7=A2=
=D7=93 =D7=9B=D7=94=D7=9F</div></div></div>
</div>

--20cf3003bc2e044e980500f755dc--
--20cf3003bc2e044e9d0500f755de
Content-Type: text/plain; charset=US-ASCII; name="EhudBanay.txt"
Content-Disposition: attachment; filename="EhudBanay.txt"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_hz0z4us30

aHR0cHM6Ly93d3cuZmFjZWJvb2suY29tL2wucGhwP3U9aHR0cHMlM0ElMkYlMkZ3d3cucmFwaWRz
aGFyZS5jb20lMkZmaWxlcyUyRjM4NzAxNTA2MDclMkZFaHVkX0JhbmFpXy1fVGlwX1RpcGFfXzE5
OThfLnJhciZoPTdBUUZRb0RMQQ0KDQpodHRwczovL3d3dy5mYWNlYm9vay5jb20vbC5waHA/dT1o
dHRwcyUzQSUyRiUyRnd3dy5yYXBpZHNoYXJlLmNvbSUyRmZpbGVzJTJGMzk5MzMyNjg1MSUyRkVo
dWRfQmFuYWlfLV9UYWhhdF9TaWFoX0hhWWFzbWluXzE5ODkucmFyJmg9QkFRRWhJY3djDQoNCmh0
dHBzOi8vd3d3LmZhY2Vib29rLmNvbS9sLnBocD91PWh0dHBzJTNBJTJGJTJGd3d3LnJhcGlkc2hh
cmUuY29tJTJGZmlsZXMlMkYzMjQwMTM5MTMyJTJGRWh1ZF9CYW5haV8tX1Jlc2lzZXlfTGFpbGFf
MjAxMS5yYXImaD1RQVFHN0pGWXUNCg0KaHR0cHM6Ly93d3cuZmFjZWJvb2suY29tL2wucGhwP3U9
aHR0cHMlM0ElMkYlMkZ3d3cucmFwaWRzaGFyZS5jb20lMkZmaWxlcyUyRjE5NTE2ODA4MTglMkZF
aHVkX0JhbmFpXy1fT2RfTWVhdF9fMTk5Nl8ucmFyJmg9YUFRRUVuaUIxDQoNCmh0dHBzOi8vd3d3
LmZhY2Vib29rLmNvbS9sLnBocD91PWh0dHBzJTNBJTJGJTJGd3d3LnJhcGlkc2hhcmUuY29tJTJG
ZmlsZXMlMkYyMjc2NTc5MTgzJTJGRWh1ZF9CYW5haV8tX0thcm92X18xOTg5Xy5yYXImaD1mQVFH
a2dYVXENCg0KaHR0cHM6Ly93d3cuZmFjZWJvb2suY29tL2wucGhwP3U9aHR0cHMlM0ElMkYlMkZ3
d3cucmFwaWRzaGFyZS5jb20lMkZmaWxlcyUyRjQwOTg0NjQzNjYlMkZFaHVkX0JhbmFpXy1fSGFT
aGxpc2hpX18xOTkyXy5yYXImaD1GQVFGNjRmY3gNCg0KaHR0cHM6Ly93d3cuZmFjZWJvb2suY29t
L2wucGhwP3U9aHR0cHMlM0ElMkYlMkZ3d3cucmFwaWRzaGFyZS5jb20lMkZmaWxlcyUyRjMxNDY1
NDc2OTElMkZFaHVkX0JhbmFpXy1fRWh1ZF9CYW5haV9WZUhhUGxpdGltX18xOTg3X19GLnBhcnQy
LnJhciZoPUJBUUVoSWN3Yw0KDQpodHRwczovL3d3dy5mYWNlYm9vay5jb20vbC5waHA/dT1odHRw
cyUzQSUyRiUyRnd3dy5yYXBpZHNoYXJlLmNvbSUyRmZpbGVzJTJGMjYwNDg2Njc1MiUyRkVodWRf
QmFuYWlfLV9FaHVkX0JhbmFpX1ZlSGFQbGl0aW1fXzE5ODdfX0YucGFydDEucmFyJmg9REFRSHpG
LXZBDQoNCmh0dHBzOi8vd3d3LmZhY2Vib29rLmNvbS9sLnBocD91PWh0dHBzJTNBJTJGJTJGd3d3
LnJhcGlkc2hhcmUuY29tJTJGZmlsZXMlMkYyNjQxMzIwNzg2JTJGRWh1ZF9CYW5haV8tX0Ryb3Bz
X09mX1RoZV9OaWdodF9fMjAxMV8ucmFyJmg9Y0FRRlRZQ1pTDQoNCmh0dHBzOi8vd3d3LmZhY2Vi
b29rLmNvbS9sLnBocD91PWh0dHBzJTNBJTJGJTJGd3d3LnJhcGlkc2hhcmUuY29tJTJGZmlsZXMl
MkYzMTQ3NzUzNzAwJTJGRWh1ZCUyNTIwQmFuYWklMjUyMC0lMjUyMEtlZXAlMjUyMERyaXZpbmcu
cGFydDEucmFyJmg9S0FRRWtPUkZTDQoNCmh0dHBzOi8vd3d3LmZhY2Vib29rLmNvbS9sLnBocD91
PWh0dHBzJTNBJTJGJTJGd3d3LnJhcGlkc2hhcmUuY29tJTJGZmlsZXMlMkYxNzc1NDI5NDY3JTJG
RWh1ZF9CYW5haV8tX0FuZV9MaV9fMjAwNF8ucmFyJmg9dkFRRWlEWXFu
--20cf3003bc2e044e9d0500f755de--

I would like to catch only this part:

aHR0cHM6Ly93d3cuZmFjZWJvb2suY29tL2wucGhwP3U9aHR0cHMlM0ElMkYlMkZ3d3cucmFwaWRz
aGFyZS5jb20lMkZmaWxlcyUyRjM4NzAxNTA2MDclMkZFaHVkX0JhbmFpXy1fVGlwX1RpcGFfXzE5
OThfLnJhciZoPTdBUUZRb0RMQQ0KDQpodHRwczovL3d3dy5mYWNlYm9vay5jb20vbC5waHA/dT1o
dHRwcyUzQSUyRiUyRnd3dy5yYXBpZHNoYXJlLmNvbSUyRmZpbGVzJTJGMzk5MzMyNjg1MSUyRkVo
dWRfQmFuYWlfLV9UYWhhdF9TaWFoX0hhWWFzbWluXzE5ODkucmFyJmg9QkFRRWhJY3djDQoNCmh0
dHBzOi8vd3d3LmZhY2Vib29rLmNvbS9sLnBocD91PWh0dHBzJTNBJTJGJTJGd3d3LnJhcGlkc2hh
cmUuY29tJTJGZmlsZXMlMkYzMjQwMTM5MTMyJTJGRWh1ZF9CYW5haV8tX1Jlc2lzZXlfTGFpbGFf
MjAxMS5yYXImaD1RQVFHN0pGWXUNCg0KaHR0cHM6Ly93d3cuZmFjZWJvb2suY29tL2wucGhwP3U9
aHR0cHMlM0ElMkYlMkZ3d3cucmFwaWRzaGFyZS5jb20lMkZmaWxlcyUyRjE5NTE2ODA4MTglMkZF
aHVkX0JhbmFpXy1fT2RfTWVhdF9fMTk5Nl8ucmFyJmg9YUFRRUVuaUIxDQoNCmh0dHBzOi8vd3d3
LmZhY2Vib29rLmNvbS9sLnBocD91PWh0dHBzJTNBJTJGJTJGd3d3LnJhcGlkc2hhcmUuY29tJTJG
ZmlsZXMlMkYyMjc2NTc5MTgzJTJGRWh1ZF9CYW5haV8tX0thcm92X18xOTg5Xy5yYXImaD1mQVFH
a2dYVXENCg0KaHR0cHM6Ly93d3cuZmFjZWJvb2suY29tL2wucGhwP3U9aHR0cHMlM0ElMkYlMkZ3
d3cucmFwaWRzaGFyZS5jb20lMkZmaWxlcyUyRjQwOTg0NjQzNjYlMkZFaHVkX0JhbmFpXy1fSGFT
aGxpc2hpX18xOTkyXy5yYXImaD1GQVFGNjRmY3gNCg0KaHR0cHM6Ly93d3cuZmFjZWJvb2suY29t
L2wucGhwP3U9aHR0cHMlM0ElMkYlMkZ3d3cucmFwaWRzaGFyZS5jb20lMkZmaWxlcyUyRjMxNDY1
NDc2OTElMkZFaHVkX0JhbmFpXy1fRWh1ZF9CYW5haV9WZUhhUGxpdGltX18xOTg3X19GLnBhcnQy
LnJhciZoPUJBUUVoSWN3Yw0KDQpodHRwczovL3d3dy5mYWNlYm9vay5jb20vbC5waHA/dT1odHRw
cyUzQSUyRiUyRnd3dy5yYXBpZHNoYXJlLmNvbSUyRmZpbGVzJTJGMjYwNDg2Njc1MiUyRkVodWRf
QmFuYWlfLV9FaHVkX0JhbmFpX1ZlSGFQbGl0aW1fXzE5ODdfX0YucGFydDEucmFyJmg9REFRSHpG
LXZBDQoNCmh0dHBzOi8vd3d3LmZhY2Vib29rLmNvbS9sLnBocD91PWh0dHBzJTNBJTJGJTJGd3d3
LnJhcGlkc2hhcmUuY29tJTJGZmlsZXMlMkYyNjQxMzIwNzg2JTJGRWh1ZF9CYW5haV8tX0Ryb3Bz
X09mX1RoZV9OaWdodF9fMjAxMV8ucmFyJmg9Y0FRRlRZQ1pTDQoNCmh0dHBzOi8vd3d3LmZhY2Vi
b29rLmNvbS9sLnBocD91PWh0dHBzJTNBJTJGJTJGd3d3LnJhcGlkc2hhcmUuY29tJTJGZmlsZXMl
MkYzMTQ3NzUzNzAwJTJGRWh1ZCUyNTIwQmFuYWklMjUyMC0lMjUyMEtlZXAlMjUyMERyaXZpbmcu
cGFydDEucmFyJmg9S0FRRWtPUkZTDQoNCmh0dHBzOi8vd3d3LmZhY2Vib29rLmNvbS9sLnBocD91
PWh0dHBzJTNBJTJGJTJGd3d3LnJhcGlkc2hhcmUuY29tJTJGZmlsZXMlMkYxNzc1NDI5NDY3JTJG
RWh1ZF9CYW5haV8tX0FuZV9MaV9fMjAwNF8ucmFyJmg9dkFRRWlEWXFu

I wrote the following regular expression (shown as a Java string literal): (?<=\\r\\n\\r\\n)(.+\\r\\n)+(?=--), which gives the results I want. I assume that:
* the regular expression should return all occurrences of such parts (there may be multiple);
* there is always an empty line before the wanted part;
* the wanted part is always followed by a line starting with "--" (there might be a separator line between them).
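One way to keep the shape of the regex above while avoiding deep recursion is a possessive quantifier. In the OpenJDK implementation of java.util.regex, a greedy repetition of a group such as (.+\r\n)+ is matched recursively, roughly one extra level of stack per repeated line, which is a commonly reported cause of StackOverflowError on long inputs; a possessive ++ never backtracks into the group and is matched in a loop. Because a possessive loop cannot give a line back, this sketch also excludes lines starting with "-" so the closing boundary line is not swallowed. That assumes the encoded lines never start with "-", which holds for base64. The class and method names (EncodedPartRegex, findEncodedParts) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EncodedPartRegex {
    // Hypothetical variant of the question's pattern:
    //   (?<=\r\n\r\n)              the part starts right after an empty line
    //   (?:[^-\r\n][^\r\n]*\r\n)++ one or more non-empty lines not starting with '-',
    //                              repeated possessively (no backtracking into the group)
    //   (?=--)                     the next line is a MIME boundary starting with "--"
    static final Pattern ENCODED_PART =
            Pattern.compile("(?<=\\r\\n\\r\\n)(?:[^-\\r\\n][^\\r\\n]*\\r\\n)++(?=--)");

    public static List<String> findEncodedParts(String eml) {
        List<String> parts = new ArrayList<>();
        Matcher m = ENCODED_PART.matcher(eml);
        while (m.find()) {
            parts.add(m.group()); // each match keeps its trailing CRLF
        }
        return parts;
    }

    public static void main(String[] args) {
        String eml = "Content-Transfer-Encoding: base64\r\n"
                + "X-Attachment-Id: f_hz0z4us30\r\n"
                + "\r\n"
                + "aHR0cHM6Ly93d3cuZmFjZWJvb2suY29tL2wucGhwP3U9aHR0cHMlM0ElMkYlMkZ3d3cucmFwaWRz\r\n"
                + "RWh1ZF9CYW5haV8tX0FuZV9MaV9fMjAwNF8ucmFyJmg9dkFRRWlEWXFu\r\n"
                + "--20cf3003bc2e044e9d0500f755de--\r\n";
        System.out.print(findEncodedParts(eml).get(0));
    }
}
```

Unlike the original, this version does not match a block that is separated from the following "--" line by an empty line, so it follows the regex as written rather than the looser "separator line" assumption.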

The problem with this solution is that I get Exception in thread "main" java.lang.StackOverflowError when the input string is long, which makes me think about the efficiency of the regular expression. I would like to make the regular expression more precise. We can use the fact that somewhere before the empty line that precedes the wanted part, there is always a line starting with "Content-Transfer-Encoding:". Can someone please help?
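A sketch that builds on the Content-Transfer-Encoding observation, under these assumptions: CRLF line endings, and only base64 bodies are wanted (a quoted-printable body such as the HTML part in the sample would need a different character set, and is skipped here). Every repetition is either a single character class, which java.util.regex matches in a loop, or a possessive group over the few remaining header lines, so the stack depth should not grow with the size of the attachment. Base64PartExtractor and extract are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Base64PartExtractor {
    // Hypothetical pattern; assumes CRLF line endings and base64-encoded parts.
    private static final Pattern BASE64_PART = Pattern.compile(
            "(?im)^Content-Transfer-Encoding:[ \\t]*base64[ \\t]*\\r\\n" // the encoding header
            + "(?:[^\\r\\n]+\\r\\n)*+"       // remaining header lines, matched possessively
            + "\\r\\n"                        // the empty line that ends the part headers
            + "([A-Za-z0-9+/=\\r\\n]+)");     // the body: base64 characters and line breaks only

    public static List<String> extract(String eml) {
        List<String> parts = new ArrayList<>();
        Matcher m = BASE64_PART.matcher(eml);
        while (m.find()) {
            // The character class stops at the '-' of the next boundary line;
            // trim() drops the trailing CRLF (and any empty separator lines).
            parts.add(m.group(1).trim());
        }
        return parts;
    }

    public static void main(String[] args) {
        String eml = "Content-Type: text/plain; charset=US-ASCII; name=\"EhudBanay.txt\"\r\n"
                + "Content-Disposition: attachment; filename=\"EhudBanay.txt\"\r\n"
                + "Content-Transfer-Encoding: base64\r\n"
                + "X-Attachment-Id: f_hz0z4us30\r\n"
                + "\r\n"
                + "aHR0cHM6Ly93d3cuZmFjZWJvb2suY29tL2wucGhwP3U9aHR0cHMlM0ElMkYlMkZ3d3cucmFwaWRz\r\n"
                + "RWh1ZF9CYW5haV8tX0FuZV9MaV9fMjAwNF8ucmFyJmg9dkFRRWlEWXFu\r\n"
                + "--20cf3003bc2e044e9d0500f755de--\r\n";
        extract(eml).forEach(System.out::println);
    }
}
```

Because the body capture stops at the first character outside the base64 alphabet, this also tolerates an empty separator line before the boundary, which the original (?=--) lookahead did not.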

You can try increasing the stack size: -Xss2048k
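The -Xss option sets the default thread stack size for the whole JVM (for example java -Xss2048k YourMainClass, where YourMainClass is a placeholder). If only the regex needs the extra room, a narrower alternative is to run the match on a thread created with the Thread(ThreadGroup, Runnable, String, long stackSize) constructor. This sketch keeps the question's original pattern unchanged; StackSizedMatch and firstMatch are hypothetical names. The Thread javadoc notes that the requested stack size is only a hint that some platforms ignore, and either way this raises the limit rather than removing it: a long enough input can still overflow.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StackSizedMatch {
    // The question's original pattern, unchanged.
    static final Pattern ORIGINAL = Pattern.compile("(?<=\\r\\n\\r\\n)(.+\\r\\n)+(?=--)");

    // Runs ORIGINAL.find() on a worker thread that requests stackBytes of stack.
    // Returns the first match, or null if there is none.
    public static String firstMatch(String eml, long stackBytes) {
        String[] result = new String[1];
        Throwable[] failure = new Throwable[1];
        Thread worker = new Thread(null, () -> {
            try {
                Matcher m = ORIGINAL.matcher(eml);
                result[0] = m.find() ? m.group() : null;
            } catch (StackOverflowError e) { // the stack was still too small for this input
                failure[0] = e;
            }
        }, "regex-worker", stackBytes);
        worker.start();
        try {
            worker.join(); // join() makes the worker's writes visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the regex", e);
        }
        if (failure[0] != null) {
            throw new IllegalStateException("regex overflowed the worker stack", failure[0]);
        }
        return result[0];
    }

    public static void main(String[] args) {
        String eml = "Content-Transfer-Encoding: base64\r\n\r\nQUJD\r\nREVG\r\n--b--\r\n";
        System.out.print(firstMatch(eml, 16L * 1024 * 1024)); // request a 16 MB stack
    }
}
```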

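Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.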

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM