sed, awk, regex to modify chemical terms

My platform: Windows 7, 64-bit; 8 GB memory; GnuWin32; sed 4.1.5.4013; awk 3.1.6.2962.

My problem: long chemical terms.

Example 1: 4-((((2-chloroethyl)nitrosoamino)carbonyl)methylamino)cyclohexanecarboxylic acid.

Example 2: 3'-O-(3-(N-(4-azido-2-nitrophenyl)amino)propionyl)adenosine-5'-triphosphate.

Example 3: 2-((2-chloroethyl)methylamino)ethyl-4-ethoxybenzoate.

I want to insert `<wbr>` to give the browser an opportunity to break a long chemical term.

I want the break to come after a right paren.

However, I only want to insert `<wbr>` if the chemical term has three or more right parens.

Further, if the chemical term has three or more right parens, I only want to insert `<wbr>` after the last two right parens. Reason: I do not want a term to wrap onto more than three lines.

Example 1 would look like this: `4-((((2-chloroethyl)nitrosoamino)carbonyl)<wbr>methylamino)<wbr>cyclohexanecarboxylic acid.`

Example 2 would look like this: `3'-O-(3-(N-(4-azido-2-nitrophenyl)amino)<wbr>propionyl)<wbr>adenosine-5'-triphosphate`

Example 3 would not be modified because it does not have three or more right parens.
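Since the question also asks about awk, here is a minimal sketch (not from the thread) that applies the rules above literally. It counts the right parens on each line and, when there are three or more, inserts `<wbr>` after the last two. It assumes one chemical term per line on standard input, and the function name `add_wbr` is hypothetical.

```shell
# add_wbr (hypothetical name): read one chemical term per line on stdin and
# insert <wbr> after the last two ")" of any term that has three or more ")".
add_wbr() {
  awk '{
    n = gsub(/[)]/, ")")      # returns the number of ")" and leaves the text unchanged
    if (n >= 3) {
      out = ""; rest = $0; seen = 0
      # walk through the line one ")" at a time
      while ((i = index(rest, ")")) > 0) {
        seen++
        out = out substr(rest, 1, i)   # copy text up to and including this ")"
        rest = substr(rest, i + 1)
        # the last two ")" are number n-1 and number n
        if (seen >= n - 1) out = out "<wbr>"
      }
      $0 = out rest
    }
    print
  }'
}
```

Usage: `add_wbr < data.txt`. The function is written for bash; under Windows CMD you could instead save the awk program (the part between the single quotes) to a file and run it with `awk -f`.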

How can I use sed, awk, or a regex to implement the above?

Thanks in advance for any advice.

Answer: Thanks for the clear explanation. This seems to work, although I don't have exactly your version of sed.

sed 's/)\([^)]*)\)\([^)]*)\)\([^)]*\)$/)\1<wbr>\2<wbr>\3/' data.txt

You did not say which shell you're using. This is for bash and similar shells. For Windows CMD, try double quotes instead of single quotes.
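The sed one-liner works because `[^)]*` can never cross a right paren, so the three literal `)` in the pattern (together with the `$` anchor) are forced onto the last three right parens of the line. The sketch below wraps the sed expression from the answer, with plain `<wbr>` in the replacement, in a bash function with the pattern annotated; the function name `wbr_sed` is hypothetical. It uses only POSIX basic-regex groups `\( \)`.

```shell
# wbr_sed (hypothetical name): the sed expression from the answer, reading
# one chemical term per line on stdin.
wbr_sed() {
  # Pattern, left to right:
  #   )             the third-from-last ")" (matched literally)
  #   \([^)]*)\)    group 1: text up to and including the second-from-last ")"
  #   \([^)]*)\)    group 2: text up to and including the last ")"
  #   \([^)]*\)$    group 3: whatever follows the last ")" to end of line
  # The pattern contains three literal ")", so a line with fewer than
  # three right parens never matches and passes through unchanged.
  sed 's/)\([^)]*)\)\([^)]*)\)\([^)]*\)$/)\1<wbr>\2<wbr>\3/'
}
```

Usage: `wbr_sed < data.txt`. Like the original one-liner, it assumes each term sits on its own line with no other `)` after it.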
