How do I remove the last character of an RTL string in Python?

I am trying to remove the last character of a string in a right-to-left (RTL) language. When I do, however, the new last character appears to wrap to the beginning of the string, e.g. ותֵיהֶם]׃ becomes ותֵיהֶם]

I know that this is a fundamental issue with how I'm handling the RTL paradigm, but if someone could help me think through it, I'd very much appreciate it.

CODE

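import io

with io.open(r"file.txt", "r", encoding="utf-8") as f:
    for line in f:
        # io.open decodes each line as UTF-8; take the second tab-separated field.
        the_text = line.split(u'\t')[1]
        # replace() returns a new string, so the result must be assigned.
        # U+05C3 is HEBREW PUNCTUATION SOF PASUQ.
        the_text = the_text.replace(u'\u05C3', u'')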

Some characters in Unicode are always LTR, some are always RTL, and some can be either depending on their surrounding context. In addition, the display context for bidirectional text has a "predominant" directionality. For example, a text editor configured for mainly English text would be predominantly LTR and have a ragged right margin, while one configured for mainly Hebrew text would be predominantly RTL with a ragged left margin.
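As a rough illustration of these three categories, Python's standard unicodedata module reports the bidirectional class of each character. This is only a sketch, assuming Python 3. The helper name bidi_classes and the sample string are made up for the example and do not come from the question.

```python
import unicodedata

def bidi_classes(text):
    """Return (code point, bidi class) pairs for each character in text."""
    return [("U+%04X" % ord(ch), unicodedata.bidirectional(ch)) for ch in text]

# Sample: Latin "a", Hebrew final mem, closing bracket, sof pasuq.
sample = "a\u05DD]\u05C3"
for code_point, bidi in bidi_classes(sample):
    print(code_point, bidi)
```

Here "L" and "R" mark strongly LTR and RTL characters, while "ON" (other neutral) marks a character such as ] whose direction is resolved from its context.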

It looks like what has happened here is that when a closing square bracket appears between two RTL characters, it is rendered in its RTL form (your first example). When it appears between an RTL and an LTR character, or at the end of the string (basically, anywhere it does not have characters of the same directionality on both sides), it is treated as part of whichever run of text matches the predominant direction. If you drag your mouse over the string to select the characters, you will see that logically the closing ] still follows the ֶם even though visually it appears to have moved.
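One way to confirm this without relying on the display is to inspect the logical order of the string in Python. This is a minimal sketch, assuming Python 3. The sample string is written with escapes and is a shortened stand-in for the word in the question, not the exact text.

```python
# Logical (storage) order: Hebrew letters, then "]", then sof pasuq (U+05C3).
text = "\u05D5\u05EA\u05DD]\u05C3"

# Drop the sof pasuq; str.replace returns a new string.
trimmed = text.replace("\u05C3", "")

# The bracket is still the last character in logical order,
# wherever a bidi-aware display chooses to draw it.
print([hex(ord(ch)) for ch in trimmed])
print(trimmed[-1] == "]")
```

Printing code points rather than the string itself sidesteps the terminal's bidirectional rendering, so the output reflects what is actually stored.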

If the second-to-last character in your string were also a Hebrew character (or another strongly RTL character) rather than a ], or if the display context were predominantly RTL, then it would appear where you expect it to.
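If the display context cannot be changed, one commonly used workaround (not something proposed in the answer above) is to append U+200F RIGHT-TO-LEFT MARK, an invisible strongly RTL character, so that the bracket again has RTL characters on both sides. The sketch below assumes Python 3. The function name with_rlm is hypothetical, and whether the result renders as intended depends on the terminal or editor implementing the Unicode bidirectional algorithm.

```python
import unicodedata

RLM = "\u200F"  # RIGHT-TO-LEFT MARK: invisible, strongly RTL

def with_rlm(text):
    """Append an RLM so a trailing neutral character sits between RTL characters."""
    return text + RLM

# Shortened stand-in for the trimmed Hebrew word ending in "]".
trimmed = "\u05D5\u05EA\u05DD]"
marked = with_rlm(trimmed)

# Show the bidi class of the appended mark.
print(unicodedata.bidirectional(marked[-1]))
```

Note that this changes the stored string, so it is probably best applied only to text that is about to be displayed, not to the data being processed.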
