
How to remove a string of a certain length?

I have a pandas.core.series.Series of text. I only want to keep the sentences whose length is less than 512.

0        Lebanon, officially known as the Republic of Lebanon, is a country in Western Asia. It is bordered by Syria to the north and east and Israel to the south, while Cyprus lies west across the Mediterranean Sea. Lebanon's location at the crossroads of the Mediterranean Basin and the Arabian hinterland has contributed to its rich history and shaped a cultural identity of religious and ethnic diversity At just 10,452 km2 (4,036 mi2), it is the smallest recognized sovereign state on the mainland Asian continent The earliest evidence of civilization in Lebanon dates back more than seven thousand years, predating recorded history.
1        Lebanon was home to the Phoenicians, a maritime culture that flourished for almost three thousand years (c. 3200–539 BC). In 64 BC, the region came under the rule of the Roman Empire, and eventually became one of its leading centers of Christianity. The Mount Lebanon range saw the emergence of a monastic tradition known as the Maronite Church. As the Arab Muslims conquered the region, the Maronites held onto their religion and identity. However, a new religious group, the Druze, established themselves in Mount Lebanon as well, generating a religious divide that has lasted for centuries. During the Crusades, the Maronites re-established contact with the Roman Catholic Church and asserted their communion with Rome. These ties have influenced the region into the modern era.

Then I want to remove any sentence where len(sentence) > 512. So the output would be:

0        Lebanon, officially known as the Republic of Lebanon, is a country in Western Asia. It is bordered by Syria to the north and east and Israel to the south, while Cyprus lies west across the Mediterranean Sea.
1        Lebanon was home to the Phoenicians, a maritime culture that flourished for almost three thousand years (c. 3200–539 BC). In 64 BC, the region came under the rule of the Roman Empire, and eventually became one of its leading centers of Christianity. The Mount Lebanon range saw the emergence of a monastic tradition known as the Maronite Church. As the Arab Muslims conquered the region, the Maronites held onto their religion and identity. However, a new religious group, the Druze, established themselves in Mount Lebanon as well, generating a religious divide that has lasted for centuries. During the Crusades, the Maronites re-established contact with the Roman Catholic Church and asserted their communion with Rome. These ties have influenced the region into the modern era.

Can I use this code? Thanks for your help.

remove = [x for x in text if len(x) < 512]

I have already tried the solution from "Python: Pandas filter string data based on its string length", but my case is different.

Use a list comprehension with split and join, filtering the pieces by length:

L = ['.'.join(y for y in x.split('.') if len(y) < 512) for x in s]
s = pd.Series(L, index=s.index)    

Or use a custom function with Series.apply:

s = s.apply(lambda x: '.'.join(y for y in x.split('.') if len(y) < 512))  
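To check the behaviour of the two approaches above, here is a minimal self-contained sketch. The sample data, the helper name drop_long_sentences, and the small threshold of 20 are assumptions made only for illustration; the question itself uses 512.

```python
import pandas as pd

MAX_LEN = 512  # threshold from the question


def drop_long_sentences(text, max_len=MAX_LEN):
    """Keep only the '.'-delimited pieces of text shorter than max_len."""
    # Hypothetical helper: same logic as the lambda in the answer above.
    return '.'.join(piece for piece in text.split('.') if len(piece) < max_len)


# Made-up sample data; the middle "sentence" of row 0 is 31 characters long.
s = pd.Series([
    "Short one. " + "x" * 30 + ". Another short one.",
    "All short. Here too.",
])

# A small threshold is used here so that the filtering is visible.
result = s.apply(drop_long_sentences, max_len=20)
print(result.tolist())
```

Note that splitting on '.' is only a rough notion of a sentence: abbreviations such as "c. 3200" and decimal numbers are split as well, so a proper sentence tokenizer may be a better fit for real text.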
