How to strip Ruby string of all characters that are not word characters?

How do I strip a string in Ruby of all characters that aren't word characters (a-z, any digit), replacing them with a space?

For instance, I want to turn the string "not-using-social-media" into "not using social media".

Likewise, I want to turn the string "16 Surprising Small Business Statistics (Infographic)" into "16 Surprising Small Business Statistics Infographic".

This does not use a regex. It replaces every character that is not in "a-zA-Z0-9 " with a space, then squeezes runs of spaces into a single space and removes leading and trailing whitespace.

str = "not-using-social-media 16 Surprising Small Business Statistics (Infographic)"
p str.tr("^a-zA-Z0-9 ", " ").squeeze(" ").strip
#=>"not using social media 16 Surprising Small Business Statistics Infographic"
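
To make the three chained calls easier to follow, here is a minimal sketch that runs them one at a time on the second example string from the question. The variable names step1, step2 and step3 are only illustrative.

```ruby
str = "16 Surprising Small Business Statistics (Infographic)"

# Step 1: a leading ^ negates the set, so every character that is NOT
# a letter, a digit or a space is translated to a space.
step1 = str.tr("^a-zA-Z0-9 ", " ")
p step1 #=> "16 Surprising Small Business Statistics  Infographic "

# Step 2: collapse each run of spaces into a single space.
step2 = step1.squeeze(" ")
p step2 #=> "16 Surprising Small Business Statistics Infographic "

# Step 3: drop leading and trailing whitespace.
step3 = step2.strip
p step3 #=> "16 Surprising Small Business Statistics Infographic"
```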

I would do either of these:

phrase = '16 Surprising Small Business Statistics (Infographic)'

p phrase.gsub(/[^a-zA-Z0-9]+/, ' ').strip
#=> "16 Surprising Small Business Statistics Infographic"

p phrase.gsub(/[^[:alnum:]]+/, ' ').strip
#=> "16 Surprising Small Business Statistics Infographic"

A few notes:

  • The + is added so that a run of consecutive non-alphanumeric characters is replaced with a single space.
  • The .strip is added on the assumption that you do not want the leading/trailing spaces this creates.
  • The regex does not use \w, since that would also match underscores.
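
The notes above can be checked with a small sketch. The input string and the variable names (per_char, collapsed, with_w) are made up for illustration and are not from the question.

```ruby
phrase = "snake_case -- and (more)"

# Without +, every non-alphanumeric character becomes its own space,
# so runs of punctuation leave runs of spaces behind.
per_char = phrase.gsub(/[^a-zA-Z0-9]/, ' ')
p per_char #=> "snake case    and  more "

# With +, each run is replaced by one space; strip removes the
# trailing space left where ")" used to be.
collapsed = phrase.gsub(/[^a-zA-Z0-9]+/, ' ').strip
p collapsed #=> "snake case and more"

# \W is the negation of \w, and \w includes the underscore,
# so the underscore survives here.
with_w = phrase.gsub(/\W+/, ' ').strip
p with_w #=> "snake_case and more"
```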

Regex is your friend: http://www.ruby-doc.org/core-1.9.3/Regexp.html

This is the bracket expression you'll want: /[[:alpha:]]/ (or /[[:alnum:]]/ if digits should be kept as well)
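
As a rough sketch of how a POSIX bracket expression would be used for this task, the example below negates it inside gsub. It also shows one practical difference from an explicit a-zA-Z0-9 range: on UTF-8 strings, Ruby's POSIX classes also match non-ASCII letters. The sample text is invented for illustration, and the accented letters are written as \u escapes only to keep the snippet ASCII-safe.

```ruby
# "caf\u00E9-cr\u00E8me 2024" is "cafe-creme 2024" with accented e's.
text = "caf\u00E9-cr\u00E8me 2024"

# An explicit ASCII range treats the accented letters as non-word characters.
ascii_only = text.gsub(/[^a-zA-Z0-9]+/, ' ').strip
p ascii_only == "caf cr me 2024"               #=> true

# [[:alnum:]] keeps letters and digits, including non-ASCII letters.
alnum = text.gsub(/[^[:alnum:]]+/, ' ').strip
p alnum == "caf\u00E9 cr\u00E8me 2024"         #=> true

# [[:alpha:]] keeps letters only, so the digits are stripped as well.
alpha = text.gsub(/[^[:alpha:]]+/, ' ').strip
p alpha == "caf\u00E9 cr\u00E8me"              #=> true
```

Which class fits depends on whether accented letters should count as word characters for your data.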

The easiest solution is simply to use delete with a negated character set, e.g. delete('^A-Za-z'). It deletes every character except those listed after the ^.

a='hello-world+'

a.delete('^A-Za-z')  #=> 'helloworld'

a='Hello +World'

a.delete('^A-Za-z ') #=> 'Hello World'

a='01234 ABC'
a.delete('^0-9') #=> '01234'
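
One caveat, sketched below on the first example string from the question: delete removes the unwanted characters outright, whereas the question asks for them to be replaced with a space. The variable names are illustrative only.

```ruby
str = "not-using-social-media"

# delete drops the hyphens, so the words run together.
deleted = str.delete("^a-zA-Z0-9 ")
p deleted #=> "notusingsocialmedia"

# tr with the same negated set substitutes a space instead.
replaced = str.tr("^a-zA-Z0-9 ", " ")
p replaced #=> "not using social media"
```

So delete is a good fit when the separators should simply vanish, while tr or gsub matches the output requested in the question.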
