
Is there a way to check if a string is alphanumeric in Erlang?

I am collecting tweets from Twitter using Erlang, and I am trying to save only the hashtags to a database. However, when I convert the bitstrings to list-strings, all the tweets with non-Latin letters turn into strange symbols. Is there any way to check whether a string contains only alphanumeric characters in Erlang?

There are three io_lib functions that check whether a list consists only of printable characters (this includes punctuation, not just letters and digits):

  • io_lib:printable_list/1
  • io_lib:printable_latin1_list/1
  • io_lib:printable_unicode_list/1

Here is an example of one in use:

-spec show_message(WxParent, Message) -> ok
    when WxParent :: wx:wx_object(),
         Message  :: unicode:chardata() | term().

show_message(WxParent, Message) ->
    Format =
        case io_lib:printable_unicode_list(Message) of
            true  -> "~ts";
            false -> "~tp"
        end,
    Modal = wxMessageDialog:new(WxParent, io_lib:format(Format, [Message])),
    _ = wxMessageDialog:showModal(Modal),
    ok = wxMessageDialog:destroy(Modal).

Check out the io_lib docs: http://www.erlang.org/doc/man/io_lib.html#printable_list-1
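To make the difference between these predicates concrete, here is a minimal sketch. The module name printable_demo and the helper classify/1 are made up for illustration; only the io_lib calls come from the standard library. io_lib:printable_list/1 is left out because its result depends on how the emulator was started (the +pc flag).

```erlang
-module(printable_demo).
-export([classify/1]).

%% Hypothetical helper: reports which io_lib predicates accept Term.
%% Both predicates return false for anything that is not a list
%% of printable characters, including binaries.
classify(Term) ->
    #{latin1  => io_lib:printable_latin1_list(Term),
      unicode => io_lib:printable_unicode_list(Term)}.
```

Note that a string such as "#tag?" is accepted by both predicates, so they answer "is this printable text?" rather than "is this alphanumeric?".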

Addendum

Because this subject isn't always easy to research in Erlang, a related but slightly broader Q/A might be of interest:

How to check whether input is a string in Erlang?

The easiest way is to use regular expressions.

StringAlphanum = "1234abcZXYM".
StringNotAlphanum = "1ZXYMÄ#kMp&?".

re:run(StringAlphanum, "^[0-9A-Za-z]+$").
>> {match,[{0,11}]}

re:run(StringNotAlphanum, "^[0-9A-Za-z]+$").
>> nomatch

You can easily make a function out of it...

isAlphaNum(String) -> 
    case re:run(String, "^[0-9A-Za-z]+$") of
        {match, _} -> true;
        nomatch    -> false
    end.
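If hashtags in non-Latin scripts should count as alphanumeric too, the same idea can be sketched with Unicode property classes. This is an assumption about what "alphanumeric" should mean here, not part of the original answer; the module name alnum_re is hypothetical. It relies on the unicode option of re:run/3, which makes the subject be treated as Unicode code points (for lists) or UTF-8 (for binaries).

```erlang
-module(alnum_re).
-export([is_alnum/1]).

%% Hypothetical helper: true if String is non-empty and every
%% character is a Unicode letter (\p{L}) or number (\p{N}).
%% \A and \z anchor the match to the whole subject.
is_alnum(String) ->
    Pattern = "\\A[\\p{L}\\p{N}]+\\z",
    case re:run(String, Pattern, [unicode, {capture, none}]) of
        match   -> true;
        nomatch -> false
    end.
```

With {capture, none} the call returns the bare atoms match or nomatch, which keeps the case expression short.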

But, in my opinion, the better way would be to solve the underlying problem: the correct interpretation of Unicode binary strings.

If you want to represent Unicode characters correctly, do not use binary_to_list. Use the unicode module instead. A Unicode binary cannot be interpreted naively, one byte per character, because an encoding such as UTF-8 may use several bytes for a single character. For example, in UTF-8 the most significant bit of a character's first byte indicates whether it is a multi-byte character.
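A minimal sketch of that approach, applied to the original question: decode the binary with unicode:characters_to_list/2 first, then test the resulting code points. The module name hashtag_check and both function names are hypothetical, and treating an empty or undecodable binary as "not alphanumeric" is a design choice made here, not something the answer prescribes.

```erlang
-module(hashtag_check).
-export([is_ascii_alnum/1]).

%% Hypothetical helper: decode a UTF-8 binary into code points,
%% then require every code point to be an ASCII letter or digit.
is_ascii_alnum(Bin) when is_binary(Bin) ->
    case unicode:characters_to_list(Bin, utf8) of
        Chars when is_list(Chars) ->
            Chars =/= [] andalso lists:all(fun is_ascii_alnum_char/1, Chars);
        _ErrorOrIncomplete ->
            %% {error, _, _} or {incomplete, _, _}: input is not valid UTF-8
            false
    end.

is_ascii_alnum_char(C) when C >= $0, C =< $9 -> true;
is_ascii_alnum_char(C) when C >= $a, C =< $z -> true;
is_ascii_alnum_char(C) when C >= $A, C =< $Z -> true;
is_ascii_alnum_char(_) -> false.
```

Because the check runs on decoded code points rather than raw bytes, a multi-byte character is rejected as one character instead of showing up as several unrelated byte values.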

I took the following example from this site; let's define a UTF-8 string:

Utf8String = <<195, 164, 105, 116, 105>>.

Interpreted naively, one byte per character, it yields:

binary_to_list(Utf8String).
"Ã¤iti"

Interpreted with unicode-support:

unicode:characters_to_list(Utf8String, utf8).
"äiti"

For Latin characters and digits you can use this function:

is_alpha([Char | Rest]) when Char >= $a, Char =< $z ->
    is_alpha(Rest);
is_alpha([Char | Rest]) when Char >= $A, Char =< $Z ->
    is_alpha(Rest);
is_alpha([Char | Rest]) when Char >= $0, Char =< $9 ->
    is_alpha(Rest);
is_alpha([]) ->
    true;
is_alpha(_) ->
    false.

For other scripts, you can add clauses covering their code point ranges.
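As a sketch of how such an extension might look, the variant below adds one extra range, the basic Cyrillic letters U+0410 to U+044F. The choice of Cyrillic is only an example, and the module name alnum_ranges is hypothetical. The input is assumed to be a list of Unicode code points, for instance the result of unicode:characters_to_list/2.

```erlang
-module(alnum_ranges).
-export([is_alnum/1]).

%% Hypothetical helper: true if every code point falls in one of
%% the accepted ranges. Like is_alpha/1 above, it returns true
%% for the empty list.
is_alnum(Chars) when is_list(Chars) ->
    lists:all(fun is_alnum_char/1, Chars).

is_alnum_char(C) when C >= $a, C =< $z -> true;
is_alnum_char(C) when C >= $A, C =< $Z -> true;
is_alnum_char(C) when C >= $0, C =< $9 -> true;
%% Example extra range: basic Cyrillic letters, U+0410..U+044F.
is_alnum_char(C) when C >= 16#0410, C =< 16#044F -> true;
is_alnum_char(_) -> false.
```

Each additional script becomes one more clause before the final catch-all, so the accepted set stays easy to read and audit.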
