
How to perform regex operations on a string tensor in TensorFlow?

How can I perform regex operations on a string tensor? Normally I would just use a Python string, but with TensorFlow Serving my input has to be a string tensor. So I created a string placeholder and am injecting an extra layer into the graph that takes the placeholder and prepares it before it is passed to the model.

I have looked at using py_func but I still cannot perform pattern operations on a bytes-like object.

Is there any way of performing these operations on a tensor? I cannot call eval() on the placeholder because the data is only fed in when the SavedModel is loaded and run.

Code I have been using for testing:

import re
import tensorflow as tf  # TensorFlow 1.x API


def remove_urls(vTEXT):
    vTEXT = re.sub(r'(https|http)?:\/\/(\w|\.|\/|\?|\=|\&|\%)*\b', 'url', vTEXT, flags=re.MULTILINE)
    return(vTEXT)


input_string_ph = tf.constant("This is string https:www.someurl.com")

input_string_lower = tf.py_func(lambda x: x.lower(), [input_string_ph], tf.string, stateful=False)
# This is the step that fails: remove_urls receives a bytes-like object, not a str.
input_string_no_url = tf.py_func(lambda x: remove_urls(x), [input_string_lower], tf.string, stateful=False)
sess = tf.InteractiveSession()
print(input_string_no_url.eval())

It seems that inside py_func a string tensor arrives as a bytes value rather than a str, so you should decode it inside remove_urls:

def remove_urls(vTEXT):
    vTEXT = vTEXT.decode('utf-8')
    vTEXT = re.sub(r'(https|http)?:\/\/(\w|\.|\/|\?|\=|\&|\%)*\b', 'url', vTEXT, flags=re.MULTILINE)
    return(vTEXT)
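
The bytes-versus-str mismatch can be reproduced without TensorFlow. The following is only a minimal sketch: the helper names remove_urls_text and remove_urls_bytes and the sample value are made up for illustration, and the sample stands in for the bytes that py_func would pass in. It shows the failure, the decode fix from above, and an alternative that keeps everything as bytes by using a bytes pattern.

```python
import re

# Same pattern as in the question.
URL_PATTERN = r'(https|http)?:\/\/(\w|\.|\/|\?|\=|\&|\%)*\b'


def remove_urls_text(raw):
    """Decode the incoming bytes first, then substitute on a str."""
    text = raw.decode('utf-8')
    return re.sub(URL_PATTERN, 'url', text, flags=re.MULTILINE)


def remove_urls_bytes(raw):
    """Alternative: stay in bytes by using a bytes pattern and replacement."""
    return re.sub(URL_PATTERN.encode('utf-8'), b'url', raw, flags=re.MULTILINE)


# Hypothetical sample standing in for the value py_func hands to the function.
sample = b"this is string https://www.someurl.com"

try:
    re.sub(URL_PATTERN, 'url', sample)  # str pattern applied to bytes
except TypeError as err:
    print("str pattern on bytes fails:", err)

print(remove_urls_text(sample))   # a str
print(remove_urls_bytes(sample))  # a bytes object
```

Which variant fits better depends on what the next step in the pipeline expects; the decode version is closer to the code in this answer.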

For example, you can remove a substring from a string and check whether anything was actually removed, using the tf.regex_replace() op like this:

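import tensorflow as tf

text = tf.constant("your string")  # renamed from `str` to avoid shadowing the builtin
sub_str = tf.constant("string")

def not_contains(str1, str2):
    # Remove every match of str2 (interpreted as a regex pattern) from str1.
    cut1 = tf.regex_replace(str1, str2, "")
    # An empty delimiter splits each string into single bytes, so the sizes act as lengths.
    split1 = tf.string_split([cut1], "")
    split2 = tf.string_split([str1], "")
    size1 = tf.size(split1)
    size2 = tf.size(split2)
    # Equal sizes mean nothing was removed, i.e. str2 does not occur in str1.
    return tf.equal(size1, size2)

is_not_in = not_contains(text, sub_str)

sess = tf.Session()
sess.run(is_not_in) # False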
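
One thing to watch with this approach is that the second argument is treated as a regex pattern, not as a literal substring. The sketch below mirrors the same replace-then-compare-lengths logic in plain Python with the re module as a stand-in, so it can be checked without a session. The escape parameter is an addition for illustration, and Python's re syntax is not identical to the regex syntax TensorFlow uses, so treat this as an illustration of the pitfall rather than a drop-in replacement.

```python
import re


def not_contains(text, sub, escape=True):
    """Return True if `sub` does not occur in `text`.

    Mirrors the replace-then-compare-lengths idea above. With escape=False,
    `sub` is used as a raw regex pattern, like the graph version does.
    """
    pattern = re.escape(sub) if escape else sub
    cut = re.sub(pattern, "", text)
    # Unchanged length means nothing matched, so `sub` is not present.
    return len(cut) == len(text)


print(not_contains("your string", "string"))   # substring is present
print(not_contains("abc", "."))                # literal dot is not in "abc"
print(not_contains("abc", ".", escape=False))  # "." as a regex matches every character
```

If the substring can contain characters such as ".", "*" or "?", escaping it before it reaches the replace op avoids false matches.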
