
A machine learning model for matching patterns between two sets of strings?

I am trying to use machine learning to learn the HTML transformations performed by a certain service. I have reduced the problem to pattern matching, and for now I am trying to learn how tags are transformed. For example, for the same data, the original HTML has the pattern "html, body, div, h1" and the transformed page has the pattern "html, body, div, div, div". I have 14,000 such data points, and I want to train a model that takes patterns from the original page as input and outputs the transformed patterns. I have looked into a few NLP models, but either I failed to understand them completely or they were not very helpful. Any pointers would be appreciated, preferably a suggestion for a Python-based model.

Your question is not entirely clear, but from what I can gather, your input is a string pattern of HTML tags and your output is also a string pattern of HTML tags.

You can use a bidirectional LSTM or a CRF for this kind of task. Reading up on them should give you a clear idea of how to apply them.
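Both kinds of model are usually set up as sequence labellers, which predict one output label per input position. In the question's example the input has four tags and the output has five, so the pairs need to be aligned first. Below is a minimal sketch of one way to prepare the data, assuming the patterns are comma-separated strings and that padding both sides to a common length is acceptable. The helper names (`tokenize`, `build_vocab`, `encode_pair`) and the `<pad>` token are made up for illustration. If padding turns out to be a poor fit, an encoder-decoder (sequence-to-sequence) model handles unequal lengths directly.

```python
# Illustrative helpers only; none of these names come from the original post.
PAD = "<pad>"


def tokenize(pattern):
    """Split a comma-separated tag pattern into a list of tag names."""
    return [tag.strip() for tag in pattern.split(",") if tag.strip()]


def build_vocab(sequences):
    """Map each distinct tag to an integer id; id 0 is reserved for padding."""
    vocab = {PAD: 0}
    for seq in sequences:
        for tag in seq:
            vocab.setdefault(tag, len(vocab))
    return vocab


def encode_pair(source, target, vocab):
    """Encode a (source, target) pair as two equal-length lists of ids.

    The shorter sequence is right-padded so that every position in the
    source has a corresponding label in the target (an assumption made
    here so a per-position labeller can be trained on the pair).
    """
    length = max(len(source), len(target))

    def to_ids(seq):
        padded = seq + [PAD] * (length - len(seq))
        return [vocab[tag] for tag in padded]

    return to_ids(source), to_ids(target)


# The example pair from the question.
source = tokenize("html, body, div, h1")
target = tokenize("html, body, div, div, div")
vocab = build_vocab([source, target])
src_ids, tgt_ids = encode_pair(source, target, vocab)
print(src_ids)
print(tgt_ids)
```

The resulting integer lists are the kind of input that LSTM and CRF libraries typically expect. The model itself is left out here, since the right choice depends on the library you pick.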

However, if the same input pattern maps to multiple output patterns, most ML algorithms will have difficulty learning the mapping. You can remove those data points and you should be good to go.
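Filtering out the conflicting data points could look roughly like this. It is a sketch that assumes the data is a list of `(input_pattern, output_pattern)` string pairs. The function name `drop_ambiguous` is hypothetical, and apart from the first pair, which comes from the question, the sample data is invented for the demonstration.

```python
from collections import defaultdict


def drop_ambiguous(pairs):
    """Keep only pairs whose input pattern maps to exactly one output pattern."""
    outputs = defaultdict(set)
    for source, target in pairs:
        outputs[source].add(target)
    # An input is ambiguous when more than one distinct output was seen for it.
    return [(s, t) for s, t in pairs if len(outputs[s]) == 1]


# Toy data: only the first pair is taken from the question.
pairs = [
    ("html,body,div,h1", "html,body,div,div,div"),
    ("html,body,div,p", "html,body,div,p"),
    ("html,body,div,p", "html,body,div,span"),      # conflicts with the pair above
    ("html,body,div,h1", "html,body,div,div,div"),  # repeat of a consistent pair
]

clean = drop_ambiguous(pairs)
print(len(pairs) - len(clean), "ambiguous pairs removed")
```

Repeated but consistent pairs are kept, so the model still sees how often each pattern occurs. Before training, it is worth checking how much of the 14,000 points this removes. If the share is large, keeping the most frequent output for each input may be a gentler alternative.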
