
How to strip HTML from a text column in Azure ML Execute Python Script step

If a string column in my incoming Azure ML dataset contains HTML tags that are messing up my results, how can I remove those tags?

Like this:

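def azureml_main(dataframe1 = None, dataframe2 = None):
    # Replace every HTML tag (a '<' ... '>' run) in the 'text' column with a space
    dataframe1['text'] = dataframe1['text'].str.replace(r'<[^<]+?>', ' ', regex=True)
    # Execute Python Script expects a sequence of DataFrames, hence the trailing comma
    return dataframe1,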
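To check what the pattern does outside Azure ML, you can run it with Python's standard `re` module. With `regex=True`, pandas applies the same kind of regular-expression substitution to each element of the column. This is a minimal sketch; the helper name `strip_tags` is hypothetical and not part of Azure ML or pandas.

```python
import re

# Same pattern as in azureml_main: a '<', then one or more characters
# other than '<', matched lazily up to the first '>'.
TAG_PATTERN = re.compile(r'<[^<]+?>')


def strip_tags(text: str) -> str:
    """Hypothetical helper: replace each HTML tag with a single space."""
    return TAG_PATTERN.sub(' ', text)


if __name__ == '__main__':
    print(repr(strip_tags('<p>Hello <b>world</b></p>')))
    # prints ' Hello  world  '
```

Because each tag becomes a space, you can end up with runs of whitespace. `' '.join(s.split())` collapses them if that matters for your results. This is a heuristic, not an HTML parser. It leaves entities such as `&amp;` untouched (the standard library's `html.unescape` decodes those), and it also strips plain text that happens to sit between `<` and `>`, such as `a < b > c`.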

Remember to put a Clean Missing Data step before the Execute Python Script step and, if appropriate, set its action to remove the entire row. This matters because the Execute Python Script step cannot return an empty dataframe. Whether removing whole rows is acceptable depends on your data.
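If you would rather handle missing values inside the script itself, the sketch below drops rows whose text is missing and then strips the tags, which roughly mirrors Clean Missing Data with the "remove entire row" action. It assumes the column is named `text`, as in the answer's code, and it is an alternative to the module, not a description of what the module does internally.

```python
import pandas as pd


def azureml_main(dataframe1=None, dataframe2=None):
    # Assumption: the HTML-laden column is named 'text'.
    # Drop rows whose text is missing (NaN/None) before cleaning.
    cleaned = dataframe1.dropna(subset=['text'])
    # assign() returns a new DataFrame, avoiding writes into a slice of the input.
    cleaned = cleaned.assign(
        text=cleaned['text'].str.replace(r'<[^<]+?>', ' ', regex=True)
    )
    # Return a sequence containing one DataFrame, as Execute Python Script expects.
    return cleaned,


if __name__ == '__main__':
    df = pd.DataFrame({'text': ['<i>a</i>', None, 'b']})
    result, = azureml_main(df)
    print(result)
```

Note that this does not protect against an empty result. If every row has missing text, the returned DataFrame will still be empty.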

Let me also point out that the Preprocessing Text step lets you apply a regular expression directly. That is another alternative that might suit your situation better.
