
Python phrase labelling

Suppose I have a sentence like sent = "safety cited many for one willful safety violation for failing to provide and ensure the use of fall protection for workers atop railcars because many workers died."

vio = "safety violation for failing to provide and ensure the use of fall protection for workers atop railcars"

inc = "workers died."

The resulting output should be:

safety_NONE cited_NONE many_NONE for_NONE one_NONE willful_NONE safety_VIO violation_VIO for_VIO failing_VIO to_VIO provide_VIO and_VIO ensure_VIO the_VIO use_VIO of_VIO fall_VIO protection_VIO for_VIO workers_VIO atop_VIO railcars_VIO because_NONE many_NONE workers_INC died_INC ._INC

Please let me know a Python script that will produce this output.

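import re

# Tokenize the phrases and the sentence into words and punctuation marks.
vio = re.findall(r"[\w']+|[.,!?;]", vio)
inc = re.findall(r"[\w']+|[.,!?;]", inc)

sent = re.findall(r"[\w']+|[.,!?;]", sent)

# Map each label to the tokens of its phrase.
labels = {"VIO": vio,
          "INC": inc}
labelled = []
for w in sent:
    label = "_NONE"
    # Membership is tested per word, so a word found in a phrase is tagged
    # wherever it occurs in the sentence; a later label overrides an earlier one.
    for l, criteria in labels.items():
        if w in criteria:
            label = "_"+l
    labelled.append(w + label)
result = " ".join(labelled)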
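
Because the loop tests each word on its own, the first "safety", every "for", and both occurrences of "workers" receive a phrase label, which differs from the expected output in the question. One possible way to get that output is to match each phrase as a contiguous run of tokens instead. The following is a minimal sketch under two assumptions that the question does not state: each phrase appears in the sentence as an unbroken token sequence, and only its first occurrence should be labelled. The helper names tokenize, find_span, and label_sentence are hypothetical, chosen for illustration.

```python
import re

# Same token pattern as above: words (with apostrophes) or single punctuation marks.
TOKEN_RE = re.compile(r"[\w']+|[.,!?;]")


def tokenize(text):
    """Split text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


def find_span(tokens, phrase_tokens):
    """Return the start index of the first contiguous match, or -1 if none."""
    n = len(phrase_tokens)
    if n == 0:
        return -1
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == phrase_tokens:
            return start
    return -1


def label_sentence(sent, phrases):
    """Tag every token with the label of the phrase span covering it, else NONE."""
    tokens = tokenize(sent)
    tags = ["NONE"] * len(tokens)
    for label, phrase in phrases.items():
        phrase_tokens = tokenize(phrase)
        start = find_span(tokens, phrase_tokens)
        if start >= 0:
            # Assumption: only the first occurrence of the phrase is labelled.
            for i in range(start, start + len(phrase_tokens)):
                tags[i] = label
    return " ".join(f"{tok}_{tag}" for tok, tag in zip(tokens, tags))


sent = ("safety cited many for one willful safety violation for failing to "
        "provide and ensure the use of fall protection for workers atop "
        "railcars because many workers died.")
vio = ("safety violation for failing to provide and ensure the use of fall "
       "protection for workers atop railcars")
inc = "workers died."

result = label_sentence(sent, {"VIO": vio, "INC": inc})
print(result)
```

With the inputs from the question, the first "safety" stays NONE because the tokens that follow it do not continue the violation phrase, and "workers" is tagged VIO inside the violation span and INC only where it is followed by "died" and the full stop. If two phrases overlap, the one listed later in the dictionary overwrites the earlier tags, so the order of the entries matters.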
