
Python: Split string such that each substring is a key in a dictionary

I have a sample string:

"green apple, sly fox, cunning quick fox fur, cool water, yellow sand"

and a dictionary:

strr_dict = {"green": "color", "apple": "fruit", "sly": "behavior", "fox": "animal", "cunning": "behavior", "quick fox": "animal", "cool water": "drink", "yellow": "color", "sand": "matter"}

I want to display the substrings of the string, together with their values from the dictionary, as a dataframe. This is what I did:

    import pandas as pd

    sample_str = "green apple, sly fox, cunning quick fox fur, cool water, yellow sand"
    strr_dict = {"green": "color", "apple": "fruit", "sly": "behavior", "fox": "animal", "cunning": "behavior", "quick fox": "animal", "cool water": "drink", "yellow": "color", "sand": "matter"}

    df_list = []
    stripped_list = [i.strip() for i in sample_str.split(',')]
    
    for i in stripped_list:
        if i in strr_dict:
            df_list.append([i, strr_dict[i]])
        else:
            for j in i.split():
                if j in strr_dict:
                    df_list.append([j, strr_dict[j]])
                else:
                    df_list.append([j, ""])
    
    strr_df = pd.DataFrame(df_list, columns=['Text', 'Value'])
    print(strr_df)

The output I get is:

             Text      Value
    0        green     color
    1        apple     fruit
    2          sly     behavior
    3          fox     animal
    4      cunning     behavior
    5        quick          
    6          fox     animal
    7          fur          
    8   cool water     drink
    9       yellow     color
    10        sand     matter

The output I want is:

             Text      Value
    0        green     color
    1        apple     fruit
    2          sly     behavior
    3          fox     animal
    4      cunning     behavior
    5    quick fox     animal
    6          fur          
    7   cool water     drink
    8       yellow     color
    9         sand     matter

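If a substring exactly matches a dictionary key, I want to show its value. I would like to know how to split the string accordingly. In this case, "cunning quick fox fur" should be split into "cunning", "quick fox" and "fur", but that may not always be the case: sometimes the same words need to be grouped differently to get their values from the dictionary. I am very confused about how to handle this situation.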
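To make the ambiguity concrete, the sketch below lists every way of cutting one phrase into contiguous groups of words and scores each candidate by how many words end up inside a dictionary key. The helper names `candidate_splits` and `covered_words` are hypothetical, and "cover as many words as possible" is only an assumed tie-breaking rule, not something the question specifies.

```python
from itertools import combinations

strr_dict = {"green": "color", "apple": "fruit", "sly": "behavior", "fox": "animal",
             "cunning": "behavior", "quick fox": "animal", "cool water": "drink",
             "yellow": "color", "sand": "matter"}


def candidate_splits(phrase):
    """Yield every way to cut a phrase into contiguous groups of words."""
    words = phrase.split()
    gaps = range(1, len(words))  # positions between words where a cut may go
    for r in range(len(words)):
        for cuts in combinations(gaps, r):
            bounds = [0, *cuts, len(words)]
            yield [" ".join(words[a:b]) for a, b in zip(bounds, bounds[1:])]


def covered_words(split, keys):
    """Count the words that belong to a group which is a dictionary key."""
    return sum(len(part.split()) for part in split if part in keys)


splits = list(candidate_splits("cunning quick fox fur"))
# Assumed rule: prefer the split that covers the most words with keys.
best = max(splits, key=lambda s: covered_words(s, strr_dict))
print(len(splits), best)
```

A phrase of n words has 2^(n-1) candidate splits, so enumerating them is only practical for short phrases like the ones in this example.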

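So this does give the output you specified. I don't know how or why you want this, and I don't know whether it will work for the other input cases you may have, but it should. Feel free to test it with whatever other horrible datasets you have prepared.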

    import pandas as pd

    sample_str = "green apple, sly fox, cunning quick fox fur, cool water, yellow sand"
    strr_dict = {"green": "color", "apple": "fruit", "sly": "behavior", "fox": "animal", "cunning": "behavior",
                 "quick fox": "animal", "cool water": "drink", "yellow": "color", "sand": "matter"}

    df_list = []
    stripped_list = [i.strip() for i in sample_str.split(',')]

    # Keys and phrases that have already been emitted as rows.
    checklist = []

    for i in stripped_list:
        if i in strr_dict:
            # The whole comma-separated phrase is itself a key.
            df_list.append([i, strr_dict[i]])
            checklist.append(i)
        else:
            # Otherwise emit every not-yet-seen key that occurs inside the phrase.
            # This is a substring test and follows dictionary order, not word order.
            for z in strr_dict:
                if z in str(checklist):
                    continue
                if z in i:
                    df_list.append([z, strr_dict[z]])
                    checklist.append(z)
        # Words that are neither keys nor part of an emitted key get an empty value.
        for x in i.split():
            if x not in str(checklist) and x not in strr_dict:
                df_list.append([x, ""])

    strr_df = pd.DataFrame(df_list, columns=['Text', 'Value'])
    print(strr_df)

Output:

             Text     Value
    0       green     color
    1       apple     fruit
    2         sly  behavior
    3         fox    animal
    4     cunning  behavior
    5   quick fox    animal
    6         fur          
    7  cool water     drink
    8      yellow     color
    9        sand    matter
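Because the code above relies on substring tests against `str(checklist)`, it skips a key that appears a second time and emits matches in dictionary order rather than word order. As a possible alternative, here is a minimal sketch of a greedy longest-match scan over the words of each phrase. The function name `longest_match_rows` is hypothetical, and "prefer the longest key starting at the current word" is an assumption about the desired behavior; it reproduces the requested output for the sample but may not be the grouping you want for every input.

```python
def longest_match_rows(text, mapping):
    """Split each comma-separated phrase into [text, value] rows,
    preferring the longest dictionary key that starts at each word."""
    # Longest key, measured in words, bounds how far ahead we need to look.
    max_len = max((len(key.split()) for key in mapping), default=1)
    rows = []
    for phrase in (part.strip() for part in text.split(",")):
        words = phrase.split()
        pos = 0
        while pos < len(words):
            # Try the longest candidate first, then shrink it word by word.
            for size in range(min(max_len, len(words) - pos), 0, -1):
                candidate = " ".join(words[pos:pos + size])
                if candidate in mapping:
                    rows.append([candidate, mapping[candidate]])
                    pos += size
                    break
            else:
                # No key starts at this word: keep it with an empty value.
                rows.append([words[pos], ""])
                pos += 1
    return rows


sample_str = "green apple, sly fox, cunning quick fox fur, cool water, yellow sand"
strr_dict = {"green": "color", "apple": "fruit", "sly": "behavior", "fox": "animal",
             "cunning": "behavior", "quick fox": "animal", "cool water": "drink",
             "yellow": "color", "sand": "matter"}

rows = longest_match_rows(sample_str, strr_dict)
for row in rows:
    print(row)
```

The resulting `rows` list can be passed to `pd.DataFrame(rows, columns=['Text', 'Value'])` exactly as `df_list` is above. A greedy scan can miss a grouping that covers more words overall, so inputs with overlapping multi-word keys would need a search over all groupings instead.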

