
How to parse a string in Python

How can I parse strings that consist of n parameters in random order, such as:

{ UserID : 36875;  tabName : QuickAndEasy}
{ RecipeID : 1150;  UserID : 36716}
{ isFromLabel : 0;  UserID : 36716;  type : recipe;  searchWord : soup}
{ UserID : 36716;  tabName : QuickAndEasy}

Ultimately, I want to output the parameters into separate columns of a table.

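The regular expression ([^{}\s:]+)\s*:\s*([^{}\s;]+) works for your examples. Be aware, though, that every match is a string, so if you want to store 36875 as a number you will need some additional processing.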

import re
regex = re.compile(
    r"""(        # Match and capture in group 1:
     [^{}\s:]+   # One or more characters except braces, whitespace or :
    )            # End of group 1
    \s*:\s*      # Match a colon, optionally surrounded by whitespace
    (            # Match and capture in group 2:
     [^{}\s;]+   # One or more characters except braces, whitespace or ;
    )            # End of group 2""", 
    re.VERBOSE)

Then you can do

>>> dict(regex.findall("{ isFromLabel : 0;  UserID : 36716;  type : recipe;  searchWord : soup}"))
{'UserID': '36716', 'isFromLabel': '0', 'searchWord': 'soup', 'type': 'recipe'}

Test it live on regex101.com.
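
The additional processing for numeric values mentioned above might look something like the following. This is only a sketch: the helper name parse_record and the rule "convert a value to int when it consists solely of decimal digits" are assumptions for illustration, not part of the original answer.

```python
import re

# Same pattern as in the answer: a key and a value separated by a colon.
PAIR_RE = re.compile(r"([^{}\s:]+)\s*:\s*([^{}\s;]+)")


def parse_record(line):
    """Parse one record into a dict, converting all-digit values to int.

    Hypothetical helper; the conversion rule is an assumption.
    """
    record = {}
    for key, value in PAIR_RE.findall(line):
        # Values made up only of decimal digits become int; others stay str.
        record[key] = int(value) if value.isdecimal() else value
    return record


print(parse_record("{ RecipeID : 1150;  UserID : 36716}"))
```

Values such as QuickAndEasy or soup are left untouched, so each record can mix integers and strings.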

lines =  "{ UserID : 36875;  tabName : QuickAndEasy } ",  \
         "{ RecipeID : 1150;  UserID : 36716}",  \
         "{ isFromLabel : 0;  UserID : 36716;  type : recipe;  searchWord : soup}" , \
         "{ UserID : 36716;  tabName : QuickAndEasy}"

counter = 0

mappedLines = {}

for line in lines:
    counter = counter + 1
    lineDict = {}
    line = line.replace("{","")
    line = line.replace("}","")
    line = line.strip()
    fieldPairs = line.split(";")

    for pair in fieldPairs:
        fields = pair.split(":")
        key = fields[0].strip()
        value = fields[1].strip()
        lineDict[key] = value

    mappedLines[counter] = lineDict

def printField(key, lineSets, comma_desired = True):
    if key in lineSets:
        print(lineSets[key],end="")
    if comma_desired:
        print(",",end="")
    else:
        print()

for key in range(1,len(mappedLines) + 1):
    lineSets = mappedLines[key]
    printField("UserID",lineSets)
    printField("tabName",lineSets)
    printField("RecipeID",lineSets)
    printField("type",lineSets)
    printField("searchWord",lineSets)
    printField("isFromLabel",lineSets,False)

CSV output:

36875,QuickAndEasy,,,,
36716,,1150,,,
36716,,,recipe,soup,0
36716,QuickAndEasy,,,,

The code above is Python 3.4. To get similar output in Python 2.7, replace the function and the last for loop with the following:

def printFields(keys, lineSets):
    output_line = ""
    for key in keys:
        if key in lineSets:
            output_line = output_line + lineSets[key] + ","
        else:
            output_line += ","
    print output_line[0:len(output_line) - 1]

fields = ["UserID", "tabName", "RecipeID", "type", "searchWord", "isFromLabel"]

for key in range(1,len(mappedLines) + 1):
    lineSets = mappedLines[key]
    printFields(fields,lineSets)
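
As an alternative to printing the commas by hand, the regex approach could be combined with the standard csv module. The sketch below assumes Python 3.7 or later (dicts keep insertion order); the helper name to_csv and the choice to order columns by first appearance are assumptions, not something taken from either answer.

```python
import csv
import io
import re

PAIR_RE = re.compile(r"([^{}\s:]+)\s*:\s*([^{}\s;]+)")

lines = [
    "{ UserID : 36875;  tabName : QuickAndEasy } ",
    "{ RecipeID : 1150;  UserID : 36716}",
    "{ isFromLabel : 0;  UserID : 36716;  type : recipe;  searchWord : soup}",
    "{ UserID : 36716;  tabName : QuickAndEasy}",
]

# One dict per input line; all values are kept as strings.
records = [dict(PAIR_RE.findall(line)) for line in lines]

# Collect the column names in order of first appearance, so the
# parameter names do not have to be known in advance.
fieldnames = []
for record in records:
    for key in record:
        if key not in fieldnames:
            fieldnames.append(key)


def to_csv(records, fieldnames):
    """Return the records as CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)  # missing keys are filled with restval
    return buffer.getvalue()


print(to_csv(records, fieldnames), end="")
```

Because the columns are discovered from the data, their order here differs from the hard-coded field list used above; pass an explicit list as fieldnames if a fixed order is needed.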
