Reading vcf file data using python and re library

This is the content of my .vcf file.

BEGIN:VCARD
VERSION:4.0
N:Muller;CCCIsabella;;;
FN:Muller
ORG:Bubba Gump Shrimp Co.
TITLE:Shrimp Man
PHOTO;MEDIATYPE=image/gif:http://www.example.com/dir_photos/my_photo.gif
TEL;TYPE=work,voice;VALUE=uri:tel:+16829185770
REV:20080424T195243Z
END:VCARD

BEGIN:VCARD
VERSION:4.0
N:Mraz;CCCEdwardo;;;
FN:Mraz
ORG:Bubba Gump Shrimp Co.
TITLE:Shrimp Man
PHOTO;MEDIATYPE=image/gif:http://www.example.com/dir_photos/my_photo.gif
TEL;TYPE=work,voice;VALUE=uri:tel:+18083155095
REV:20080424T195243Z
END:VCARD

BEGIN:VCARD
VERSION:4.0
N:Reynolds;CCCBrant;;;
FN:Reynolds
ORG:Bubba Gump Shrimp Co.
TITLE:Shrimp Man
PHOTO;MEDIATYPE=image/gif:http://www.example.com/dir_photos/my_photo.gif
TEL;TYPE=work,voice;VALUE=uri:tel:+15089473508
REV:20080424T195243Z
END:VCARD

I want my data to look like this:

data = [{'name': 'Muller','phone': '+16829185770'}, {'name': 'Mraz', 'phone': '+18083155095'}, {'name': 'Reynolds','phone': '+15089473508'}]

But I am not getting the data as shown above. Please help me with this. Here I am using the Python re package:

import re
file = open('contacts.vcf', 'r')
contacts = []
for line in file:
    name = re.findall('FN:(.*)', line)
    tel = re.findall('tel:(.*)', line)
    nm = ''.join(name)
    tel = ''.join(tel)
    if len(nm) == 0 and len(tel) == 0:
        continue
    data = {'name' : nm, 'phone' : tel}
    contacts.append(data)
print(contacts)

In the result below, the name and the phone number end up in separate entries.

[{'name': 'Muller', 'phone': ''}, {'name': '', 'phone': '+16829185770'}, {'name': 'Mraz', 'phone': ''}, {'name': '', 'phone': '+18083155095'}, {'name': 'Reynolds', 'phone': ''}, {'name': '', 'phone': '+15089473508'}]

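In general, when debugging it is useful to put print calls at various places to find out where the code goes wrong. For example, if you insert print(">",nm,tel) after tel = ''.join(tel), you should get the following output: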

>  
>  
>  
> Muller 
>  
>  
>  
>  +16829185770
>  
>  
>  
>  
>  
>  
> Mraz 
>  
[... continued...]

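As you can see, the problem is that your for loop operates on each line of the file rather than on each card (technically, your own code even says so: for line in file). The name and the phone number sit on different lines, so no single iteration ever sees both.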
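One way to act on that observation while keeping a line-by-line loop is to remember the name until the phone line of the same card shows up, and to store the entry only when the card ends. This is a minimal sketch, assuming every card uses the FN and TEL layout shown in the question; parse_contacts is a hypothetical helper name, not part of any library:

```python
import re

def parse_contacts(lines):
    """Collect one {'name', 'phone'} dict per BEGIN:VCARD ... END:VCARD block."""
    contacts = []
    current = None  # the card being filled in, or None between cards
    for line in lines:
        line = line.strip()
        if line == 'BEGIN:VCARD':
            current = {'name': '', 'phone': ''}
        elif current is None:
            continue  # ignore anything outside a card
        elif line == 'END:VCARD':
            contacts.append(current)
            current = None
        elif line.startswith('FN:'):
            current['name'] = line[len('FN:'):]
        elif line.startswith('TEL'):
            # Assumes the number follows "tel:" as in the sample data
            match = re.search(r'tel:(.*)', line)
            if match:
                current['phone'] = match.group(1)
    return contacts

sample = """BEGIN:VCARD
VERSION:4.0
N:Muller;CCCIsabella;;;
FN:Muller
TEL;TYPE=work,voice;VALUE=uri:tel:+16829185770
END:VCARD

BEGIN:VCARD
VERSION:4.0
N:Mraz;CCCEdwardo;;;
FN:Mraz
TEL;TYPE=work,voice;VALUE=uri:tel:+18083155095
END:VCARD
"""

print(parse_contacts(sample.splitlines()))
```

To run it on a real file, pass the open file object instead of the sample lines, for example: with open('contacts.vcf') as f: print(parse_contacts(f)).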

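You may be interested in using a module to parse this file (a quick Google search turned up the vobject package), which would remove the need for re. If you are feeling ambitious, you can parse it manually (I am not very familiar with the format, so this is an off-the-cuff example).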

import re
from collections import defaultdict

CARDMATCHER = re.compile(r"""
^                 ## Match at Start of Line (Multiline-flag)
    BEGIN:VCARD   ## Match the string "BEGIN:VCARD" exactly
$                 ## Match the End of Line (Multiline-flag)
.*?               ## Match characters (.) any number of times(*),
                  ## as few times as possible(?), including new-line(Dotall-flag)
^                 ## Match at Start of Line (Multiline-flag)
    END:VCARD     ## Match the string "END:VCARD" exactly
$                 ## Match the End of Line (Multiline-flag)
""", re.MULTILINE|re.DOTALL|re.VERBOSE)

VALUERE = re.compile("""
^(?P<type>[A-Z]+) ## Match Capital Ascii Characters at Start of Line
(?P<sep>:|;)     ## Match a colon or a semicolon
(?P<value>.*)    ## Match all other characters remaining
""", re.VERBOSE)

class MyVCard():
    def __init__(self,cardstring):
        self.info = defaultdict(list)
        ## Iterate over the card like you were doing
        for line in cardstring.split("\n"):
            ## Split Key of line
            match = VALUERE.match(line)
            if match:
                vtype = match.group("type")
                ## Line Values are separated by semicolons
                values = match.group("value").split(";")

                ## Lines with colons appear to be unique values
                if match.group("sep") == ":":
                    ## If only a single value, we don't need the list
                    if len(values) == 1:
                        self.info[vtype] = values[0]
                    else:
                        self.info[vtype] = values

                ## Otherwise (Semicolon sep), the value may not be unique
                else:
                    ## Semicolon seps also appear to have multiple keys
                    ## So we'll use a dict
                    out = {}
                    for val in values:
                        ## Get key,value for each value
                        k,v = val.split("=",maxsplit=1)
                        out[k] = v
                    ## Make sure we have a list to append to
                    self.info[vtype].append(out)
    def get_a_number(self):
        """ Naive approach to getting the number """
        if "TEL" in self.info:
            number = self.info["TEL"][0]["VALUE"]
            numbers = re.findall("tel:(.+)",number)
            if numbers:
                return numbers[0]
        return None

def get_vcards(file):
    """ Use regex to parse VCards into dicts. """
    with open(file,'r') as f:
        finput = f.read()
    cards = CARDMATCHER.findall(finput)
    return [MyVCard(card) for card in cards]

## get_vcards expects a path; 'contacts.vcf' is the file name used in the question
print([{"fn":card.info['FN'], "tel":card.get_a_number()} for card in get_vcards('contacts.vcf')])

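Again, since I have not looked up the full specification of the vcf format, I make no guarantees about this code, and I recommend using a module designed specifically for this instead.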
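One detail of the format that hand-written parsing like the above does not cover is line folding: the vCard specification allows a long logical line to be split, with each continuation line starting with a single space or tab. A minimal sketch of undoing that before any matching, assuming "\n" or "\r\n" line endings; unfold is a hypothetical helper name and the folded ORG line is made up for illustration:

```python
import re

def unfold(text):
    """Join folded lines: drop a line break that is followed by one space or tab."""
    return re.sub(r'\r?\n[ \t]', '', text)

# The ORG value is folded after "Gump"; the continuation starts with one
# folding space, followed by the real space that belongs to the value.
folded = "ORG:Bubba Gump\n  Shrimp Co.\nTITLE:Shrimp Man\n"
print(unfold(folded))
```

Calling unfold on the file contents first means each property is back on one line before a per-line regex sees it.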

You can try the following code.

import re

contacts = []
with open('vcards-2.vcf', 'r') as file:
    for line in file:
        # Look for the FN (formatted name) line of a card
        name = re.findall('FN:(.*)', line)
        nm = ''.join(name)
        if len(nm) == 0:
            continue

        data = {'name' : nm.strip()}
        # Keep reading the same file object until this card's phone line
        for lin in file:
            # In the question's data the number follows "tel:"
            tel = re.findall('tel:(.*)', lin)
            tel = ''.join(tel)

            if len(tel) == 0:
                continue

            # Keep letters, digits and the leading '+'
            tel = ''.join(e for e in tel.strip() if e.isalnum() or e == '+')
            data['phone'] = tel
            break
        contacts.append(data)

print(contacts)

You will get the following output:

[{'name': 'Muller','phone': '+16829185770'}, {'name': 'Mraz', 'phone': '+18083155095'}, {'name': 'Reynolds','phone': '+15089473508'}]
