
Extracting specific information from data

How can I convert data in a format such as:

James Smith was born on November 17, 1948

into something like

("James Smith", DOB, "November 17, 1948")

without relying on positional indexes into the string?

I have tried the following:

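import nltk
from nltk import word_tokenize, pos_tag

sentence = "James Smith was born on November 17, 1948"
tokens = word_tokenize(sentence)   # split the sentence into word tokens
tagged = pos_tag(tokens)           # attach a part-of-speech tag to each token
grammar = "Chunk: {<NNP*><NNP*>}"  # chunk two consecutive proper-noun tags
cp = nltk.RegexpParser(grammar)
result = cp.parse(tagged)
print(result)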

How do I go further and get the output in the desired format?

Split the string on "was born on", then trim the whitespace from each part and assign the parts to name and dob.
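A minimal sketch of this split-based approach, assuming every sentence contains the literal phrase "was born on" exactly as in the example. The helper name `split_name_dob` is hypothetical and only used for illustration.

```python
def split_name_dob(sentence, marker="was born on"):
    """Split a sentence on `marker` and return (name, "DOB", date), or None."""
    # partition() returns (before, marker, after); the middle part is
    # an empty string when the marker does not occur in the sentence.
    name, found, dob = sentence.partition(marker)
    if not found:
        return None
    # Trim the spaces that surrounded the marker.
    return (name.strip(), "DOB", dob.strip())


print(split_name_dob("James Smith was born on November 17, 1948"))
# ('James Smith', 'DOB', 'November 17, 1948')
```

This does not index into the string by position, but it does depend on the exact wording of the marker phrase.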

You can always use a regular expression. The regex (\S+)\s(\S+)\s\bwas born on\b\s(\S+)\s(\S+),\s(\S+) will match and return the data from a string in the format above.

See it in action here: https://regex101.com/r/W2ykKS/1

The regex in Python:

import re

regex = r"(\S+)\s(\S+)\s\bwas born on\b\s(\S+)\s(\S+),\s(\S+)"
test_str = "James Smith was born on November 17, 1948"

matches = re.search(regex, test_str)

# group 0 is the entire match; groups 1-5 are the captured parts

print(matches.group(1)) # James
print(matches.group(2)) # Smith
print(matches.group(3)) # November
print(matches.group(4)) # 17
print(matches.group(5)) # 1948
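To get the tuple asked for in the question, the captured pieces still have to be recombined. The following is a sketch of one way to do that, not the answer's own code: it swaps in named groups, allows names of any number of words, and assumes dates always look like "Month day, year". The names `PATTERN` and `extract_dob` are hypothetical.

```python
import re
from datetime import datetime

# name = everything before "was born on"; dob = "<Month> <day>, <year>"
PATTERN = re.compile(
    r"(?P<name>.+?)\s+was born on\s+(?P<dob>[A-Za-z]+ \d{1,2}, \d{4})"
)


def extract_dob(sentence):
    """Return (name, "DOB", date_string), or None if the sentence does not match."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    return (match.group("name"), "DOB", match.group("dob"))


record = extract_dob("James Smith was born on November 17, 1948")
print(record)  # ('James Smith', 'DOB', 'November 17, 1948')

# Optionally parse the date text (assumes English month names)
print(datetime.strptime(record[2], "%B %d, %Y").date())  # 1948-11-17
```

Checking for `None` before calling `group()` avoids an `AttributeError` on sentences that do not follow the pattern.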

