Extract words from a sentence in Python

Question

I have a dataset in text/CSV format. It has two columns, like this:

ID - TEXT
1 - this probability is 10-15% 
2 - approximately 20% probablity 
3 - 15% probability

I am trying to use NLTK to extract the number from each row where the keyword 'probability' is present.

This is what my code looks like:

import pandas as pd
import nltk
from nltk import sent_tokenize, word_tokenize

data_file = pd.read_excel(r'data_excel.xlsx',sheet_name = 'data')

df = pd.DataFrame(data_file, columns = ['ID','TEXT'])
keywords = ["probability"]

id_text = nltk.Text(str(df.ID).splitlines()) 
text_value = nltk.Text(str(df.TEXT).splitlines())

I want the output to look like this:

ID - Value 
1 - 10
2 - 20
3 - 15

If someone can nudge me in the right direction, it would be very helpful.

Answer 1

This code should work, or at least point you toward a solution. Here is the full code:

import re
import pandas as pd

data_file = pd.read_excel(r'data_excel.xlsx', sheet_name='data')
df = pd.DataFrame(data_file, columns=['ID', 'TEXT'])

# this provides arrays of both the IDs and the text values
ID = df.iloc[:, 0].values
TEXT = df.iloc[:, 1].values

# loop through every row (starting at index 0) and collect all the
# numbers that occur in its text
rows = []
for i in range(len(ID)):
    # str() guards against non-string cells such as NaN
    number_list = re.findall(r'\b\d+\b', str(TEXT[i]))
    rows.append({'ID': ID[i], 'Num': number_list})

# build the result in one step; Num holds a list of the numbers found,
# e.g. ['10', '15'] for the first sample row
cols = ['ID', 'Num']
newDataFrame = pd.DataFrame(rows, columns=cols)

print(newDataFrame)
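
The regular expression in this answer returns every number in a row, and it does not check for the keyword. As a minimal sketch of how the output asked for in the question could be produced, the following keeps only the first number of each row that mentions the keyword. It rests on a few assumptions that are not in the original post: the helper name `extract_probability` is hypothetical, the sample rows are typed in by hand instead of being read from the file, and the keyword is matched by the prefix "probab" so that the misspelled "probablity" in row 2 still counts.

```python
import re

# Assumption: match on the prefix "probab" (case-insensitive) so that the
# misspelled "probablity" in the sample data is still treated as the keyword.
KEYWORD = re.compile(r'probab\w*', re.IGNORECASE)
NUMBER = re.compile(r'\d+')


def extract_probability(text):
    """Return the first integer in text if the keyword is present, else None."""
    if not KEYWORD.search(text):
        return None
    match = NUMBER.search(text)
    return int(match.group()) if match else None


# Sample rows copied from the question (hypothetical stand-in for the file).
rows = [
    (1, 'this probability is 10-15%'),
    (2, 'approximately 20% probablity'),
    (3, '15% probability'),
]

result = [(row_id, extract_probability(text)) for row_id, text in rows]

print('ID - Value')
for row_id, value in result:
    print(f'{row_id} - {value}')
```

With the DataFrame from the answer, the same helper could presumably be applied column-wise with `df['TEXT'].apply(extract_probability)`, provided every cell in `TEXT` is a string.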

Answer 1 was posted on 2018-05-31 23:40:18.