Convert list into sub-list while maintaining “key”

I have a list that contains a 'key' and 'paragraph'. Each 'key' is associated with a 'paragraph'.

My goal is to break each paragraph into individual sentences, with each sentence assigned to the 'key' it originally belonged to in paragraph form. For example:

(['2925729', 'Patrick came outside and greeted us promptly.'], ['2925729', 'Patrick did not shake our hands nor ask our names. He greeted us promptly and politely, but it seemed routine.'], ['2925728', 'Patrick sucks. He farted politely, but it seemed routine.'])

So far I've been able to write code that breaks paragraphs into sentences and counts the number of hits for each sentence against a dictionary. I now want to associate an ID with each sentence.

Here is the code that deals with sentences without any 'key'. I omitted Steps 1 and 2 to save space:

dictionary = ['book', 'should have', 'open']

#### Step 3 ####
# Create a blank list to collect the final output
final_out = []

# Find matches: count how often each dictionary term occurs in each sentence
for sent in sentences:
    final_out.append((sent, sum(sent.count(col) for col in dictionary)))

# Keep only distinct (sentence, count) pairs and store them as a dict
final_out = dict(sorted(set(final_out)))

# Rank the sentences by count, highest first
import operator
sorted_final_out = sorted(final_out.items(), key=operator.itemgetter(1), reverse=True)

The output from this is: (['johny ate the antelope', 80], ['sally has a friend', 20]) and so on. I then pick the top X by magnitude. What I am now trying to achieve is something like this: (['12222', 'johny ate the antelope', 80], ['22332', 'sally has a friend', 20]). So I basically want to ensure that every sentence, once parsed out, stays assigned to its 'key'. Sorry, this is complicated. Again, that is why John's earlier solution would only work on a simpler case.

from itertools import chain
list(chain(*[[[y[0],z] for z in y[1].split('. ')] for y in x]))

produces

[['2925729', 'Patrick came outside and greeted us promptly.'],
 ['2925729', 'Patrick did not shake our hands nor ask our names'],
 ['2925729', 'He greeted us promptly and politely, but it seemed routine.'],
 ['2925728', 'Patrick sucks'],
 ['2925728', 'He farted politely, but it seemed routine.']]
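Note that in the output above, split('. ') removes the period from every sentence except the last one in each paragraph. If that matters, one possible alternative (a sketch, not part of the original answer) is to split on the whitespace that follows a period, using a lookbehind so the period itself is kept. The variable names here are only for illustration.

```python
import re

# One paragraph copied from the question's sample data.
paragraph = ('Patrick did not shake our hands nor ask our names. '
             'He greeted us promptly and politely, but it seemed routine.')

# Split on whitespace that is preceded by a period; the lookbehind
# matches the period without consuming it, so each sentence keeps it.
sentences = re.split(r'(?<=\.)\s+', paragraph)
```

This still treats every period followed by whitespace as a sentence boundary, so abbreviations such as "Dr. Smith" would be split too.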

list(chain(*...)) flattens the nested list produced by [[[y[0],z] for z in y[1].split('. ')] for y in x] .
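To make the two stages easier to see, here is the same idea written out step by step, assuming x is the tuple of [key, paragraph] pairs from the question. The names nested and flat are just for illustration; chain.from_iterable is used as an equivalent of chain(*...) that does not unpack the outer list into arguments.

```python
from itertools import chain

# Sample data copied from the question: [key, paragraph] pairs.
x = (['2925729', 'Patrick came outside and greeted us promptly.'],
     ['2925729', 'Patrick did not shake our hands nor ask our names. '
                 'He greeted us promptly and politely, but it seemed routine.'],
     ['2925728', 'Patrick sucks. He farted politely, but it seemed routine.'])

# Stage 1: one inner list of [key, sentence] pairs per paragraph.
nested = [[[key, sentence] for sentence in paragraph.split('. ')]
          for key, paragraph in x]

# Stage 2: flatten the per-paragraph lists into a single list.
flat = list(chain.from_iterable(nested))
```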

If you'd rather change the list in place, you could use:

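xl = list(x)  # you gave us a tuple, so copy it into a mutable list
for i, _ in enumerate(xl):
    key, text = xl[i]
    # Replace the single [key, paragraph] entry with one entry per sentence.
    # Entries that are already single sentences are replaced by themselves.
    xl[i:i+1] = [[key, sentence] for sentence in text.split('. ')]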

I'm not sure which would work faster or better when the data set is very large.
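Once the list holds [key, sentence] pairs, the key can be carried through the scoring step from the question. The following is only a rough sketch: pairs and terms are made-up sample values, and the counting is the same substring count the question uses.

```python
from operator import itemgetter

# Hypothetical [key, sentence] pairs, as produced by either approach above.
pairs = [['12222', 'open the book'],
         ['22332', 'sally has a friend'],
         ['12222', 'you should have an open book'],
         ['12222', 'open the book']]  # duplicate on purpose

# Terms to count, taken from the question's dictionary list.
terms = ['book', 'should have', 'open']

# Build (key, sentence, hits) tuples; set() drops exact duplicates.
scored = set((key, sentence, sum(sentence.count(term) for term in terms))
             for key, sentence in pairs)

# Rank by the hit count (third field), highest first.
ranked = sorted(scored, key=itemgetter(2), reverse=True)
```

Because each result is a tuple rather than a dict entry, two different keys can share the same sentence text without overwriting each other.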
