
How to deal with parsing an arbitrary number of lists into a dictionary

I am parsing an XMI/XML data structure into a pandas dataframe by first decomposing it into a dictionary. Some attributes in my XMI hold a list of named tuples. So far these lists contain at most two named tuples, and the majority contain only one.

To handle this case, I am doing the following:

if val:  # skips both None and an empty list
    # the first span is always present when val is non-empty
    d['modifiedBegin'] = val[0].begin
    d['modifiedEnd'] = val[0].end
    if len(val) > 1:
        # the second span is optional
        d['modifiedBegin1'] = val[1].begin
        d['modifiedEnd1'] = val[1].end
    else:
        d['modifiedBegin1'] = None
        d['modifiedEnd1'] = None

My issues with this are: a) I cannot be guaranteed that the list I am decomposing contains at most two named tuples, and b) this feels cheap, ugly and just plain wrong!

I really would like to come up with a more general solution, especially given item a) above.

My data look like:

val = [Span(xmiID=105682, begin=13352, end=13358, type='org.metamap.uima.ts.Span'), Span(xmiID=105685, begin=13368, end=13374, type='org.metamap.uima.ts.Span')]

I would much rather parse this out into two separate rows in my dataframe than add more columns. The major issue is that both of these tuples share common data from a larger object that looks like:

Negation(xmiID=142613, id=None, negType='nega', negTrigger='without', modifier=[Span(xmiID=105682, begin=13352, end=13358, type='org.metamap.uima.ts.Span'), Span(xmiID=105685, begin=13368, end=13374, type='org.metamap.uima.ts.Span')]) 

So, both rows share the attributes negType and negTrigger. What is a more general way of decomposing this for insertion into my dataframe? I thought of iterating through the elements when the length of the list is greater than one and inserting into the dataframe on each iteration, but that seems messy.

My desired outcome would thus be to have a dataframe that looks like (minus the indices and other common junk):

[image: enter image description here]

  • Iterate over Negation namedtuples
    • for each thing in negation.modifier
      • add a row using the negation's attributes and the thing's attributes
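
The steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the Span and Negation definitions are stand-ins rebuilt from the data shown in the question, negation_rows is a hypothetical helper name, and the output keys modifiedBegin and modifiedEnd are borrowed from the question's code. It works for any number of spans in modifier, including none.

```python
from collections import namedtuple

# Stand-ins for the parsed XMI types; field names follow the question's data.
Span = namedtuple("Span", ["xmiID", "begin", "end", "type"])
Negation = namedtuple("Negation", ["xmiID", "id", "negType", "negTrigger", "modifier"])


def negation_rows(negations):
    """Yield one flat dict per Span found in each Negation's modifier list."""
    for neg in negations:
        # "or []" tolerates a modifier that is None or empty
        for span in neg.modifier or []:
            yield {
                # attributes shared by every row of this negation
                "negType": neg.negType,
                "negTrigger": neg.negTrigger,
                # attributes specific to this span
                "modifiedBegin": span.begin,
                "modifiedEnd": span.end,
            }


negations = [
    Negation(142613, None, "nega", "without", [
        Span(105682, 13352, 13358, "org.metamap.uima.ts.Span"),
        Span(105685, 13368, 13374, "org.metamap.uima.ts.Span"),
    ]),
]

rows = list(negation_rows(negations))
# With pandas installed, pandas.DataFrame(rows) builds the frame, one row per Span.
```

Because each row is an independent dict, the shared negType and negTrigger values are simply repeated on every row, and no numbered columns such as modifiedBegin1 are needed.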

Or, instead of parsing XML to namedtuples and then to dictionaries, skip the middle step and build a single dictionary of columns straight from the XML: {'begin':[row0,row1,...],'end':[row0,row1,...],'negtrigger':[row0,row1,...],'negtype':[row0,row1,...]}
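
The single-dictionary layout could look something like the sketch below. Since the XML itself is not shown in the question, the loop here still reads from namedtuple stand-ins (an assumption made only so the example runs). The same appends would apply when walking the XML elements directly. negation_columns is a hypothetical helper name.

```python
from collections import namedtuple

# Stand-ins for the parsed XMI types; field names follow the question's data.
Span = namedtuple("Span", ["xmiID", "begin", "end", "type"])
Negation = namedtuple("Negation", ["xmiID", "id", "negType", "negTrigger", "modifier"])


def negation_columns(negations):
    """Build a dict of equal-length lists, one list per dataframe column."""
    columns = {"begin": [], "end": [], "negtrigger": [], "negtype": []}
    for neg in negations:
        for span in neg.modifier or []:  # tolerate None or an empty list
            # append to every column together so the lists stay aligned
            columns["begin"].append(span.begin)
            columns["end"].append(span.end)
            columns["negtrigger"].append(neg.negTrigger)
            columns["negtype"].append(neg.negType)
    return columns


negations = [
    Negation(142613, None, "nega", "without", [
        Span(105682, 13352, 13358, "org.metamap.uima.ts.Span"),
        Span(105685, 13368, 13374, "org.metamap.uima.ts.Span"),
    ]),
]

columns = negation_columns(negations)
# With pandas installed, pandas.DataFrame(columns) builds the frame from these lists.
```

Appending to all four lists in the same loop iteration keeps them the same length, which pandas requires when constructing a dataframe from a dict of lists.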
