
Get data from a Pandas MultiIndex

I am using pandas and uproot to read data from a .root file, and I get a table like the following one:

[image: table]

So, from my .root file I have got some branches of a tree.

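import uproot  # ttree.pandas.df is the uproot 3 interface

fname = 'ZZ4lAnalysis_VBFH.root'
key = 'ZZTree/candTree'
ttree = uproot.open(fname)[key]
branches = ['nCleanedJets', 'JetPt', 'JetMass', 'JetPhi']
# Jagged branches are flattened into an (entry, subentry) MultiIndex by default
df = ttree.pandas.df(branches, entrystop=40306)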

Essentially, I need to retrieve the "JetPhi" data for each entry that has at least 2 subentries (equivalently, entries for which "nCleanedJets" is greater than or equal to 2), calculate the difference in "JetPhi" between the first two subentries, and then make a histogram of those differences.

I have searched online and tried several approaches, but I have not found a working solution. If someone could give me a hint, advice, or a suggestion, I would be very grateful. I used to code in C++ and I am new to Python.


You can do this in Pandas with

df[df["nCleanedJets"] >= 2]

because you have a column with the number of subentries in each entry. The df["nCleanedJets"] >= 2 expression returns a Series of booleans (True if a row passes, False if it doesn't), and passing a boolean Series or NumPy array in square brackets masks by that array (returning the rows for which the boolean array is True).
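To make the masking step concrete, here is a minimal sketch on a hand-made DataFrame. The name toy and its values are invented for illustration; they only mimic the (entry, subentry) layout that uproot produces and are not taken from the .root file in the question.

```python
import pandas as pd

# Toy stand-in for the uproot output (values are made up):
# one row per jet, indexed by (entry, subentry).
index = pd.MultiIndex.from_tuples(
    [(0, 0), (0, 1), (1, 0), (2, 0), (2, 1), (2, 2)],
    names=["entry", "subentry"],
)
toy = pd.DataFrame(
    {
        "nCleanedJets": [2, 2, 1, 3, 3, 3],
        "JetPhi": [0.5, -1.0, 2.0, 3.0, -3.0, 0.1],
    },
    index=index,
)

mask = toy["nCleanedJets"] >= 2  # boolean Series aligned with toy's index
selected = toy[mask]             # keeps only the rows where mask is True

print(selected)
```

Entry 1 has a single jet, so its row is dropped, while every row of entries 0 and 2 survives because "nCleanedJets" is repeated on each subentry.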

You could also do this in Awkward Array before converting to Pandas, which would be easier if you didn't have a "nCleanedJets" column.

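import awkward  # Awkward Array 0.x, the version paired with uproot 3

array = ttree.arrays(branches, entrystop=40306)
# counts is the length of each list, i.e. the number of subentries per entry
selected = array[array.counts >= 2]

awkward.topandas(selected, flatten=True)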

Masking in Awkward Array follows the same principle, but with data structures instead of flat Series or NumPy arrays (each element of array is a list of records with "nCleanedJets", "JetPt", "JetPhi", "JetMass" fields, and counts is the length of each list).
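For comparison, the per-entry list length that counts provides can also be recovered on the already flattened DataFrame, which may help when there is no "nCleanedJets" column. This is a sketch in plain Pandas rather than Awkward Array, and the toy frame below is invented for illustration.

```python
import pandas as pd

# Toy (entry, subentry) frame without an "nCleanedJets" column (made-up values).
index = pd.MultiIndex.from_tuples(
    [(0, 0), (0, 1), (1, 0), (2, 0), (2, 1), (2, 2)],
    names=["entry", "subentry"],
)
toy = pd.DataFrame({"JetPhi": [0.5, -1.0, 2.0, 3.0, -3.0, 0.1]}, index=index)

# Number of subentries per entry, broadcast back onto every row of that entry.
counts = toy.groupby(level="entry")["JetPhi"].transform("size")

selected = toy[counts >= 2]  # same boolean-mask selection as before
```

Here counts plays the role of the "nCleanedJets" column, so the selection keeps entries 0 and 2 and drops entry 1.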

awkward.topandas with flatten=True is equivalent to what uproot does when outputtype=pandas.DataFrame and flatten=True (the defaults for ttree.pandas.df).
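Once the rows are selected, the remaining step from the question (the "JetPhi" difference between the first two subentries, then a histogram) could be sketched as follows. The toy data are invented, and wrapping the difference into [-pi, pi) is an assumption about what is wanted for an azimuthal angle; drop that line if the raw difference is what you need.

```python
import numpy as np
import pandas as pd

# Toy (entry, subentry) frame with made-up values.
index = pd.MultiIndex.from_tuples(
    [(0, 0), (0, 1), (1, 0), (2, 0), (2, 1), (2, 2)],
    names=["entry", "subentry"],
)
toy = pd.DataFrame(
    {
        "nCleanedJets": [2, 2, 1, 3, 3, 3],
        "JetPhi": [0.5, -1.0, 2.0, 3.0, -3.0, 0.1],
    },
    index=index,
)

selected = toy[toy["nCleanedJets"] >= 2]

# Cross-sections on the "subentry" level: first and second jet of each entry.
phi0 = selected["JetPhi"].xs(0, level="subentry")
phi1 = selected["JetPhi"].xs(1, level="subentry")

dphi = phi0 - phi1  # both Series are indexed by "entry", so they align

# Assumption: wrap the difference into [-pi, pi), as is common for angles.
dphi_wrapped = (dphi + np.pi) % (2 * np.pi) - np.pi

# Bin the differences; the bin count and range are arbitrary choices here.
hist, edges = np.histogram(dphi_wrapped, bins=10, range=(-np.pi, np.pi))
```

To draw the histogram rather than just compute the bin contents, dphi_wrapped can be passed to matplotlib's plt.hist with the same bins and range.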
