Getting data from a Pandas multiIndex

Question

I am using pandas and uproot to read data from a .root file, and I get a table like the following one:

[Image: table of the resulting DataFrame]

So, from my .root file I have read some branches of a tree:

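import uproot  # uproot 3.x; the ttree.pandas.df interface is not available in uproot 4

fname = 'ZZ4lAnalysis_VBFH.root'
key = 'ZZTree/candTree'
ttree = uproot.open(fname)[key]
branches = ['nCleanedJets', 'JetPt', 'JetMass', 'JetPhi']
# One row per jet, indexed by (entry, subentry)
df = ttree.pandas.df(branches, entrystop=40306)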

Essentially, for each entry that has at least 2 subentries (equivalently, entries for which "nCleanedJets" is greater than or equal to 2), I need to retrieve the "JetPhi" values, compute the difference in "JetPhi" between the first two subentries, and then make a histogram of those differences.
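A minimal pandas sketch of that computation, assuming the DataFrame has the (entry, subentry) MultiIndex that uproot 3's ttree.pandas.df produces with flatten=True. The helper name first_two_dphi and the toy DataFrame are hypothetical, and the numbers are made up for illustration, not real data.

```python
import pandas as pd


def first_two_dphi(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: JetPhi(jet 0) - JetPhi(jet 1) for entries with >= 2 jets.

    Assumes an (entry, subentry) MultiIndex like the one ttree.pandas.df returns.
    """
    # Entries with fewer than two jets have no subentry 1, so drop them first.
    selected = df.loc[df["nCleanedJets"] >= 2, "JetPhi"]
    # xs() picks one value of the "subentry" level; the result is indexed by
    # "entry", so the subtraction pairs up the two jets of the same event.
    phi0 = selected.xs(0, level="subentry")
    phi1 = selected.xs(1, level="subentry")
    return phi0 - phi1


# Made-up toy data in the same layout: one row per jet.
index = pd.MultiIndex.from_tuples(
    [(0, 0), (0, 1), (0, 2), (1, 0), (2, 0), (2, 1)],
    names=["entry", "subentry"],
)
toy = pd.DataFrame(
    {"nCleanedJets": [3, 3, 3, 1, 2, 2], "JetPhi": [0.5, -1.0, 2.0, 1.2, 3.0, -3.0]},
    index=index,
)
print(first_two_dphi(toy))
```

Entry 1 has only one jet, so it is dropped; the result holds one difference per remaining entry.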

I have searched the internet and tried different approaches, but I have not found a useful solution. If someone could give me any hints, advice, or suggestions, I would be very grateful. I used to code in C++, so I am new to Python and have not yet mastered the language.

Answer 1

You can do this in Pandas with

df[df["nCleanedJets"] >= 2]

because you have a column with the number of entries. The expression df["nCleanedJets"] >= 2 returns a Series of booleans (True if a row passes, False if it doesn't), and passing a boolean Series or NumPy array inside the square brackets masks the DataFrame by that array, returning only the rows for which the boolean array is True.
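To make the masking step concrete, here is a small sketch on a made-up DataFrame with the same (entry, subentry) layout; the values are illustrative only.

```python
import pandas as pd

# Made-up toy data: one row per jet, indexed by (entry, subentry).
index = pd.MultiIndex.from_tuples(
    [(0, 0), (0, 1), (1, 0), (2, 0), (2, 1)], names=["entry", "subentry"]
)
df = pd.DataFrame(
    {"nCleanedJets": [2, 2, 1, 2, 2], "JetPhi": [0.5, -1.0, 1.2, 3.0, -3.0]},
    index=index,
)

mask = df["nCleanedJets"] >= 2  # Series of bools, one per row
print(mask.tolist())             # [True, True, False, True, True]

selected = df[mask]              # keeps only the rows where mask is True
print(selected.index.get_level_values("entry").unique().tolist())  # [0, 2]
```

Because every row of an entry carries the same "nCleanedJets" value, the row-wise mask keeps or drops whole entries.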

You could also do this in Awkward Array before converting to Pandas, which would be easier if you didn't have a "nCleanedJets" column.

import awkward  # Awkward Array 0.x, the version paired with uproot 3

array = ttree.arrays(branches, entrystop=40306)
selected = array[array.counts >= 2]

awkward.topandas(selected, flatten=True)

Masking in Awkward Array follows the same principle, but with data structures instead of flat Series or NumPy arrays (each element of array is a list of records with "nCleanedJets", "JetPt", "JetPhi", "JetMass" fields, and counts is the length of each list).
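If you stay in pandas and have no "nCleanedJets" column, a rough counterpart of counts is the number of rows per entry. The sketch below assumes the (entry, subentry) MultiIndex layout and uses made-up values; it swaps in groupby(...).size() in place of Awkward's counts.

```python
import pandas as pd

# Made-up toy data without an nCleanedJets column.
index = pd.MultiIndex.from_tuples(
    [(0, 0), (0, 1), (1, 0), (2, 0), (2, 1), (2, 2)], names=["entry", "subentry"]
)
df = pd.DataFrame({"JetPhi": [0.5, -1.0, 1.2, 3.0, -3.0, 0.1]}, index=index)

# Rows (jets) per entry: plays the role of Awkward's `counts`.
counts = df.groupby(level="entry").size()
# Entry labels that have at least two jets.
keep = counts[counts >= 2].index
# Keep every row whose "entry" label is in that set.
selected = df[df.index.get_level_values("entry").isin(keep)]

print(counts.tolist())  # [2, 1, 3]
print(len(selected))    # 5
```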

awkward.topandas with flatten=True is equivalent to what uproot does when outputtype=pandas.DataFrame and flatten=True (the defaults for ttree.pandas.df).
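The answer stops at the selection. The remaining steps from the question could be sketched as below, assuming the flattened (entry, subentry) layout; the toy values and the choice of 8 bins are arbitrary, and numpy.histogram is used so the example runs without a plotting library.

```python
import numpy as np
import pandas as pd

# Made-up toy data in the flattened layout: one row per jet.
index = pd.MultiIndex.from_tuples(
    [(0, 0), (0, 1), (1, 0), (2, 0), (2, 1), (3, 0), (3, 1)],
    names=["entry", "subentry"],
)
df = pd.DataFrame(
    {"nCleanedJets": [2, 2, 1, 2, 2, 2, 2],
     "JetPhi": [0.5, -1.0, 1.2, 3.0, -3.0, 0.2, 0.7]},
    index=index,
)

selected = df[df["nCleanedJets"] >= 2]
# unstack moves "subentry" into columns: one row per entry, one column per jet.
phi = selected["JetPhi"].unstack(level="subentry")
dphi = (phi[0] - phi[1]).to_numpy()

# A difference of two angles in [-pi, pi] lies in [-2pi, 2pi].
hist, edges = np.histogram(dphi, bins=8, range=(-2 * np.pi, 2 * np.pi))
print(hist)  # [0 0 0 1 1 0 0 1]
```

To draw it, matplotlib's plt.hist(dphi, bins=8, range=(-2 * np.pi, 2 * np.pi)) accepts the same data and binning.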

Answer 1
1 2020-02-04 13:46:59