Pandas: How to store cProfile output in a pandas DataFrame?

Question

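There are already several posts that discuss profiling with Python's cProfile, as well as the difficulty of analyzing the output, since the output file restats in the sample code below is not a plain-text file. The snippet below is only a sample from docs.python.org/2/library/profile and is not directly reproducible.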

import cProfile
import re
cProfile.run('re.compile("foo|bar")', 'restats')

There is a discussion here: Profile a python script using cProfile into an external file, and docs.python.org has more details on how to analyze the output with pstats.Stats (still only a sample, not reproducible):

import pstats
p = pstats.Stats('restats')
p.strip_dirs().sort_stats(-1).print_stats()

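I may be missing some very important detail here, but I would really like to store the output in a pandas DataFrame and do further analysis from there.

I thought this would be simple, since the output of running cProfile.run() in IPython looks fairly tidy: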

In[]:
cProfile.run('re.compile("foo|bar")')

Out[]:

Any suggestions on how to get this into a pandas DataFrame in the same format?

Answer 1

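It looks like https://github.com/ssanderson/pstats-view might do what you want (although it comes with unnecessary dependencies related to visualizing the data and making it interactive):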

>>> from pstatsviewer import StatsViewer
>>> sv = StatsViewer("/path/to/profile.stats")
>>> sv.timings.columns
Index(['lineno', 'ccalls', 'ncalls', 'tottime', 'cumtime'], dtype='object')

Answer 2

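I know this already has an answer, but for anyone who doesn't want to go to the trouble of downloading another module, here is a rough-and-ready script that should get close: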

%%capture profile_results    ## uses %%capture magic to send stdout to variable
cProfile.run("your_function( **run_parms )")

Run the cell above first to populate profile_results with the contents of stdout, which holds the usual printed output of cProfile.

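## Parse the stdout text and split it into a table
import pandas as pd

header = "   ncalls  tottime  percall  cumtime  percall filename:lineno(function)"
data = []
started = False

for line in profile_results.stdout.split("\n"):
    if not started:
        # Skip everything before the column header line
        if line == header:
            started = True
            data.append(line)
    elif line.strip():
        # Keep only the non-empty rows of the table
        data.append(line)

content = []
for line in data:
    # The ncalls column ends at the first space after position 8;
    # the four numeric columns that follow are 9 characters wide each
    fs = line.find(" ", 8)
    fields = (line[0:fs], line[fs:fs+9], line[fs+9:fs+18], line[fs+18:fs+27], line[fs+27:fs+36], line[fs+36:])
    content.append(tuple(f.strip() for f in fields))
prof_df = pd.DataFrame(content[1:], columns=content[0])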

It won't win any prizes for elegance or pleasing style, but it does force that table of results into a filterable DataFrame format.

prof_df
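
Outside IPython the %%capture magic is not available. As a rough sketch of one way to get the same text in a plain script, the standard library's contextlib.redirect_stdout can collect what cProfile prints. The helper name capture_profile_text and the explicit namespace argument are my own assumptions for illustration, not part of the answer above.

```python
import cProfile
import contextlib
import io
import re


def capture_profile_text(statement, namespace):
    """Run `statement` under cProfile and return the printed report as a string.

    `namespace` is a dict holding every name the statement refers to
    (hypothetical helper, not part of cProfile itself).
    """
    buffer = io.StringIO()
    # cProfile prints its report to sys.stdout, so redirect it temporarily
    with contextlib.redirect_stdout(buffer):
        cProfile.runctx(statement, namespace, namespace)
    return buffer.getvalue()


# Profile the same statement used in the question
profile_text = capture_profile_text('re.compile("foo|bar")', {"re": re})
```

With this in place, profile_text.split("\n") should be usable in the parsing loop wherever profile_results.stdout.split("\n") appears.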

Answer 3

If you are doing this from cmd with python -m cProfile your_script.py

you can push the output to a text file: python -m cProfile your_script.py >> output.txt

Then parse the output with pandas:

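import pandas as pd
# skiprows and the four-space separator depend on the exact layout of output.txt; adjust them for your own output
df = pd.read_csv('output.txt', skiprows=5, sep='    ', engine='python', names=['ncalls','tottime','percall','cumtime','percall.1','filename:lineno(function)'])
# The second percall value and the function description are meant to land in one field; split them apart
df[['percall.1', 'filename']] = df['percall.1'].str.split(' ', expand=True, n=1)
df = df.drop('filename:lineno(function)', axis=1)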
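
Splitting on a fixed run of spaces is sensitive to the column widths cProfile happens to print. A possibly more forgiving sketch is to match each row with a regular expression and build the DataFrame from the matches. The names ROW, COLUMNS and parse_profile_text are made up for this example, and the sample text is hand-written in the layout cProfile prints, not real measurements. Rows where cProfile leaves a percall field blank (zero calls) would be skipped by this pattern.

```python
import re

import pandas as pd

# One table row: the call count, four numeric fields, then the function
# description, which may itself contain spaces
ROW = re.compile(r"^\s*(\d+(?:/\d+)?)\s+(\d\S*)\s+(\d\S*)\s+(\d\S*)\s+(\d\S*)\s+(.+)$")
COLUMNS = ["ncalls", "tottime", "percall", "cumtime", "percall.1",
           "filename:lineno(function)"]


def parse_profile_text(text):
    """Return a DataFrame with one row per matching line of a cProfile report."""
    rows = [m.groups() for m in map(ROW.match, text.splitlines()) if m]
    df = pd.DataFrame(rows, columns=COLUMNS)
    # Timing columns become floats; ncalls stays a string because
    # recursive calls are printed as "total/primitive"
    timing = ["tottime", "percall", "cumtime", "percall.1"]
    df[timing] = df[timing].astype(float)
    return df


# Hand-written sample input, for illustration only
sample = """\
         5 function calls in 0.000 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.exec}
"""
df = parse_profile_text(sample)
```

For a real run, the text would come from the file instead, for example open('output.txt').read().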

Answer 4

You can use this function to accomplish the task

import pandas as pd

def convert_to_df(path, offset=6):
    """
    path: path to file
    offset: index of the line that holds the column headers
    """
    with open(path, "r") as f:
        core_profile = f.readlines()
    core_profile = core_profile[offset:]
    cols = core_profile[0].split()
    n = len(cols[:-1])
    # Split each non-empty row on whitespace
    data = [_.split() for _ in core_profile[1:] if _.strip()]
    # Function descriptions may contain spaces; rejoin everything after the n numeric fields
    data = [_ if len(_)==n+1 else _[:n]+[" ".join(_[n:])] for _ in data]
    data_ = pd.DataFrame(data, columns=cols)
    return data_
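
The right value for offset depends on how many lines are printed before the table, which changes if the profiled script writes anything to stdout. One possible way to avoid hard-coding it is to search for the header line. The function find_header_offset is a hypothetical helper and the lines below are a hand-written sample, so treat this as a sketch only.

```python
def find_header_offset(lines):
    """Return the index of the line holding the cProfile column headers."""
    for i, line in enumerate(lines):
        # The header is the first line whose first two words are these
        if line.split()[:2] == ["ncalls", "tottime"]:
            return i
    raise ValueError("no cProfile column header found")


# Hand-written sample of what a redirected report might look like
lines = [
    "hello from the profiled script\n",
    "         5 function calls in 0.000 seconds\n",
    "\n",
    "   Ordered by: standard name\n",
    "\n",
    "   ncalls  tottime  percall  cumtime  percall filename:lineno(function)\n",
    "        1    0.000    0.000    0.000    0.000 <string>:1(<module>)\n",
]
offset = find_header_offset(lines)
```

The result could then be passed along as convert_to_df(path, offset=offset) after reading the file's lines once.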

Answer 5

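If you don't want to use %%capture or go through CSV, below is a cobbled-together solution. In this case it compares multiple cProfile results in the same folder by (1) sorting each profile by cumulative time and (2) adding only the top result from each .prof file to the data frame ( pstats.Stats(f, stream = p_output).sort_stats("cumulative").print_stats(1) ), along with part of the .prof filename to identify which profile the measurement came from.

For some of the original code (which does use CSV as an intermediary), see here.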

import io
import pstats
import pandas as pd
import glob

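# profiledir must already hold the path to the folder containing the .prof files
all_files = glob.glob(profiledir + "/*.prof")

li = []

for f in all_files:
    # Send the printed report to an in-memory buffer instead of stdout
    p_output = io.StringIO()

    # Sort by cumulative time and print only the top entry
    prof_stats = pstats.Stats(f, stream = p_output).sort_stats("cumulative").print_stats(1)

    p_output = p_output.getvalue()
    # Drop everything before the column header line
    p_output = 'ncalls' + p_output.split('ncalls')[-1]
    # Turn each row into comma-separated fields (at most six per row)
    result = '\n'.join([','.join(line.rstrip().split(None,5)) for line in p_output.split('\n')])

    df = pd.read_csv(io.StringIO(result), sep=",", header=0)
    df['module name'] = f.split(' ')[0].split('\\')[1] # differs depending on your file naming convention
    li.append(df)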

df = pd.concat(li, axis=0, ignore_index=True)
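
Another option that skips text parsing altogether is to read the numbers straight from the Stats object. As far as I know, pstats.Stats keeps them in a stats dictionary keyed by (filename, lineno, funcname), but that attribute is an implementation detail and not a documented API, so this is a sketch under that assumption. The names stats_to_df and work are made up for the example.

```python
import cProfile
import pstats

import pandas as pd


def stats_to_df(stats):
    """Build a DataFrame from a pstats.Stats object without parsing text."""
    rows = []
    # Assumption: Stats.stats maps (filename, lineno, funcname) to
    # (primitive calls, total calls, tottime, cumtime, callers)
    for (filename, lineno, funcname), (cc, nc, tt, ct, _callers) in stats.stats.items():
        rows.append({
            "filename": filename,
            "lineno": lineno,
            "function": funcname,
            "ccalls": cc,
            "ncalls": nc,
            "tottime": tt,
            "cumtime": ct,
        })
    df = pd.DataFrame(rows)
    # Highest cumulative time first
    return df.sort_values("cumtime", ascending=False).reset_index(drop=True)


def work():
    """Hypothetical function to profile."""
    return sum(i * i for i in range(1000))


profiler = cProfile.Profile()
profiler.runcall(work)
df = stats_to_df(pstats.Stats(profiler))
```

For saved profiles, pstats.Stats also accepts a file name, so something like stats_to_df(pstats.Stats(f)).head(1) inside the loop over all_files should give the top entry by cumulative time for each .prof file.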

5 solutions

Solution 1
4 Accepted 2017-08-31 01:09:13

Solution 2
4 2020-02-18 15:37:36

Solution 3
1 2020-04-21 00:03:52

Solution 4
1 2020-07-08 06:39:03

Solution 5
1 2021-06-09 12:00:56
