Pandas Profiling does not work when started from a Java program

I have a simple Python program that uses pandas_profiling. Here is the source code, which I saved as c:\temp\pandas_profiling_demo.py:

import pandas as pd
import pandas_profiling as pp
df = pd.DataFrame(data={'x': [1, 2, 3, 4, 5], 'y': [2, 2, 4, 6, 6], 'z': [4, 6, 1, 5, 2]})
print(df.head(10))
profile = pp.ProfileReport(df)
profile.to_file(outputfile="C:\\temp\\pandas_profiling_demo.html")
print("Done.")

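I also have a Java program that launches the Python program. (This is not the real program, which involves a GUI, but it reproduces the problem.) My program lives in Eclipse, but I have copied it here: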

package pandasprofilingbug;

import java.io.File;
import java.lang.ProcessBuilder.Redirect;
import java.nio.file.Files;
import java.util.Map;

public class PandasProfilingBug {

    public static void main(String[] args) 
    {
        String[] command = new String[] { "cmd", "/C", "python", "c:\\temp\\pandas_profiling_demo.py" } ;
        ProcessBuilder pb = new ProcessBuilder(command);
        Map<String, String> env = pb.environment();

        env.remove("PYSPARK_DRIVER_PYTHON");        // else attempts to open in notebook environment    

        Process p = null;

        try
        {
            File log = new File("c:\\temp\\pandas_profiling_demo.log");
            Files.deleteIfExists(log.toPath());

            pb.redirectErrorStream(true);
            pb.redirectOutput(Redirect.to(log));
            // System.out.print("Start...");
            p = pb.start();
            assert pb.redirectInput() == Redirect.PIPE;
            assert pb.redirectOutput().file() == log;
            assert p.getInputStream().read() == -1;

            // TODO: How to give user an option to cancel???
            // TODO: How to provide progress report?
            System.out.print("Waiting...");
            p.waitFor();
            System.out.print("waiting over...exitValue = " + p.exitValue());
        }
        catch (Exception ie)
        {
            System.err.println(ie);
            ie.printStackTrace();
        }

    }
}

When I run the Java program, it gets stuck in a loop. Here is the first block of the repeated output:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\bill_\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\bill_\Anaconda3\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "C:\Users\bill_\Anaconda3\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\bill_\Anaconda3\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Users\bill_\Anaconda3\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Users\bill_\Anaconda3\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Users\bill_\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "c:\temp\pandas_profiling_demo.py", line 21, in <module>
    profile = pp.ProfileReport(df)
  File "C:\Users\bill_\Anaconda3\lib\site-packages\pandas_profiling\__init__.py", line 66, in __init__
    description_set = describe(df, **kwargs)
  File "C:\Users\bill_\Anaconda3\lib\site-packages\pandas_profiling\describe.py", line 349, in describe
    pool = multiprocessing.Pool(pool_size)
  File "C:\Users\bill_\Anaconda3\lib\multiprocessing\context.py", line 119, in Pool
    context=self.get_context())
  File "C:\Users\bill_\Anaconda3\lib\multiprocessing\pool.py", line 174, in __init__
    self._repopulate_pool()
  File "C:\Users\bill_\Anaconda3\lib\multiprocessing\pool.py", line 239, in _repopulate_pool
    w.start()
  File "C:\Users\bill_\Anaconda3\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Users\bill_\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\bill_\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Users\bill_\Anaconda3\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "C:\Users\bill_\Anaconda3\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
   x  y  z
0  1  2  4
1  2  2  6
2  3  4  1
3  4  6  5
4  5  6  2
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\bill_\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\bill_\Anaconda3\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "C:\Users\bill_\Anaconda3\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\bill_\Anaconda3\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")

# REPEATED AD NAUSEAM

When I run the Python program in a Jupyter notebook, it runs fine and creates the desired HTML file.

When I comment out these lines, the program works when called from Java (it displays the data frame):

    #profile = pp.ProfileReport(df)
    #profile.to_file(outputfile="C:\\temp\\pandas_profiling_demo.html")

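Since I can run the program from Java when profiling is not used, I suspect something is wrong with pandas_profiling (or at least with how I am using it). Why does it cause the program to go into a loop?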

Thanks in advance.

You can find the solution here and here.

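In your case, put the top-level code under the if __name__ == "__main__": guard that the error message asks for: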

import pandas as pd
import pandas_profiling as pp

if __name__ == "__main__":
    df = pd.DataFrame(data={'x': [1, 2, 3, 4, 5], 'y': [2, 2, 4, 6, 6], 'z': [4, 6, 1, 5, 2]})
    print(df.head(10))
    profile = pp.ProfileReport(df)
    profile.to_file(outputfile="C:\\temp\\pandas_profiling_demo.html")
    print("Done.")
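
Why the guard helps: the traceback in the question shows multiprocessing's spawn.py re-running the script through runpy.run_path with run_name="__mp_main__". On Windows each worker process starts in a fresh interpreter and executes the main script again under that name, so any unguarded top-level code (including the pp.ProfileReport(df) call that creates the Pool) runs again in every worker. The sketch below only imitates that re-run step inside a single process; it starts no worker processes, and SCRIPT, run_as and demo_script.py are hypothetical names chosen for illustration.

```python
import os
import runpy
import tempfile

# A stand-in for the user's script: one unguarded statement and one
# statement protected by the __main__ guard.
SCRIPT = '''
events = []
events.append("top-level code ran as " + __name__)
if __name__ == "__main__":
    events.append("guarded code ran")
'''


def run_as(run_name):
    """Execute SCRIPT from a temporary file under the given module name."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo_script.py")
        with open(path, "w") as f:
            f.write(SCRIPT)
        # run_path returns the script's globals after it has executed.
        return runpy.run_path(path, run_name=run_name)["events"]


# How the script runs when launched directly (python demo_script.py).
parent_events = run_as("__main__")
# How the traceback shows a spawned worker re-running the same file.
child_events = run_as("__mp_main__")

print(parent_events)  # ['top-level code ran as __main__', 'guarded code ran']
print(child_events)   # ['top-level code ran as __mp_main__']
```

The unguarded line runs in both cases, while the guarded line runs only in the original process. That is why moving the ProfileReport call under the guard stops each worker from trying to create a Pool of its own.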
