
How does Apache Spark handle Python multithreading issues?

Because of Python's GIL, threads can't run CPU-bound work in parallel within a single process, so my question is: how does Apache Spark use Python in a multi-core environment?
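For example, on CPython, adding threads doesn't speed up a CPU-bound loop (a minimal sketch; exact timings will vary by machine):

    # Minimal sketch: under CPython's GIL, four threads running a
    # CPU-bound loop take roughly as long as running the loop four
    # times in one thread, because only one thread executes Python
    # bytecode at a time.
    import threading
    import time

    def cpu_bound(n):
        total = 0
        for i in range(n):
            total += i * i
        return total

    start = time.time()
    threads = [threading.Thread(target=cpu_bound, args=(10_000_000,))
               for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"4 threads: {time.time() - start:.1f}s")  # no real speedup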

Multithreading issues in Python are separate from Apache Spark's internals. Parallelism in Spark is managed inside the JVM.

[Figure: PySpark internals diagram showing the Python driver, the Py4J bridge to the JVM, and Python worker sub-processes]

The reason is that in the Python driver program, SparkContext uses Py4J to launch a JVM and create a JavaSparkContext.
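For intuition, here is a minimal sketch of the bare Py4J pattern PySpark builds on. This is not Spark's actual bootstrap code; it assumes a JVM running py4j's GatewayServer is already listening on the default port:

    # Minimal Py4J sketch (assumption: a JVM with py4j's GatewayServer
    # is already listening on the default port; this is the pattern
    # PySpark builds on, not Spark's own bootstrap code).
    from py4j.java_gateway import JavaGateway

    gateway = JavaGateway()   # connect to the JVM-side GatewayServer
    jvm = gateway.jvm         # view of the JVM's class namespace
    # Call a Java method from Python over the local Py4J socket:
    print(jvm.java.lang.System.currentTimeMillis())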

Py4J is only used on the driver for local communication between the Python and Java SparkContext objects; large data transfers are performed through a different mechanism.

RDD transformations in Python are mapped to transformations on PythonRDD objects in Java. On remote worker machines, PythonRDD objects launch Python sub-processes and communicate with them using pipes, sending the user's code and the data to be processed.
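So a CPU-bound Python function can still use all cores, because each task runs in its own worker process rather than in a thread. A minimal sketch (running locally with local[4]; the function name cpu_bound is illustrative, not Spark API):

    # Minimal sketch: CPU-bound Python work parallelized by Spark
    # worker processes despite the GIL.
    from pyspark import SparkContext

    sc = SparkContext("local[4]", "gil-demo")

    def cpu_bound(n):
        # Pure-Python loop: inside one process, threads would serialize
        # on the GIL; here each partition gets its own Python worker.
        total = 0
        for i in range(n):
            total += i * i
        return total

    # Four partitions -> up to four Python worker sub-processes
    # running cpu_bound at the same time on four cores.
    print(sc.parallelize([10_000_000] * 4, 4).map(cpu_bound).collect())
    sc.stop()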

PS: I'm not sure if this actually answers your question completely.
