DeepLearning4j k-means very slow
I'm trying to use DL4J's K-Means implementation. I set it up as follows:
int CLUSTERS = 5;
int MAX_ITERATIONS = 300;
String DISTANCE_METRIC = "cosinesimilarity";
KMeansClustering KMEANS = KMeansClustering.setup(CLUSTERS, MAX_ITERATIONS, DISTANCE_METRIC);
My data points are vectors of size 300 (doubles), and my test set comprises ~100 data points each time (give or take). I'm running it on my CPU (4 cores) in a single-threaded fashion.
Evaluation takes a very long time (a few seconds per example).
I took a peek inside the algorithm's implementation, and it looks like its concurrency level is very high: a lot of threads are being created (one per data point, to be exact) and executed in parallel. Perhaps this is overkill? Is there any way I can control it through configuration? Are there other ways to speed it up? If not, is there any other fast Java-based solution for executing k-means?
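If DL4J's implementation stays too slow for ~100 points, a plain single-threaded k-means is also simple to write directly in Java. The sketch below is illustrative, not DL4J code: it uses cosine distance (one minus cosine similarity, matching the `cosinesimilarity` metric above), random initialization from the data points, and the standard assign/update loop.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Minimal single-threaded k-means with cosine distance (illustrative sketch).
public class SimpleKMeans {

    // Cosine distance = 1 - cosine similarity; small epsilon guards zero vectors.
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(na) * Math.sqrt(nb) + 1e-12);
    }

    // Returns the cluster index assigned to each point.
    static int[] cluster(double[][] points, int k, int maxIterations, long seed) {
        int n = points.length, dim = points[0].length;
        // Initialize centroids by picking k distinct data points at random.
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < n; i++) idx.add(i);
        Collections.shuffle(idx, new Random(seed));
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = points[idx.get(c)].clone();

        int[] assignment = new int[n];
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: nearest centroid by cosine distance.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double d = cosineDistance(points[i], centroids[c]);
                    if (d < bestDist) { bestDist = d; best = c; }
                }
                if (assignment[i] != best) { assignment[i] = best; changed = true; }
            }
            if (!changed && iter > 0) break; // converged
            // Update step: each centroid becomes the mean of its points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dim; d++) sums[assignment[i]][d] += points[i][d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // keep old centroid for empty clusters
                for (int d = 0; d < dim; d++) centroids[c][d] = sums[c][d] / counts[c];
            }
        }
        return assignment;
    }
}
```

With no thread pool at all, 300 iterations over ~100 points of dimension 300 should finish in well under a second on one core; libraries such as ELKI or Smile also provide tuned Java k-means implementations.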
"DL4J supports GPUs and is compatible with distributed computing software such as Apache Spark and Hadoop." — from https://deeplearning4j.org
An extra Spark or Hadoop instance might help with scaling performance.