
DeepLearning4j k-means very slow

I'm trying to use DL4J's K-Means implementation. I set it up as follows:

int CLUSTERS = 5;
int MAX_ITERATIONS = 300;
String DISTANCE_METRIC = "cosinesimilarity";
KMeansClustering KMEANS = KMeansClustering.setup(CLUSTERS, MAX_ITERATIONS, DISTANCE_METRIC);

My data points are vectors of size 300 (doubles), and my test set comprises roughly 100 data points each time (give or take). I'm running it on my CPU (4 cores) in a single-threaded fashion.
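For context, the clustering is invoked roughly like this (a minimal sketch with random stand-in data; it assumes the Point/ClusterSet classes and the applyTo method from DL4J's clustering module, and the class and variable names are only illustrative, so details may differ between versions):

import org.deeplearning4j.clustering.cluster.ClusterSet;
import org.deeplearning4j.clustering.cluster.Point;
import org.deeplearning4j.clustering.kmeans.KMeansClustering;
import org.nd4j.linalg.factory.Nd4j;

import java.util.ArrayList;
import java.util.List;

public class KMeansTimingSketch {
    public static void main(String[] args) {
        // Same setup as above: 5 clusters, 300 max iterations, cosine similarity
        KMeansClustering kmeans = KMeansClustering.setup(5, 300, "cosinesimilarity");

        // ~100 random 300-dimensional vectors standing in for the real data
        List<Point> points = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            points.add(new Point(Nd4j.rand(1, 300)));
        }

        long start = System.currentTimeMillis();
        ClusterSet clusterSet = kmeans.applyTo(points);
        System.out.println("Clusters: " + clusterSet.getClusters().size()
                + ", took " + (System.currentTimeMillis() - start) + " ms");
    }
}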

Evaluation takes a very long time (a few seconds per example).

I took a peek inside the algorithm's implementation, and it looks like its concurrency level is very high: a lot of threads are being created (one per data point, to be exact) and executed in parallel. Perhaps this is overkill? Is there any way I can control it through configuration? Are there other ways to speed it up? If not, is there any other fast Java-based solution for executing k-means?

"DL4J supports GPUs and is compatible with distributed computing software such as Apache Spark and Hadoop." “DL4J 支持 GPU,并兼容分布式计算软件,如 Apache Spark 和 Hadoop。” from https://deeplearning4j.org来自https://deeplearning4j.org

An extra Spark or Hadoop instance might help with scaling performance; a rough Spark sketch follows.
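For example, a local Spark MLlib k-means run might look something like this (a rough sketch using org.apache.spark.ml.clustering.KMeans with toy random data; the class name, data generation, and seed are placeholders, not taken from the question):

import org.apache.spark.ml.clustering.KMeans;
import org.apache.spark.ml.clustering.KMeansModel;
import org.apache.spark.ml.linalg.VectorUDT;
import org.apache.spark.ml.linalg.Vectors;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class SparkKMeansSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("kmeans-sketch")
                .master("local[4]")          // use the 4 local cores
                .getOrCreate();

        // ~100 random 300-dimensional vectors standing in for the real data
        Random rng = new Random(42);
        List<Row> rows = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            double[] v = new double[300];
            for (int j = 0; j < v.length; j++) v[j] = rng.nextDouble();
            rows.add(RowFactory.create(Vectors.dense(v)));
        }
        StructType schema = new StructType(new StructField[]{
                new StructField("features", new VectorUDT(), false, Metadata.empty())});
        Dataset<Row> data = spark.createDataFrame(rows, schema);

        KMeans kmeans = new KMeans().setK(5).setMaxIter(300).setSeed(1L);
        KMeansModel model = kmeans.fit(data);
        System.out.println("Found " + model.clusterCenters().length + " cluster centers");

        spark.stop();
    }
}

Note that MLlib's k-means uses Euclidean distance by default; cosine distance is only supported in newer Spark versions, so check the version before relying on it for a cosine-similarity setup like the one in the question.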
