OPUS 4 | Search

3 search hits

A generic and adaptive approach for workload distribution in multitier cluster systems with an application to distributed matrix multiplication (2015)

Handmann, Uwe ; Kopinski, Thomas ; Malysiak, Darius

We present a novel approach to distributing matrix multiplications among GPU-equipped nodes in a cluster system. In this context we discuss the induced challenges and possible solutions. Additionally, we state an algorithm which outperforms optimized GPU BLAS libraries for small matrices. Furthermore, we provide a novel theoretical model for distributing algorithms within homogeneous computation systems with multiple hierarchies. In the context of this model we develop an algorithm which can find the optimal distribution parameters for each involved subalgorithm. We provide a detailed analysis of the algorithm's space and time complexities and justify its use with a structured evaluation within a small GPU-equipped Beowulf cluster.

Increasing the efficiency of gpu-based hog algorithms through tile-images (2015)

Malysiak, Darius ; Markard, Markus ; Handmann, Uwe

Object detection systems which operate on large data streams must scale efficiently with the available computation power. We analyze how the use of tile-images can increase the efficiency (i.e. execution speed) of distributed HOG-based object detectors. Furthermore, we discuss the challenges of using our developed algorithms in practical large-scale scenarios. We show with a structured evaluation that our approach can provide a speed-up of 30-180% for existing architectures. Due to its generic formulation it can be applied to a wide range of HOG-based (or similar) algorithms. In this context we also study the effects of applying our method to an existing detector and discuss a scalable strategy for distributing the computation among nodes in a cluster system.

On the challenge of training small scale neural networks on large scale computing systems (2015)

Malysiak, Darius ; Grimm, Matthias ; Handmann, Uwe

We present a novel approach to distributing small- to mid-scale neural networks onto modern parallel architectures. In this context we discuss the induced challenges and possible solutions. We provide a detailed theoretical analysis with respect to space and time complexities and reinforce our computation model with evaluations which show a performance gain over state-of-the-art approaches.
