michf at post.tau.ac.il
Wed Sep 1 11:17:15 EDT 2010
On 01/09/2010 15:18, Fumie Costen wrote:
> Dear All, I believe some of you have come across the phrase "GPU
> I have an access to GPU cluster remotely but the speed of data transfer
> between the GPUs seems to be pretty slow from the specification and I
> feel this is going to be the serious bottle neck of the large scale
> computation with GPUs.
> Even when we do tackle this particular problem
> using the overlap of communication and computation somehow
> spending some significant amount of time,
> by the time when we try to publish even a conference paper,
> the new-spec GPU cluster could be available and all of our
> effort would be wasted and can not lead to the publication.
> Furthermore, just porting our current code to GPU won't give
> us journal papers, I guess.
> I would be grateful if there are anybody who
> have any experience in GPU computation and can share
> the experience with me.
> I am in the middle of production of the next PhD topics
> and the topic has to be productive from the perspective of
> journal publication.
GPUs have to communicate with CPU memory over PCIe, which is indeed a bottleneck.
There are two modes of operation: standard (pageable) memory and pinned memory
(which allows DMA).
On a Core 2 my experience is that the maximum upload/download speed is around
2-3 GB/s (depending on mode) for large data sets (you reach top speed at around
2 MB or so).
For small buffers it can go as low as 0.3 GB/s.
I know of people who reached around 5.5 GB/s, but I believe that was on a Nehalem
machine with pinned memory.
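
To get a feel for what these rates mean in practice, here is a rough
back-of-the-envelope sketch. It only converts the bandwidth figures quoted
above into transfer times; the 512 MiB payload and the function name
transfer_seconds are my own illustrative assumptions, and 1 GB is taken as
1e9 bytes.

```python
# Bandwidth figures quoted in the message above, in GB/s.
# Assumption: 1 GB = 1e9 bytes; the labels are only for printing.
QUOTED_BANDWIDTH_GBPS = {
    "small buffers": 0.3,
    "Core 2, large buffers (low)": 2.0,
    "Core 2, large buffers (high)": 3.0,
    "Nehalem, pinned (reported)": 5.5,
}


def transfer_seconds(num_bytes, bandwidth_gbps):
    """Time in seconds to move num_bytes at a sustained rate in GB/s."""
    if bandwidth_gbps <= 0:
        raise ValueError("bandwidth must be positive")
    return num_bytes / (bandwidth_gbps * 1e9)


if __name__ == "__main__":
    payload = 512 * 1024 * 1024  # hypothetical 512 MiB working set
    for label, bw in QUOTED_BANDWIDTH_GBPS.items():
        ms = transfer_seconds(payload, bw) * 1e3
        print(f"{label:32s} {ms:8.1f} ms")
```

This ignores per-transfer latency, so it is optimistic for many small copies,
which is exactly the case where the measured rate drops.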
You can do concurrent copy and execute to hide copy time, but you need a long
enough running kernel for that to work.
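
Why the kernel has to run long enough can be seen from a simple two-stage
pipeline model. This is only a sketch under my own assumptions: the work is
split into equal chunks, there is one copy engine and one compute engine,
only the upload is modelled, and the names serial_time and overlapped_time
are made up for the illustration.

```python
def serial_time(n_chunks, copy_s, kernel_s):
    """Copy then compute each chunk, with no overlap at all."""
    return n_chunks * (copy_s + kernel_s)


def overlapped_time(n_chunks, copy_s, kernel_s):
    """Two-stage pipeline: copy chunk i+1 while the kernel runs on chunk i.

    The first copy and the last kernel cannot be hidden; in between, each
    step costs whichever of the two stages is slower.
    """
    if n_chunks < 1:
        raise ValueError("need at least one chunk")
    return copy_s + (n_chunks - 1) * max(copy_s, kernel_s) + kernel_s


if __name__ == "__main__":
    n = 8
    # Hypothetical per-chunk timings in milliseconds: (copy, kernel).
    for copy_ms, kernel_ms in [(2.0, 10.0), (2.0, 2.0), (10.0, 2.0)]:
        s = serial_time(n, copy_ms, kernel_ms)
        o = overlapped_time(n, copy_ms, kernel_ms)
        print(f"copy={copy_ms} kernel={kernel_ms} serial={s} overlapped={o}")
```

When the kernel time per chunk is at least the copy time, almost all copying
is hidden behind computation; when the kernel is shorter than the copy, the
run is bound by PCIe and the overlap buys much less.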
Another thing to note is that, starting with CUDA 3.1, NVIDIA did some work to
allow direct transfer of data to InfiniBand, which can reduce the CPU memory
As for research, I don't have enough experience with HPC papers, but you should
always remember to compare results in the papers for comparable hardware.
As for GPU related papers, you need to remember that this is a different
architecture with different assumptions, so in terms of papers, possible
research areas are:
1. Mapping existing algorithms to GPU architectures.
2. Scalability of the GPU architecture. Due to the extra redirection with GPUs
and the opportunities for concurrent copy and execute, this has an extra level
of challenge. You can always compare a standard approach to adaptive approaches.
3. Development of new techniques that are more appropriate to GPUs.
4. Hybrid algorithms that make good use of both the GPU and the CPU.
> Thank you very much,
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf