Vincent's Blog

Posts List

An introduction to CUDA in Python (Part 5)

@Vincent Lunot · Dec 10, 2017

In Part 4 of this introduction, we saw that the performance of our convolution kernel is limited by memory bandwidth. In this part, we will see how to improve performance by using shared memory.

An introduction to CUDA in Python (Part 4)

@Vincent Lunot · Dec 4, 2017

In this part, we will learn how to profile a CUDA kernel using both nvprof and nvvp, the Visual Profiler. We will take the convolution kernel from Part 3 and use profiling to discover how to improve it.