A Case Study of Go Channels and Goroutines
Let's talk about Go channels and Goroutines and how we can get started with them using a semi-practical example.
In our case we will calculate all prime numbers from 2 to 500,000. The reason for picking this example is that determining a number is prime takes CPU time and it's an independent enough task that we can use threads to concurrently process multiple numbers. By using channels to distribute the work to Goroutines we get a well rounded way to learn about these concepts.
At the end, we will implement the same thing in Python. We will compare the implementation differences between Go and Python and how long it takes to run equivalent programs. Spoiler alert: Go is much better suited for this task.
Program
Here's the entire program. Let's talk about it in chunks and explore the concepts behind the lines.
package main import ( "fmt" "runtime" "sync" ) func main() { var wg sync.WaitGroup most := 500000 least := 2 nums := make(chan int, most) primes := make(chan int, most/2) // premature optimization because we won't need as big a buffer as `nums` needs for i := 1; i <= runtime.NumCPU(); i++ { wg.Add(1) go is_prime(nums, primes, &wg) } go generator(least, most, nums, primes, &wg) total := 0 for x := range primes { fmt.Println(x) total += 1 } fmt.Printf("Total primes found: %d\n", total) fmt.Println("Complete") } func generator(least int, most int, numChan chan int, primeChan chan int, wg *sync.WaitGroup) { for i := least; i <= most; i++ { numChan <- i } close(numChan) wg.Wait() close(primeChan) } func is_prime(numChan <-chan int, primeChan chan<- int, wg *sync.WaitGroup) { defer wg.Done() for num := range numChan { prime := true for i := 2; i < num; i++ { if (num % i) == 0 { prime = false break } } if prime { primeChan <- num } } }
I learned these concepts through Stack Overflow[0][1] and Go by Example.
First read about Goroutines, Channels, and WaitGroups. Then return to this post.
Design
Our high level design is:
- Create a channel (
nums
) which stores all numbers from 2 to 500,000. I chose these numbers because it takes a short enough time to run the program written in Go and it's enough time to view system metrics like CPU and memory utilization. In the Python version it takes much longer but completes in a reasonable time as well. Larger values would just make the Python case worse without any benefit. - Create a channel (
primes
) which stores all prime numbers. - Generate numbers and push them to
nums
. - Create multiple Goroutines to read from
nums
and determine if the number is prime. If it is, push toprimes
.
The biggest problem I ran into was fatal error: all goroutines are asleep - deadlock!
.
The reason is that I couldn't figure out the order in which to wait for
Goroutines to complete and close the channels. Here's what I finally learned:
- Prefer buffered channels.
- Close the channel as soon as you are done writing to it.
- We can still receive from a closed channel until it is empty.
- Use a WaitGroup to wait for all Goroutines to complete.
- Introduce a blocking receive/read from a channel after Goroutines are complete. Otherwise the program may exit prematurely.
- Perform all writes to a channel in Goroutines. I had trouble when writing in the main thread.
The above are my observations. I don't have links to official documentation or other material from any authority to back them up.
Let's review the code. We have two functions, generator
and is_prime
, which
we run in Goroutines.
is_prime
We call/start is_prime
first, in more than one Goroutines. We use the
standard library (stdlib)
runtime.NumCPU()
to
determine the number of goroutines to start. This is a handy way to
deterministically decide how many Goroutines we should use. Of course, this
number will change from my machine to yours.
Notice how we use <-
differently
for numChan
and primeChan
: numChan <-chan int, primeChan chan<- int
. This
is related to Channel Directions.
Here we will write to primeChan
(chan<-
) and read from numChan
(<-chan
). I liked how <-
indicates direction of data in or out of the
channel.
We pass in the wait group and call its Done
method to indicate that this
Goroutine is complete. When wg.Wait()
is called, it will wait for all
Goroutines to call this Done
method.
is_prime
Goroutines are blocking because there's nothing in numChan
they
can receive. They wait for generator
to push data.
generator
We call/start generator
second, in a single Goroutine. This will generate all
numbers and push/write them to numChan
. Right after the loop is done we close
the channel. The reason, as mentioned above, is I observed this is the best
thing to do. We don't need to write to this channel anymore so close it asap.
Notice this Goroutine is doing other things, too. It is waiting for other
Goroutines to complete, which in our case are is_prime
, which are in turn
waiting for this Goroutine to push data into numChan
.
After generator
pushes all the data that is_prime
are waiting for, it waits
for them to complete.
We know that only is_prime
will write to primeChan
and once they are
completed, we can close it, too.
for range primes
This is a blocking for loop, which reads the primes
channel until it is
drained. The channel was already closed in the generator
function. At this
point we know that all the work has been done, all Goroutines are completed,
and all channels are closed. We are iterating through the results and printing
them out.
Results
I ran time go run main.go
and below are the last three lines. We can see that
it took about 45 seconds to run the program. We used CPU percentage of 300+
which means we utilized more than one core at a time. I looked in Activity
Monitor while the program was executing and saw that it was using 6 threads.
Total primes found: 41538 Complete go run main.go 136.31s user 1.32s system 302% cpu 45.425 total
For comparison, an equivalent program written in Python (after these results), took a little less than 14 minutes to complete. It used 94% CPU which means it was pegged to one core. Even though we use threads in this program, Python's Global Interpreter Lock means only one thread is actually executing at a time. I looked in Activity Monitor while the program was executing and saw that it was using 3 threads.
Total primes found: 41538 Complete python3 main.py 783.81s user 9.60s system 94% cpu 13:57.30 total
My conclusion is that when we want to utilize multiple CPU cores for a problem that can more easily be split into parallel chunks of work, Go and its channels and Goroutines are a great way to write the program. Python is not that great at solving this sort of problem. I have heard that Python is "slow" and today I actually learned what it means. Not to say that Python is slow all the time; it is great in so many ways and an excellent choice, just not for these kinds of problems.
Python Version
Here's an equivalent implementation in Python. I use queue
and threading
standard libraries to act as channel and Goroutine substitutes. I did not
attempt a subprocess-based
implementation because it does not match Goroutines as closely in concept as
threads do. Also note that this is a quick-and-dirty version and wholly
unoptimized but it got the job done.
from queue import Queue import threading import time most = 500000 least = 2 nums = Queue() primes = Queue() def generator(): for i in range(least,most+1): nums.put(i) def is_prime(): while not nums.empty(): prime = True num = nums.get() for i in range(2, num): if (num % i) == 0: prime = False break if prime: primes.put(num) g = threading.Thread(target=generator) g.start() g.join() ip1 = threading.Thread(target=is_prime) ip2 = threading.Thread(target=is_prime) ip1.start() ip2.start() ip1.join() ip2.join() total = 0 while not primes.empty(): print(primes.get()) total += 1 print(f"Total primes found: {total}") print("Complete")