A Case Study of Go Channels and Goroutines

Let's talk about Go channels and Goroutines and how we can get started with them using a semi-practical example.

In our case we will calculate all prime numbers from 2 to 500,000. The reason for picking this example is that determining a number is prime takes CPU time and it's an independent enough task that we can use threads to concurrently process multiple numbers. By using channels to distribute the work to Goroutines we get a well rounded way to learn about these concepts.

At the end, we will implement the same thing in Python. We will compare the implementation differences between Go and Python and how long it takes to run equivalent programs. Spoiler alert: Go is much better suited for this task.

Program

Here's the entire program. Let's talk about it in chunks and explore the concepts behind the lines.

package main

import (
    "fmt"
    "runtime"
    "sync"
)

func main() {
    var wg sync.WaitGroup

    most := 500000
    least := 2

    nums := make(chan int, most)
    primes := make(chan int, most/2)  // premature optimization because we won't need as big a buffer as `nums` needs

    for i := 1; i <= runtime.NumCPU(); i++ {
        wg.Add(1)
        go is_prime(nums, primes, &wg)
    }

    go generator(least, most, nums, primes, &wg)

    total := 0
    for x := range primes {
        fmt.Println(x)
        total += 1
    }
    fmt.Printf("Total primes found: %d\n", total)
    fmt.Println("Complete")
}

func generator(least int, most int, numChan chan int, primeChan chan int, wg *sync.WaitGroup) {
    for i := least; i <= most; i++ {
        numChan <- i
    }
    close(numChan)
    wg.Wait()
    close(primeChan)
}

func is_prime(numChan <-chan int, primeChan chan<- int, wg *sync.WaitGroup) {
    defer wg.Done()
    for num := range numChan {
        prime := true
        for i := 2; i < num; i++ {
            if (num % i) == 0 {
                prime = false
                break
            }
        }
        if prime {
            primeChan <- num
        }
    }
}

I learned these concepts through Stack Overflow[0][1] and Go by Example.

First read about Goroutines, Channels, and WaitGroups. Then return to this post.

Design

Our high level design is:

  1. Create a channel (nums) which stores all numbers from 2 to 500,000. I chose these numbers because it takes a short enough time to run the program written in Go and it's enough time to view system metrics like CPU and memory utilization. In the Python version it takes much longer but completes in a reasonable time as well. Larger values would just make the Python case worse without any benefit.
  2. Create a channel (primes) which stores all prime numbers.
  3. Generate numbers and push them to nums.
  4. Create multiple Goroutines to read from nums and determine if the number is prime. If it is, push to primes.

The biggest problem I ran into was fatal error: all goroutines are asleep - deadlock!. The reason is that I couldn't figure out the order in which to wait for Goroutines to complete and close the channels. Here's what I finally learned:

  1. Prefer buffered channels.
  2. Close the channel as soon as you are done writing to it.
  3. We can still receive from a closed channel until it is empty.
  4. Use a WaitGroup to wait for all Goroutines to complete.
  5. Introduce a blocking receive/read from a channel after Goroutines are complete. Otherwise the program may exit prematurely.
  6. Perform all writes to a channel in Goroutines. I had trouble when writing in the main thread.

The above are my observations. I don't have links to official documentation or other material from any authority to back them up.

Let's review the code. We have two functions, generator and is_prime, which we run in Goroutines.

is_prime

We call/start is_prime first, in more than one Goroutines. We use the standard library (stdlib) runtime.NumCPU() to determine the number of goroutines to start. This is a handy way to deterministically decide how many Goroutines we should use. Of course, this number will change from my machine to yours.

Notice how we use <- differently for numChan and primeChan: numChan <-chan int, primeChan chan<- int. This is related to Channel Directions. Here we will write to primeChan (chan<-) and read from numChan (<-chan). I liked how <- indicates direction of data in or out of the channel.

We pass in the wait group and call its Done method to indicate that this Goroutine is complete. When wg.Wait() is called, it will wait for all Goroutines to call this Done method.

is_prime Goroutines are blocking because there's nothing in numChan they can receive. They wait for generator to push data.

generator

We call/start generator second, in a single Goroutine. This will generate all numbers and push/write them to numChan. Right after the loop is done we close the channel. The reason, as mentioned above, is I observed this is the best thing to do. We don't need to write to this channel anymore so close it asap.

Notice this Goroutine is doing other things, too. It is waiting for other Goroutines to complete, which in our case are is_prime, which are in turn waiting for this Goroutine to push data into numChan.

After generator pushes all the data that is_prime are waiting for, it waits for them to complete.

We know that only is_prime will write to primeChan and once they are completed, we can close it, too.

for range primes

This is a blocking for loop, which reads the primes channel until it is drained. The channel was already closed in the generator function. At this point we know that all the work has been done, all Goroutines are completed, and all channels are closed. We are iterating through the results and printing them out.

Results

I ran time go run main.go and below are the last three lines. We can see that it took about 45 seconds to run the program. We used CPU percentage of 300+ which means we utilized more than one core at a time. I looked in Activity Monitor while the program was executing and saw that it was using 6 threads.

Total primes found: 41538
Complete
go run main.go  136.31s user 1.32s system 302% cpu 45.425 total

For comparison, an equivalent program written in Python (after these results), took a little less than 14 minutes to complete. It used 94% CPU which means it was pegged to one core. Even though we use threads in this program, Python's Global Interpreter Lock means only one thread is actually executing at a time. I looked in Activity Monitor while the program was executing and saw that it was using 3 threads.

Total primes found: 41538
Complete
python3 main.py  783.81s user 9.60s system 94% cpu 13:57.30 total

My conclusion is that when we want to utilize multiple CPU cores for a problem that can more easily be split into parallel chunks of work, Go and its channels and Goroutines are a great way to write the program. Python is not that great at solving this sort of problem. I have heard that Python is "slow" and today I actually learned what it means. Not to say that Python is slow all the time; it is great in so many ways and an excellent choice, just not for these kinds of problems.

Python Version

Here's an equivalent implementation in Python. I use queue and threading standard libraries to act as channel and Goroutine substitutes. I did not attempt a subprocess-based implementation because it does not match Goroutines as closely in concept as threads do. Also note that this is a quick-and-dirty version and wholly unoptimized but it got the job done.

from queue import Queue
import threading
import time

most = 500000
least = 2
nums = Queue()
primes = Queue()

def generator():
    for i in range(least,most+1):
        nums.put(i)

def is_prime():
    while not nums.empty():
        prime = True
        num = nums.get()
        for i in range(2, num):
            if (num % i) == 0:
                prime = False
                break
        if prime:
            primes.put(num)

g = threading.Thread(target=generator)
g.start()
g.join()

ip1 = threading.Thread(target=is_prime)
ip2 = threading.Thread(target=is_prime)
ip1.start()
ip2.start()
ip1.join()
ip2.join()

total = 0
while not primes.empty():
    print(primes.get())
    total += 1

print(f"Total primes found: {total}")
print("Complete")

Resources

More Reading