Chapter 3 of The Little Book of Semaphores proposes the following puzzle:

Generalize the signal pattern so that it works both ways. Thread A has to wait for Thread B and vice versa.

If Thread A runs two statements a1 and a2, and Thread B runs two statements b1 and b2, the requirement is that a1 executes before b2 and b1 executes before a2. In other words, the two threads must meet before executing their respective second statements.

The simplest way is to share a single sync.WaitGroup between the two goroutines:

func worker(name string, rendezvous *sync.WaitGroup) {
	fmt.Printf("%s%d\n", name, 1)
	rendezvous.Done()
	rendezvous.Wait()
	fmt.Printf("%s%d\n", name, 2)
}

func main() {
	// ...

	rendezvous := &sync.WaitGroup{}
	rendezvous.Add(2)

	go worker("a", rendezvous)
	go worker("b", rendezvous)

	// ...
}

(full program here)
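The linked program isn't reproduced here, but a minimal runnable version might look like the sketch below; the second WaitGroup, done, used to keep main alive until both workers finish is an assumption of this sketch rather than part of the original program.

package main

import (
	"fmt"
	"sync"
)

func worker(name string, rendezvous, done *sync.WaitGroup) {
	defer done.Done()

	fmt.Printf("%s%d\n", name, 1)
	rendezvous.Done()
	rendezvous.Wait()
	fmt.Printf("%s%d\n", name, 2)
}

func main() {
	rendezvous := &sync.WaitGroup{}
	rendezvous.Add(2)

	// Assumed here: a second WaitGroup so main doesn't exit before the
	// workers get to run.
	done := &sync.WaitGroup{}
	done.Add(2)

	go worker("a", rendezvous, done)
	go worker("b", rendezvous, done)

	done.Wait()
}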

This solution scales nicely to any number of goroutines:

func worker(name string, rendezvous *sync.WaitGroup) {
	fmt.Printf("%s%d\n", name, 1)
	rendezvous.Done()
	rendezvous.Wait()
	fmt.Printf("%s%d\n", name, 2)
}

func main() {
	// ...

	N := 10

	rendezvous := &sync.WaitGroup{}
	rendezvous.Add(N)

	for i := 0; i < N; i++ {
		name := string(rune('a' + i))
		go worker(name, rendezvous)
	}

	// ...
}

(full program here)

In contrast, a straightforward channel-based solution scales badly. To see why, consider a solution like the following for the N=2 case:

type Signal struct{}

func workerA(aReady, bReady chan Signal) {
	fmt.Println("a1")
	aReady <- Signal{}
	<-bReady
	fmt.Println("a2")
}

func workerB(aReady, bReady chan Signal) {
	fmt.Println("b1")
	<-aReady
	bReady <- Signal{}
	fmt.Println("b2")
}

func main() {
	// ...

	aReady := make(chan Signal)
	bReady := make(chan Signal)

	go workerA(aReady, bReady)
	go workerB(aReady, bReady)

	// ...
}

The first problem is the delicate balance between the sending and the receiving sides: with unbuffered channels the two workers have to be written asymmetrically, one sending before it receives and the other receiving before it sends, because two workers that both tried to send first would deadlock. This asymmetry can be eliminated by buffering the channels:

type Signal struct{}

func worker(name string, ready, wait chan Signal) {
	fmt.Printf("%s%d\n", name, 1)
	ready <- Signal{}
	<-wait
	fmt.Printf("%s%d\n", name, 2)
}

func main() {
	// ...

	aReady := make(chan Signal, 1)
	bReady := make(chan Signal, 1)

	go worker("a", aReady, bReady)
	go worker("b", bReady, aReady)

	// ...
}

(full program here)

The second problem is scaling this beyond two goroutines. Notice how the worker sends a single signal on its ready channel because exactly one other goroutine is waiting for it. In the general case, each goroutine would have to produce N-1 signals, one for every other goroutine, and consume N-1 signals, one from every other goroutine, whether that means fanning out over N-1 channels or pushing N-1 values through a single buffered channel.

This is getting ugly fast.
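To make that concrete, here is one way the direct generalization could be spelled out. This is only a sketch: the original deliberately doesn't show this version, so the channel layout (one buffered channel per worker), the integer ids, and the done channel used to keep main alive are all assumptions made here for illustration.

package main

import "fmt"

type Signal struct{}

// Each worker owns one buffered channel. It deposits one signal into
// every peer's channel, then drains N-1 signals from its own channel
// before moving on to its second statement.
func worker(id, N int, chans []chan Signal, done chan Signal) {
	fmt.Printf("%c1\n", 'a'+id)
	for i := 0; i < N; i++ {
		if i != id {
			chans[i] <- Signal{} // capacity N-1, so this never blocks
		}
	}
	for i := 0; i < N-1; i++ {
		<-chans[id] // one signal from each peer
	}
	fmt.Printf("%c2\n", 'a'+id)
	done <- Signal{}
}

func main() {
	N := 10

	chans := make([]chan Signal, N)
	for i := range chans {
		chans[i] = make(chan Signal, N-1)
	}
	done := make(chan Signal)

	for i := 0; i < N; i++ {
		go worker(i, N, chans, done)
	}
	for i := 0; i < N; i++ {
		<-done
	}
}

Ten goroutines now juggle ten channels and N*(N-1) signals per rendezvous, which is exactly the bookkeeping the helper below gets rid of.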

A better solution is to introduce a helper goroutine that consumes N signals from the ready channel and then sends N signals to the wait channel.

func worker(name string, ready, wait chan Signal) {
	fmt.Printf("%s%d\n", name, 1)
	ready <- Signal{}
	<-wait
	fmt.Printf("%s%d\n", name, 2)
}

func helper(N int, ready, wait chan Signal) {
	for i := N; i > 0; i-- {
		<-ready
	}
	// All the goroutines are ready, signal all the waiting
	// goroutines
	for i := N; i > 0; i-- {
		wait <- Signal{}
	}
}

func main() {
	// ...

	N := 10
	ready := make(chan Signal)
	wait := make(chan Signal)

	for i := 0; i < N; i++ {
		name := string(rune('a' + i))
		go worker(name, ready, wait)
	}

	go helper(N, ready, wait)

	// ...
}

(full program here)

With the exception of the helper, notice how this solution is not all that different from the one based on a sync.WaitGroup: from a program-logic point of view the worker implementation is the same. The channel-based solution has one advantage going for it, though: try surrounding the body of the worker in a for loop.
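Spoiler, in case you don't want to run that experiment: the WaitGroup version panics as soon as a worker calls Done on a counter that has already reached zero, while the channel version only needs the helper to loop as well, which turns the rendezvous into a reusable barrier. Here is a sketch of what that might look like; the round count, the per-round output, and the done channel are assumptions added for illustration.

package main

import "fmt"

type Signal struct{}

const rounds = 3 // assumed, just to keep the example finite

// The worker's body is wrapped in a loop; each iteration is one rendezvous.
func worker(name string, ready, wait, done chan Signal) {
	for r := 0; r < rounds; r++ {
		fmt.Printf("%s1 (round %d)\n", name, r)
		ready <- Signal{}
		<-wait
		fmt.Printf("%s2 (round %d)\n", name, r)
	}
	done <- Signal{}
}

// The helper loops too: each round it collects N ready signals, then
// releases the N waiting goroutines.
func helper(N int, ready, wait chan Signal) {
	for r := 0; r < rounds; r++ {
		for i := 0; i < N; i++ {
			<-ready
		}
		for i := 0; i < N; i++ {
			wait <- Signal{}
		}
	}
}

func main() {
	N := 10

	ready := make(chan Signal)
	wait := make(chan Signal)
	done := make(chan Signal)

	for i := 0; i < N; i++ {
		name := string(rune('a' + i))
		go worker(name, ready, wait, done)
	}
	go helper(N, ready, wait)

	for i := 0; i < N; i++ {
		<-done
	}
}

Because the channels are unbuffered, a fast worker that loops around blocks on its next ready send until the helper starts the next round, so signals from different rounds can't get mixed up.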