This is a long overdue follow-up to my post on glibc’s dynamic linker/loader from last year. With the Go 1.8 release, which adds support for dynamic plugins, around the corner, this is a good time to revisit the topic.
Plugins in Go are described by Ian Lance Taylor in the “Go Execution Modes” design document. They are part of a larger effort to support dynamic shared objects in Go. As Ian points out, Go 1.4 supported three execution modes:
- A statically linked Go binary
- A dynamically linked Go binary, linked with the C library for DNS and user name lookups
- A Go binary linked with arbitrary non-Go code, either statically or dynamically
Simplifying a little bit, if you have never cared about cgo, you’ve likely been working with the first or second mode without being aware of it. Go 1.5 added support for an alternative DNS resolver that doesn’t require cgo, so the second mode is not as tightly coupled to DNS as it used to be (if you do want to use the C library’s resolver, you still need it). Cgo support is required for the second mode, but if your code doesn’t use packages that depend on cgo, you can have cgo support in the compiler and still get binaries that work in the first mode. It’s even possible to force the use of the Go DNS resolver via the netgo build tag.
Go 1.5 also added the following modes:
- Go code linked into, and called from, a non-Go program
- Go code as a shared library plugin with a C style API
- Building Go packages as a shared library
The first one is your Go code built as a static library that you can link into another program and call through a C-style API. The second one is your Go code built as a shared object that another program loads at runtime. The third one is your Go code built as a shared object that the dynamic linker loads when the program starts. The last two are very similar, but not exactly the same thing.
Go 1.6 added another mode:
- A Go program built as a PIE
In this mode, your Go code is built in such a way that it’s position independent, meaning it can be loaded at any memory address and it will work, which is something that security-conscious applications care about.
There’s still one mode listed in the design document that was missing in 1.7:
- Go code that uses a shared library plugin
This is the new mode implemented in Go 1.8: you can write a plugin in Go, and you can load that plugin from your Go program. To support this, the plugin package was added to the standard library.
Plugins
Since it’s the new shiny feature, let’s take a look at plugins first.
This program snippet is mostly what’s required to work with a plugin:
package main

import (
	"fmt"
	"os"
	"plugin"
)

// Msger is an interface that specifies that a func Msg() string must be
// implemented.
type Msger interface {
	Msg() string
}

func main() {
	if len(os.Args) != 2 {
		fmt.Println("usage: demo PLUGIN")
		return
	}
	fn := os.Args[1]
	// Load the DSO specified by a filename.
	p, err := plugin.Open(fn)
	if err != nil {
		fmt.Printf("plugin.Open: %s\n", err)
		return
	}
	// Lookup a symbol named "Hello". If we get something, we don't know
	// what we got (a variable or function).
	h, err := p.Lookup("Hello")
	if err != nil {
		fmt.Printf("p.Lookup: %s\n", err)
		return
	}
	// Type-assert the symbol to the Msger interface to verify that it is
	// of the correct kind.
	m, ok := h.(Msger)
	if !ok {
		fmt.Println("E: Expecting Msger interface, but got something else.")
		return
	}
	// We have what we want. Use it.
	fmt.Printf("%s: %s\n", fn, m.Msg())
}
You can compile the above code in the usual way, e.g. go build demo.go. If you do this, you’ll notice something interesting:
$ go build demo.go
$ file demo
demo: [...], dynamically linked, [...]
$ readelf -d demo | grep NEEDED
0x0000000000000001 (NEEDED) Shared library: [libdl.so.2]
0x0000000000000001 (NEEDED) Shared library: [libpthread.so.0]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
This program needs to import "plugin", and that’s what causes it to be dynamically linked. libdl.so.2 is a C library that provides the necessary functionality to load code at runtime.
What about the plugin itself? In the example above, the program expects the loaded plugin to have a symbol named “Hello” and that symbol should be a variable implementing a specific interface. The information required to make this determination is stored in the plugin.
One possible such plugin might look like this:
package main

import "C"

type EnglishMsger struct{}

func (EnglishMsger) Msg() string {
	return "hello, world! (from plugin)"
}

var Hello EnglishMsger
Another one is this:
package main

import "C"

type SpanishMsger struct{}

func (SpanishMsger) Msg() string {
	return "¡hola, mundo! (desde un plugin)"
}

var Hello SpanishMsger
Either one will work, and which one is loaded is determined at runtime.
Note that the only thing these plugins have in common is that both have a global variable named Hello. The types of the variables are different, but they satisfy the same interface. You could have used a function instead. The only difference is that you would need to type-assert to a different type.
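To illustrate the function alternative, suppose that a plugin exported func Hello() string instead of a variable (a hypothetical variant of the plugins above). Lookup would then return a value of type func() string, and the loader would have to assert to that type instead of Msger. The sketch below accepts either form with a type switch. The hypothetical germanMsger type and the function literal stand in for symbols that would normally come from p.Lookup("Hello"), so the example runs without building a plugin.

```go
package main

import (
	"fmt"
	"plugin"
)

// Msger is the same interface the demo program expects.
type Msger interface {
	Msg() string
}

// msgFromSymbol extracts a message from a symbol returned by
// (*plugin.Plugin).Lookup. It handles a variable whose type implements
// Msger and a function with the signature func() string.
func msgFromSymbol(sym plugin.Symbol) (string, error) {
	switch s := sym.(type) {
	case Msger:
		return s.Msg(), nil
	case func() string:
		return s(), nil
	default:
		return "", fmt.Errorf("unexpected symbol type %T", sym)
	}
}

// germanMsger is a hypothetical stand-in for a variable exported by a plugin.
type germanMsger struct{}

func (germanMsger) Msg() string { return "hallo, Welt! (from a variable)" }

func main() {
	var v germanMsger
	symbols := []plugin.Symbol{
		&v, // variables are looked up as pointers
		func() string { return "hello, world! (from a function)" },
	}
	for _, sym := range symbols {
		msg, err := msgFromSymbol(sym)
		if err != nil {
			fmt.Println("E:", err)
			continue
		}
		fmt.Println(msg)
	}
}
```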
These plugins can be compiled like this:
$ go build -buildmode=plugin english.go
$ go build -buildmode=plugin spanish.go
This will produce files called english.so and spanish.so respectively, but you can use -o to name your plugin whatever you want. If english.plugin is what you want, that works; the filename doesn’t even need to end in .so.
With the program shown above, you can use these plugins like this:
$ ./demo english.so
hello, world! (from plugin)
$ ./demo spanish.so
¡hola, mundo! (desde un plugin)
Needless to say, you can implement much more than a multilingual hello world.
I’d like to emphasize a point here: you can look up exported functions or variables, and you must type-assert them before you can use them. If your symbol is a function, Lookup returns the function value, and you can use it like any other function value. If your symbol is a variable, Lookup returns a pointer to it, and you can use it like any other pointer to the corresponding type. You can read the value and call the methods defined for that type. If it satisfies an interface, you can use it anywhere the interface is valid.
In short, to use a plugin, all you have to do is:
- Load the plugin
- Lookup a symbol by name
- Type-assert the symbol to the type that you expect
- Use the loaded code
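The four steps might look like the sketch below for a plugin that exports a variable. It assumes a hypothetical plugin exporting var Counter int. Because Lookup returns a pointer to a variable, the assertion is to *int, not int, and writes through that pointer change the plugin’s own copy. loadAndBump shows the whole sequence. main passes a local variable to bumpCounter so the sketch runs without a plugin file.

```go
package main

import (
	"fmt"
	"plugin"
)

// bumpCounter type-asserts a looked-up symbol to *int and increments the
// variable it points to. Asserting to int would fail, because variables
// are returned by Lookup as pointers.
func bumpCounter(sym plugin.Symbol) (int, error) {
	p, ok := sym.(*int)
	if !ok {
		return 0, fmt.Errorf("symbol Counter: want *int, got %T", sym)
	}
	*p++
	return *p, nil
}

// loadAndBump walks through the four steps for a plugin assumed to export
// "var Counter int" (a hypothetical name).
func loadAndBump(path string) (int, error) {
	p, err := plugin.Open(path) // 1. load the plugin
	if err != nil {
		return 0, err
	}
	sym, err := p.Lookup("Counter") // 2. look up a symbol by name
	if err != nil {
		return 0, err
	}
	return bumpCounter(sym) // 3. type-assert, 4. use
}

func main() {
	// Stand-in for p.Lookup("Counter"), so this runs without a plugin.
	counter := 41
	n, err := bumpCounter(&counter)
	if err != nil {
		fmt.Println("E:", err)
		return
	}
	fmt.Println(n, counter) // prints "42 42"
}
```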
Not your regular dlopen
If you are familiar with how the equivalent C code would work, you might have noticed a couple of things.
First, I’m using “english.so” as the argument to the demo program, and that gets passed to plugin.Open. With your regular dlopen usage, if you pass a path without a slash in it, dlopen will apply certain lookup rules to locate the file. plugin.Open is not dlopen, and it won’t apply those rules. Instead, it will take whatever path you pass to it and canonicalize it. It removes any . and .. components the path might contain, resolves any symlinks, and makes the path absolute. So the example above behaves as if I had used $PWD/english.so instead.
Second, the program is looking up Hello. With a Go programmer’s mindset this might look normal, as Hello is in fact the name of the variable, but… is it? It’s defined in package main, so its name would normally be main.Hello (even if you cannot actually use that name). If you build the plugins yourself, you’ll find that that’s not the name recorded in the binaries. The plugin package performs some mapping between the names as recorded in the binaries and the names of the symbols as requested by the loading program. This is why you can say that you want "Hello" as shown in the code.
Examining the plugin package, you’ll notice an interesting restriction: if both your program and the plugin use package foo, the foo packages used to compile the program and the plugin must be identical. At runtime, the hashes for package foo stored in both binaries are compared, and an error is produced if they don’t match. This will become relevant in a minute.
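The rule can be modelled in a few lines. In the sketch below, each binary is reduced to a hypothetical map from package path to an opaque hash string. The real hashes and the code that compares them live in the runtime and are not reproduced here. Only packages present in both maps are compared. The error text mirrors the plugin package’s mismatch message.

```go
package main

import (
	"fmt"
	"sort"
)

// checkPackageHashes is a simplified model of the rule above, not the
// runtime's implementation: every package recorded in both the program and
// the plugin must carry the same hash. Hashes are opaque strings here.
func checkPackageHashes(program, plug map[string]string) error {
	// Check packages in sorted order so the reported package is deterministic.
	names := make([]string, 0, len(plug))
	for name := range plug {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		if h, ok := program[name]; ok && h != plug[name] {
			return fmt.Errorf("plugin was built with a different version of package %s", name)
		}
	}
	return nil
}

func main() {
	// Hypothetical hash values for two packages used by both binaries.
	program := map[string]string{"fmt": "h1", "runtime/cgo": "h2"}
	plug := map[string]string{"fmt": "h1", "runtime/cgo": "h3"}
	fmt.Println(checkPackageHashes(program, plug))
	// prints: plugin was built with a different version of package runtime/cgo
}
```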
Shared libraries
How about building a dynamically linked Go program with dynamically linked Go code? It should be dead easy, right?
$ go install -buildmode=shared github.com/mem/dso/lib
The output of that command is this:
multiple roots ${GOPATH}/pkg/linux_amd64_dynlink & ${GOROOT}/pkg/linux_amd64_dynlink
This is the compiler’s way of saying that it’s trying to install packages to two different locations.
What this really means is that since I’ve never built the standard library with -buildmode=shared, the compiler is trying to do it as part of the command shown above. My toy lib lives in GOPATH and the standard library lives in GOROOT, so it’s trying to install packages to two different locations.
Second try:
$ go install -buildmode=shared std
$ go install -buildmode=shared github.com/mem/dso/lib
multiple roots ${GOPATH}/pkg/linux_amd64_dynlink & ${GOROOT}/pkg/linux_amd64_dynlink
Same error, but now it’s trying to say something different (for the same reason): in order to link dynamically against the standard library (and whatever other thing your package uses), you have to pass the -linkshared flag.
Third time:
$ go install -buildmode=shared std
$ go install -buildmode=shared -linkshared github.com/mem/dso/lib
Yay!
What’s happening here is that I’m trying to build a shared library (-buildmode=shared) that links to other shared libraries (the standard library). Therefore, it’s necessary to say that you really want to do that (-linkshared).
The last command will create a file called libgithub.com-mem-dso-lib.so in the $GOPATH/pkg/linux_amd64_dynlink directory. It also creates two other files (lib.a and lib.shlibname) in the $GOPATH/pkg/linux_amd64_dynlink/github.com/mem/dso directory.
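For a single package, the library name follows a simple pattern: the import path with slashes turned into dashes, with lib prepended and .so appended. The function below reproduces the names seen in this post, including libstd.so for the std meta-package. It is a sketch of the observed pattern, not the go tool’s code. It ignores cases such as several packages built into one library.

```go
package main

import (
	"fmt"
	"strings"
)

// sharedLibName maps one import path to the file name that
// go install -buildmode=shared produced for it in this post.
func sharedLibName(importPath string) string {
	return "lib" + strings.ReplaceAll(importPath, "/", "-") + ".so"
}

func main() {
	for _, p := range []string{"std", "github.com/mem/dso/lib", "github.com/mem/dso/outer/inner"} {
		fmt.Println(sharedLibName(p))
	}
	// Output:
	// libstd.so
	// libgithub.com-mem-dso-lib.so
	// libgithub.com-mem-dso-outer-inner.so
}
```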
If you follow these steps, you are now ready to use the shiny new library:
$ go install -linkshared github.com/mem/dso/cmd/demo-lib
This creates a file $GOPATH/bin/demo-lib. Upon closer inspection:
$ readelf -d $GOPATH/bin/demo-lib | grep NEEDED
0x0000000000000001 (NEEDED) Shared library: [libstd.so]
0x0000000000000001 (NEEDED) Shared library: [libgithub.com-mem-dso-lib.so]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
$ readelf -d $GOPATH/pkg/linux_amd64_dynlink/libgithub.com-mem-dso-lib.so | grep NEEDED
0x0000000000000001 (NEEDED) Shared library: [libstd.so]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
The demo-lib binary links not one but two Go shared libraries. If you run the program, it will work just fine. These shared libraries are located in non-standard locations ($GOROOT/pkg/... and $GOPATH/pkg/...). The program still works because the compiler added an RPATH entry to the executable’s dynamic section that lists the two directories where the libraries are installed.
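To confirm that the executable carries these search paths, you can read its dynamic section with debug/elf. The directories may appear under DT_RPATH or DT_RUNPATH, depending on how the external linker is configured. The sketch below therefore reports both tags and splits each entry on colons. The program is my own illustration, not part of the Go toolchain.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
	"strings"
)

// splitDirs splits colon-separated search-path entries into directories.
func splitDirs(entries []string) []string {
	var dirs []string
	for _, e := range entries {
		dirs = append(dirs, strings.Split(e, ":")...)
	}
	return dirs
}

// searchDirs returns the directories recorded under DT_RPATH and
// DT_RUNPATH in the dynamic section of the ELF file at path.
func searchDirs(path string) (map[elf.DynTag][]string, error) {
	f, err := elf.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	out := make(map[elf.DynTag][]string)
	for _, tag := range []elf.DynTag{elf.DT_RPATH, elf.DT_RUNPATH} {
		entries, err := f.DynString(tag)
		if err != nil {
			return nil, err
		}
		if len(entries) > 0 {
			out[tag] = splitDirs(entries)
		}
	}
	return out, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: rpath FILE")
		os.Exit(2)
	}
	dirs, err := searchDirs(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, tag := range []elf.DynTag{elf.DT_RPATH, elf.DT_RUNPATH} {
		for _, d := range dirs[tag] {
			fmt.Printf("%v\t%s\n", tag, d)
		}
	}
}
```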
As hinted above, libstd.so is the Go standard library. This is a very unfortunate choice of name, as it is exceedingly generic: imagine if every language called its standard library libstd.so instead of libc.so, libstdc++.so, etc. It’s also unfortunate that the name does not include some kind of version related to the Go version used to build the binary (for example libstd-1.7.so, libstd-1.8.so, etc.). I have not checked, but I doubt that programs built with Go 1.7 will work fine with Go 1.8’s libstd.so (ignoring other issues that possibly exist as well).
What’s in a SONAME?
Examining the other library (libgithub.com-mem-dso-lib.so), you’ll find that it does not have a SONAME (see previous post). You can add one if you need to, for example, if you need to have multiple incompatible versions of the same package installed as shared libraries:
$ go install \
-ldflags '-extldflags -Wl,-soname,libgithub.com-mem-dso-lib.so.0' \
-buildmode=shared \
-linkshared \
github.com/mem/dso/lib
Note that the output filename will still be libgithub.com-mem-dso-lib.so. This creates a problem. If you relink your executable and try to run it, you’ll see:
$ bin/demo-lib
bin/demo-lib: error while loading shared libraries: libgithub.com-mem-dso-lib.so.0: cannot open shared object file: No such file or directory
What’s happening here is that the dynamic linker is looking for a file named libgithub.com-mem-dso-lib.so.0, and it’s not there. What you need to do is rename the library and install a symlink from libgithub.com-mem-dso-lib.so to libgithub.com-mem-dso-lib.so.0 (again, see the previous post as to why).
It remains to be seen whether this will be a problem for Go, as there’s no single strategy for versioning packages. There are two broad options for expressing version numbers:
- Some people embed versions in import paths. As shown above, this would be reflected in the filename for the shared library, so this approach would have a chance of working.
- Other people create branches in the repositories hosting the packages (for example v1, v2, etc.) and use vendoring tools to keep the packages pinned to the correct branch. This would not be reflected in the filenames. People who have to support multiple installed versions of the same package would have to devise their own solution to the problem. One such group of people are Linux distributions, like Debian and Fedora. Debian has shown in practice that this is a very painful and problematic path to walk.
Note that both options are orthogonal to semantic versioning, which has some traction in the community. Semantic versioning is about how to pick the version number, not about how to express that number in a package: a different import path, a different branch, or something else. The new tool dep has shown a preference (US17) for the second strategy, but so far it’s just a preference.
In either case, without a versioning strategy, it’s not clear how to translate from:
import "github.com/mem/dso/lib"
to a filename like libgithub.com-mem-dso-lib.so.3.1.0, and from there to a SONAME.
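For illustration only, here is one hypothetical answer. Apply the libX.so naming pattern to the import path, append the full semantic version for the filename, and keep only the major version in the SONAME, as is customary for C libraries. The go tool does nothing like this. The version string is assumed to come from whatever versioning strategy the package follows.

```go
package main

import (
	"fmt"
	"strings"
)

// versionedNames is a hypothetical scheme: the file name carries the full
// version, the SONAME only the major version, as is common for C libraries.
func versionedNames(importPath, version string) (file, soname string) {
	base := "lib" + strings.ReplaceAll(importPath, "/", "-") + ".so"
	major := version
	if i := strings.IndexByte(version, '.'); i >= 0 {
		major = version[:i]
	}
	return base + "." + version, base + "." + major
}

func main() {
	file, soname := versionedNames("github.com/mem/dso/lib", "3.1.0")
	fmt.Println(file)   // libgithub.com-mem-dso-lib.so.3.1.0
	fmt.Println(soname) // libgithub.com-mem-dso-lib.so.3
}
```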
Libraries using libraries
Once you have shared libraries, there’s an expectation that your shared libraries can link to other shared libraries. For example, it is almost certain that you’ll have libpng installed in your system. If you examine it, you’ll find something like this:
$ readelf -d /usr/lib/x86_64-linux-gnu/libpng16.so.16 | grep NEEDED
0x0000000000000001 (NEEDED) Shared library: [libz.so.1]
0x0000000000000001 (NEEDED) Shared library: [libm.so.6]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
This says that libpng16.so.16 links dynamically against the libz.so.1, libm.so.6 and libc.so.6 libraries. When the library was built, there was a step in the process that read something similar to this:
$ gcc -o libpng16.so.16.26.0 {other parameters} -lz -lm
In most common cases, this says “leave a note (in the form of NEEDED
above) in the ELF file that instructs the dynamic linker to look for
libz.so.1 at runtime”.
What does this look like in Go?
Consider this structure for discussion purposes:
github.com/mem/dso/cmd/demo-outer-inner
github.com/mem/dso/outer
github.com/mem/dso/outer/inner
demo-outer-inner is a main package that imports github.com/mem/dso/outer, which in turn imports github.com/mem/dso/outer/inner.
If your $GOPATH/pkg directory is empty and you compile demo-outer-inner, the result looks like this:
$ rm -rf $GOPATH/pkg $GOPATH/bin
$ go install -linkshared github.com/mem/dso/cmd/demo-outer-inner
$ readelf -d $GOPATH/bin/demo-outer-inner | grep NEEDED
0x0000000000000001 (NEEDED) Shared library: [libstd.so]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
The program is linked against the Go standard library, the C standard library and nothing else. In particular, the compiler didn’t decide to build a shared library out of the outer or inner packages and link those to the resulting program.
What if the shared libraries are there before building demo-outer-inner?
Let’s try that:
$ rm -rf $GOPATH/pkg $GOPATH/bin
$ go install -buildmode=shared -linkshared github.com/mem/dso/outer
$ go install -linkshared github.com/mem/dso/cmd/demo-outer-inner
$ readelf -d $GOPATH/bin/demo-outer-inner | grep NEEDED
0x0000000000000001 (NEEDED) Shared library: [libstd.so]
0x0000000000000001 (NEEDED) Shared library: [libgithub.com-mem-dso-outer.so]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
OK, that’s interesting. If the library is already there, with the same command line, the compiler does decide to use it instead of embedding it in the executable as before. But note that inner still didn’t magically show up.
Let’s see…
$ rm -rf $GOPATH/pkg $GOPATH/bin
$ go install -buildmode=shared -linkshared github.com/mem/dso/outer/inner
$ go install -buildmode=shared -linkshared github.com/mem/dso/outer
$ go install -linkshared github.com/mem/dso/cmd/demo-outer-inner
$ readelf -d $GOPATH/bin/demo-outer-inner | grep NEEDED
0x0000000000000001 (NEEDED) Shared library: [libstd.so]
0x0000000000000001 (NEEDED) Shared library: [libgithub.com-mem-dso-outer.so]
0x0000000000000001 (NEEDED) Shared library: [libgithub.com-mem-dso-outer-inner.so]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
I have to confess that this is a little annoying. I understand how this happens, but I still find it annoying. How annoying? Suppose you first build the inner package as a shared library, but not the outer one. When you then build the program, you’ll see that it embeds outer in the binary and dynamically links to inner. This has nothing to do with the directory structure of the packages, which is only there to make a point. You’ll get the same results if they are laid out side by side, with outer importing inner.
This is not what happens if you build the packages in the default mode. If you try this:
$ go install github.com/mem/dso/outer
$ find $GOPATH/pkg -type f
$GOPATH/pkg/linux_amd64/github.com/mem/dso/outer.a
$GOPATH/pkg/linux_amd64/github.com/mem/dso/outer/inner.a
Notice how you get two archives: one for outer and one for inner.
The same thing happens if you build the program without building the other packages first.
To make matters more confusing, the above also happens if you use -linkshared. What I mean is this:
$ rm -rf $GOPATH/pkg $GOPATH/bin
$ go install -linkshared github.com/mem/dso/cmd/demo-outer-inner
$ find $GOPATH/pkg $GOPATH/bin -type f
$GOPATH/pkg/linux_amd64_dynlink/github.com/mem/dso/outer.a
$GOPATH/pkg/linux_amd64_dynlink/github.com/mem/dso/outer/inner.a
$GOPATH/bin/demo-outer-inner
Notice the two archives under $(GOOS)_$(GOARCH)_dynlink, matching the two archives under $(GOOS)_$(GOARCH) in the other case. These are built the same way they would be if you asked for shared mode, except that no shared library is produced. It’s possible to have shared and non-shared versions of the same library (just as you can have static and dynamic libraries for C code).
The point here is that build order matters. If your intention is to have shared libraries for your packages, you must build them before anything that uses them, including other packages that you want to install as libraries. If you are going to be building shared libraries for Go packages, you must be very intentional about it.
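Since the order matters, it can help to compute it rather than remember it. The sketch below is a plain depth-first topological sort over a dependency graph that you supply by hand. The map in main is a hypothetical description of the outer/inner example, not something queried from the go tool. The sort returns dependencies before their importers, which is the order in which to install the packages with -buildmode=shared -linkshared. The standard library is assumed to be installed as a shared library already.

```go
package main

import (
	"fmt"
	"sort"
)

// installOrder returns the packages in deps (import path -> paths it
// imports) ordered so that each one comes after everything it imports.
// It returns an error if the graph contains an import cycle.
func installOrder(deps map[string][]string) ([]string, error) {
	const (
		unvisited = iota
		visiting
		done
	)
	state := make(map[string]int) // missing entries read as unvisited
	var order []string
	var visit func(pkg string) error
	visit = func(pkg string) error {
		switch state[pkg] {
		case visiting:
			return fmt.Errorf("import cycle through %s", pkg)
		case done:
			return nil
		}
		state[pkg] = visiting
		for _, dep := range deps[pkg] {
			if err := visit(dep); err != nil {
				return err
			}
		}
		state[pkg] = done
		order = append(order, pkg)
		return nil
	}
	// Visit packages in sorted order so the result is deterministic.
	pkgs := make([]string, 0, len(deps))
	for pkg := range deps {
		pkgs = append(pkgs, pkg)
	}
	sort.Strings(pkgs)
	for _, pkg := range pkgs {
		if err := visit(pkg); err != nil {
			return nil, err
		}
	}
	return order, nil
}

func main() {
	// Hypothetical graph for the outer/inner example.
	deps := map[string][]string{
		"github.com/mem/dso/outer":       {"github.com/mem/dso/outer/inner"},
		"github.com/mem/dso/outer/inner": nil,
	}
	order, err := installOrder(deps)
	if err != nil {
		fmt.Println("E:", err)
		return
	}
	for _, pkg := range order {
		fmt.Println("go install -buildmode=shared -linkshared", pkg)
	}
}
```

For this graph, it prints the inner package first and outer second. That matches the sequence that produced both shared libraries earlier in this post.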
Revisiting plugins
Let’s take another look at those shared libraries:
$ rm -rf $GOPATH/pkg $GOPATH/bin
$ go install -buildmode=shared -linkshared github.com/mem/dso/outer/inner
$ go install -buildmode=shared -linkshared github.com/mem/dso/outer
$ go install -linkshared github.com/mem/dso/cmd/demo-outer-inner
$ find $GOPATH/bin $GOPATH/pkg -type f -print0 | xargs -r0 ls -s
20 $GOPATH/bin/demo-outer-inner
4 $GOPATH/pkg/linux_amd64_dynlink/github.com/mem/dso/outer.a
4 $GOPATH/pkg/linux_amd64_dynlink/github.com/mem/dso/outer/inner.a
4 $GOPATH/pkg/linux_amd64_dynlink/github.com/mem/dso/outer/inner.shlibname
4 $GOPATH/pkg/linux_amd64_dynlink/github.com/mem/dso/outer.shlibname
16 $GOPATH/pkg/linux_amd64_dynlink/libgithub.com-mem-dso-outer-inner.so
20 $GOPATH/pkg/linux_amd64_dynlink/libgithub.com-mem-dso-outer.so
The numbers are the file sizes in kB. Those are pretty small for Go binaries! The reason is that they don’t carry the standard library with them:
$ ls -sh $GOROOT/pkg/linux_amd64_dynlink/libstd.so
38M $GOROOT/pkg/linux_amd64_dynlink/libstd.so
What about plugins?
$ ls -sh github.com-mem-dso-plugin-english.plugin
1.7M github.com-mem-dso-plugin-english.plugin
That’s… big. Examining the file, it looks like it has quite a bit of the runtime package in it. It’s notable that it does not list libstd.so in its dynamic dependencies. Isn’t this the kind of thing that -linkshared “fixes”? Let’s see:
$ go build -buildmode=plugin -linkshared english.go
# command-line-arguments
runtime.islibrary: missing Go type information for global symbol: size 1
I haven’t been able to figure out what this error actually means, nor what the compiler logic that leads to it is trying to do or guard against.
There’s another gotcha. Remember that I said that if your program and your plugin use the same package, they must use identical versions? If you recompile the little demo program using -linkshared, you’ll get this:
$ go build -buildmode=plugin -o spanish.so github.com/mem/dso/plugin/spanish
$ go install -linkshared github.com/mem/dso/cmd/demo-plugin
$ demo-plugin spanish.so
plugin.Open: plugin.Open: plugin was built with a different version of package runtime/cgo
What the runtime is saying here is that the version of runtime/cgo in your program is different from the one in your plugin. The reason is that they were compiled differently (with and without -linkshared). As far as I’ve been able to find out, there’s no solution for this.
Conclusion
When I first started using Go, the first thing that bothered me a little wasn’t the size of the binaries, but the fact that Go didn’t support dynamic linking. I understand that in some contexts the fact that Go uses statically linked binaries is a huge advantage. But there are situations where that isn’t as good. One is a distribution like Debian, where fixing a security issue requires recompiling every package that might contain a copy of the vulnerable code. Another is embedded systems, where size matters. If you have a single Go binary, not having shared libraries is not an issue, but the moment you have more than a handful, things pile up quite quickly. At a time when 128 MB of storage might seem small, maybe even too small, this might sound like a non-issue. But consider that you can have a working Linux image for a Raspberry Pi in less than 8 MB. With that in mind, being able to build Go programs so that binary code can be shared is good. Being able to load more code at runtime is even better.