Golang's encoding/xml is broken and no one seems to care
For the first time, I feel somewhat betrayed by Go’s standard library. encoding/xml
is broken in a lot of ways, has been for over a decade, and no one seems
to want to fix it. I couldn’t find a good, working third-party XML library to use
either, so there is no easy fallback.
In short: I do not recommend you try to parse or generate XML in Go, at
least until Google fixes encoding/xml or someone writes an XML parser that
works. With enough hacks and workarounds you just might be able to get something
to work. It won’t be pretty though.
I mean seriously, you can’t really parse or generate RSS or podcast feeds with
encoding/xml. I feel like that should be one of the better-supported cases.
Takeaways:
- A lot of the time the advice boils down to keeping separate structs, with different tags, for marshalling and unmarshalling XML.
  As much as I hate it, that really does seem to be the only solution for some of these problems for now.
- XML namespaces are barely supported and buggy. You will not be able to choose the prefix for your namespace in your generated output, so you won’t be able to generate exactly the XML you want.
  You will then discover that many XML parsers in the wild (cough, podcasts) expect certain namespace prefixes and will choke on Go’s technically valid output.
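To make the second point concrete, here is a minimal sketch of marshalling a namespaced field. The Episode type and the marshalEpisode helper are hypothetical names made up for this illustration, and the namespace URI is just the placeholder used elsewhere in this post.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Episode is a hypothetical type used only for this illustration.
type Episode struct {
	XMLName xml.Name `xml:"item"`
	Title   string   `xml:"title"`
	// "namespace-URI local-name" is the namespace syntax the struct tag
	// offers; it has no place to ask for a prefix.
	Author string `xml:"https://example.com/xmlschema author"`
}

// marshalEpisode is a hypothetical helper that returns the generated XML.
func marshalEpisode(e Episode) (string, error) {
	out, err := xml.Marshal(e)
	return string(out), err
}

func main() {
	out, err := marshalEpisode(Episode{Title: "Pilot", Author: "Jane"})
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(out)
	// Prints:
	// <item><title>Pilot</title><author xmlns="https://example.com/xmlschema">Jane</author></item>
}
```

The namespace comes out as a default xmlns declaration on the element itself, not as a prefix declared once on the root. That is valid XML, but a consumer that looks for a literal prefixed tag name will not find one.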
Quick Recap
Here’s an example of using encoding/xml:
package main

import (
	"encoding/xml"
	"fmt"
)

type XMLDoc struct {
	XMLName xml.Name `xml:"document"`
	Field   string   `xml:"field"`
}

func main() {
	input := `<document><field>field_value</field></document>`
	var parsed XMLDoc
	xml.Unmarshal([]byte(input), &parsed)
	fmt.Printf("%#v", parsed)
}
main.XMLDoc{XMLName:xml.Name{Space:"", Local:"document"}, Field:"field_value"}
Simple enough. Here’s how you would modify this to use namespaces, demonstrating two equivalent ways to specify the namespace.
package main

import (
	"encoding/xml"
	"fmt"
)

type XMLDoc struct {
	XMLName xml.Name `xml:"document"`
	Field   string   `xml:"https://example.com/xmlschema field"`
}

func parse(input string) {
	var parsed XMLDoc
	xml.Unmarshal([]byte(input), &parsed)
	fmt.Printf("%#v\n", parsed)
}

func main() {
	parse(`
<document xmlns:examplenamespace="https://example.com/xmlschema">
<examplenamespace:field>field_value</examplenamespace:field>
</document>
`)
	parse(`
<document>
<field xmlns="https://example.com/xmlschema">field_value</field>
</document>
`)
}
main.XMLDoc{XMLName:xml.Name{Space:"", Local:"document"}, Field:"field_value"}
main.XMLDoc{XMLName:xml.Name{Space:"", Local:"document"}, Field:"field_value"}
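To see why the two forms are equivalent, it can help to look at the tokens the decoder produces. The sketch below walks each input with Decoder.Token and records the resolved name of every start element; elementNames is a hypothetical helper written for this illustration, not something from the library.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// elementNames returns the resolved name of every start element in input.
// Decoder.Token replaces namespace prefixes with the URI they are bound to.
func elementNames(input string) ([]xml.Name, error) {
	dec := xml.NewDecoder(strings.NewReader(input))
	var names []xml.Name
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		if start, ok := tok.(xml.StartElement); ok {
			names = append(names, start.Name)
		}
	}
}

func main() {
	inputs := []string{
		`<document xmlns:examplenamespace="https://example.com/xmlschema"><examplenamespace:field>field_value</examplenamespace:field></document>`,
		`<document><field xmlns="https://example.com/xmlschema">field_value</field></document>`,
	}
	for _, in := range inputs {
		names, err := elementNames(in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		for _, n := range names {
			fmt.Printf("Space=%q Local=%q\n", n.Space, n.Local)
		}
	}
}
```

Both inputs yield the same two names: document with an empty Space, and field with Space set to https://example.com/xmlschema. The prefix itself is gone by the time the struct tags are matched, which is why the tag only ever mentions the URI.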
The Problem
The issue is when you need to mix namespaced and non-namespaced tags, especially ones that share a local name, as field does in these examples.
Watch this magic trick. This works as expected:
package main

import (
	"encoding/xml"
	"fmt"
)

type XMLDoc struct {
	XMLName         xml.Name `xml:"document"`
	NamespacedField string   `xml:"https://example.com/xmlschema field"`
	Field           string   `xml:"field"`
}

func parse(input string) XMLDoc {
	var parsed XMLDoc
	xml.Unmarshal([]byte(input), &parsed)
	fmt.Printf("\n%#v\n", parsed)
	return parsed
}

func main() {
	parse(`
<document xmlns:examplenamespace="https://example.com/xmlschema">
<examplenamespace:field>namespaced_field_value</examplenamespace:field>
<field>field_value</field>
</document>
`)
	parse(`
<document>
<field xmlns="https://example.com/xmlschema">namespaced_field_value</field>
<field>field_value</field>
</document>
`)
}
main.XMLDoc{XMLName:xml.Name{Space:"", Local:"document"}, NamespacedField:"namespaced_field_value", Field:"field_value"}
main.XMLDoc{XMLName:xml.Name{Space:"", Local:"document"}, NamespacedField:"namespaced_field_value", Field:"field_value"}
All good, but watch what happens when I flip the order of Field and NamespacedField in the struct:
type XMLDoc struct {
	XMLName         xml.Name `xml:"document"`
	Field           string   `xml:"field"`
	NamespacedField string   `xml:"https://example.com/xmlschema field"`
}
Our test program output is now:
main.XMLDoc{XMLName:xml.Name{Space:"", Local:"document"}, Field:"field_value", NamespacedField:""}
main.XMLDoc{XMLName:xml.Name{Space:"", Local:"document"}, Field:"field_value", NamespacedField:""}
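Yikes! Now NamespacedField isn’t correctly deserialized! As far as I can tell, a field tagged without a namespace matches an element in any namespace, and the first matching struct field wins, so Field swallows both elements.
We’ve found an 11-year-old bug in encoding/xml that Google is unable or unwilling to fix. It was really not fun
to discover this independently and then find these ancient issues still open.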
Workaround 1: Fake Namespaces
One workaround is to invent a fake namespace and set it as the default namespace, e.g.:
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
)

type XMLDoc struct {
	XMLName         xml.Name `xml:"document"`
	Field           string   `xml:"_ field"`
	NamespacedField string   `xml:"https://example.com/xmlschema field"`
}

func parse(input string) XMLDoc {
	var parsed XMLDoc
	decoder := xml.NewDecoder(bytes.NewReader([]byte(input)))
	decoder.DefaultSpace = "_"
	decoder.Decode(&parsed)
	fmt.Printf("\n%#v\n", parsed)
	return parsed
}

func write(doc XMLDoc) {
	marshalled, _ := xml.Marshal(&doc)
	fmt.Print(string(marshalled))
}

func main() {
	write(parse(`
<document xmlns:examplenamespace="https://example.com/xmlschema">
<examplenamespace:field>namespaced_field_value</examplenamespace:field>
<field>field_value</field>
</document>
`))
}
main.XMLDoc{XMLName:xml.Name{Space:"_", Local:"document"}, Field:"field_value", NamespacedField:"namespaced_field_value"}
<document><field xmlns="_">field_value</field><field xmlns="https://example.com/xmlschema">namespaced_field_value</field></document>
This is gross and I hate it, but it works for decoding. You can see the problem in the output though: our fake namespace now leaks into the serialized XML.
Absent some gross (and insecure) string find-and-replace, the only way around this seems to be to have a separate model, without the fake namespace, for marshalling. Yikes.
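A rough sketch of what that separate model could look like follows. The names decodeDoc, encodeDoc and roundTrip are hypothetical and exist only for this illustration; the idea is simply to decode into one struct and copy the values into another before marshalling.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// decodeDoc is only used for unmarshalling; it carries the fake "_" namespace.
type decodeDoc struct {
	XMLName         xml.Name `xml:"document"`
	Field           string   `xml:"_ field"`
	NamespacedField string   `xml:"https://example.com/xmlschema field"`
}

// encodeDoc is only used for marshalling; its tags have no fake namespace.
type encodeDoc struct {
	XMLName         xml.Name `xml:"document"`
	Field           string   `xml:"field"`
	NamespacedField string   `xml:"https://example.com/xmlschema field"`
}

// roundTrip decodes with the fake default namespace, copies the values into
// the marshalling model, and encodes that model instead.
func roundTrip(input string) (string, error) {
	var in decodeDoc
	dec := xml.NewDecoder(strings.NewReader(input))
	dec.DefaultSpace = "_"
	if err := dec.Decode(&in); err != nil {
		return "", err
	}
	out, err := xml.Marshal(encodeDoc{
		Field:           in.Field,
		NamespacedField: in.NamespacedField,
	})
	return string(out), err
}

func main() {
	out, err := roundTrip(`
<document xmlns:examplenamespace="https://example.com/xmlschema">
<examplenamespace:field>namespaced_field_value</examplenamespace:field>
<field>field_value</field>
</document>
`)
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}
	fmt.Println(out)
	// Prints:
	// <document><field>field_value</field><field xmlns="https://example.com/xmlschema">namespaced_field_value</field></document>
}
```

The fake namespace no longer appears in the output, at the cost of maintaining two nearly identical structs and a copy step between them.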
Workaround 2: Custom types (!!)
This is from this comment on GitHub.