- Unmarshal HTML nested in XML
- 3 Answers
- Parsing HTML files with Go
- 1 Answer
- Go API to convert HTML to XML
- Convert a HTML file to XML in Go
- Code example in Go using REST API to convert HTML to XML format
- How to use Go API to convert HTML to XML
- xslt
- go-xslt
- Description
- Installation
- Usage
Unmarshal HTML nested in XML
I am receiving an xml file from a 3rd party which has an HTML element within one of the XML tags. I cannot work out how to unmarshal this to get the href URL. XML Example:
type Href struct {
	Link string `xml:"href"`
}

type Link struct {
	URL []Href `xml:"a"`
}

type XmlFile struct {
	HTMLTag []Link `xml:"SOME_HTML"`
}

myFile := []byte(`
Welcome @asnow!.
The example is pretty much exactly what I am receiving, only difference is some other fields around it and the tag is actually
3 Answers
type aElement struct {
	Href string `xml:"href,attr"`
}

type content struct {
	A aElement `xml:"a"`
}

func main() {
	test := `google `
	var result content
	if err := xml.Unmarshal([]byte(test), &result); err != nil {
		log.Fatal(err)
	}
	fmt.Println(result)
}
Parsing everything in the xml, assuming also there could be more than one a tag in the html or other tags (like a div ).
If this is not needed, just replace XmlFile.Links with XmlFile.Link of type Link (not []Link )
func main() {
	type Link struct {
		XMLName xml.Name `xml:"a"`
		URL     string   `xml:"href,attr"`
		Target  string   `xml:"target,attr"`
		Content string   `xml:",chardata"`
	}

	type Div struct {
		XMLName xml.Name `xml:"div"`
		Classes string   `xml:"class,attr"`
		Content string   `xml:",chardata"`
	}

	type XmlFile struct {
		XMLName xml.Name `xml:"SOME_HTML"`
		Links   []Link   `xml:"a"`
		Divs    []Div    `xml:"div"`
	}

	myFile := []byte(` google facebook
	fmt.Println(output)
}
Edit: Added more tags in the xml to show how to parse different tag types.
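Since the sample markup did not survive in the post above, here is a minimal, self-contained sketch of the same idea. The XML in `sample`, the example.com URLs, and the helper name `parseLinks` are assumptions made for illustration; only the struct tags follow the answer.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// Link maps one <a> element: attributes via ",attr", text via ",chardata".
type Link struct {
	URL     string `xml:"href,attr"`
	Target  string `xml:"target,attr"`
	Content string `xml:",chardata"`
}

// XmlFile maps the wrapping element; a slice collects every <a> child.
type XmlFile struct {
	XMLName xml.Name `xml:"SOME_HTML"`
	Links   []Link   `xml:"a"`
}

// sample is hypothetical input, not the asker's real document.
const sample = `<SOME_HTML>
  <a href="http://example.com/one" target="_blank">one</a>
  <a href="http://example.com/two">two</a>
</SOME_HTML>`

// parseLinks (hypothetical helper) unmarshals data and returns the links found.
func parseLinks(data []byte) ([]Link, error) {
	var f XmlFile
	if err := xml.Unmarshal(data, &f); err != nil {
		return nil, err
	}
	return f.Links, nil
}

func main() {
	links, err := parseLinks([]byte(sample))
	if err != nil {
		log.Fatal(err)
	}
	for _, l := range links {
		fmt.Println(l.URL, l.Content)
	}
}
```

A missing attribute (such as `target` on the second link) simply leaves the field as an empty string, so no extra handling is needed for optional attributes.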
You can parse the example you posted using a regular XML parser. There are, however, a whole lot of exceptions to the XML syntax which are commonly accepted as valid HTML.
The simplest example I can think of: all HTML interpreters I know of understand that <br> (an unclosed <br> tag) is the same as a self-closing <br/> tag.
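To make that concrete, here is a small sketch using only the standard library. The markup in `sample` and the names `anchor`, `fragment`, `parseStrict` and `parseLenient` are made up for illustration. It shows a strict `xml.Unmarshal` rejecting an unclosed `<br>`, and the documented `Strict`, `AutoClose` and `Entity` fields of `xml.Decoder` tolerating it. This is still not a real HTML parser, so treat it as a stopgap for markup that is almost XML.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// anchor and fragment are hypothetical types for this sketch.
type anchor struct {
	Href string `xml:"href,attr"`
}

type fragment struct {
	A anchor `xml:"a"`
}

// sample is hypothetical markup with unclosed <br> tags (valid HTML, invalid XML).
const sample = `<div>line1<br>line2 <a href="http://example.com/">Some<br>text</a></div>`

// parseStrict uses the default, strict XML rules.
func parseStrict(s string) (fragment, error) {
	var f fragment
	err := xml.Unmarshal([]byte(s), &f)
	return f, err
}

// parseLenient relaxes the decoder so that common HTML void elements
// (the names listed in xml.HTMLAutoClose) are closed automatically.
func parseLenient(s string) (fragment, error) {
	var f fragment
	d := xml.NewDecoder(strings.NewReader(s))
	d.Strict = false
	d.AutoClose = xml.HTMLAutoClose
	d.Entity = xml.HTMLEntity
	err := d.Decode(&f)
	return f, err
}

func main() {
	if _, err := parseStrict(sample); err != nil {
		fmt.Println("strict parse failed:", err)
	}
	f, err := parseLenient(sample)
	if err != nil {
		fmt.Println("lenient parse failed:", err)
		return
	}
	fmt.Println("href:", f.A.Href)
}
```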
If you don’t know how the HTML on the other end of the service is generated, you are better off using an HTML parser.
For example, there is the golang.org/x/net/html package, which provides several functions to parse HTML:
package main

import (
	"fmt"
	"strings"

	"golang.org/x/net/html"
)

// findFirstHref walks the node tree depth-first and returns the href of the
// first <a> element it finds, or "" if there is none.
func findFirstHref(n *html.Node, indent string) string {
	if n.Type == html.ElementNode {
		fmt.Println(" * scanning:" + indent + n.Data)
	}
	if n.Type == html.ElementNode && n.Data == "a" {
		for _, a := range n.Attr {
			if a.Key == "href" {
				return a.Val
			}
		}
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		href := findFirstHref(c, indent+" ")
		if href != "" {
			return href
		}
	}
	return ""
}

func main() {
	doc1, err := html.Parse(strings.NewReader(sample1))
	if err != nil {
		fmt.Println(err)
	} else {
		fmt.Println("href in sample1:", findFirstHref(doc1, ""))
	}

	doc2, err := html.Parse(strings.NewReader(sample2))
	if err != nil {
		fmt.Println(err)
	} else {
		fmt.Println("href in sample2:", findFirstHref(doc2, ""))
	}
}

const (
	sample1 = ` google `

	// sample2 is an invalid XML document (it has unclosed "<br>" tags):
	sample2 = ` line1
line2 Some
text
`
)
Parsing HTML files with Go
Is encoding/xml the best library for parsing HTML table files like this one, and are there any examples showing how to do it?
Test 1
Type Region

Type  Count   Percent
T1    34,314  31.648%
T2    25,820  23.814%
T3    4,871   4.493%

Type  Count   Percent
T4    34,314  31.648%
T5    11,187  10.318%
T6    25,820  23.814%
Have you tried godoc.org/code.google.com/p/go.net/html — first Google result for «golang html parser» 😉
1 Answer
Strictly speaking, the only kind of HTML that is guaranteed to be parsed by a conforming XML parser is XHTML. Although XHTML was once expected to become the HTML standard, it never really took off, and these days it is considered obsolete (in favor of the much hyped «HTML5» thing and all the ecosystem around it). The basic problem with HTML is that while it looks like XML, it has different rules. One glaring distinction is that <br> is perfectly legal HTML but is an unterminated element in XML (in the latter, it has to be spelled <br/>), and there are a lot more differences.
On the other hand, your particular example looks quite XML’ish to me, so if you can guarantee that your data, while being HTML, will always be well-formed XML at the same time, you can just use the encoding/xml package. Otherwise go for go.net/html, as suggested by @elithrar, or find some other package.
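For the well-formed case, a minimal sketch with encoding/xml could look like the following. The table markup in `sample` is an assumption (the original HTML is not shown above), and the names `row`, `table` and `parseTable` are hypothetical. Each `<tr>` becomes a row and each `<td>` a cell.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// row collects the text of every <td> in one <tr>.
type row struct {
	Cells []string `xml:"td"`
}

// table collects every <tr> directly under the root element.
type table struct {
	Rows []row `xml:"tr"`
}

// sample is hypothetical, well-formed markup using values from the question.
const sample = `<table>
  <tr><td>Type</td><td>Count</td><td>Percent</td></tr>
  <tr><td>T1</td><td>34,314</td><td>31.648%</td></tr>
  <tr><td>T2</td><td>25,820</td><td>23.814%</td></tr>
</table>`

// parseTable (hypothetical helper) returns the table as rows of cell strings.
func parseTable(data []byte) ([][]string, error) {
	var t table
	if err := xml.Unmarshal(data, &t); err != nil {
		return nil, err
	}
	out := make([][]string, 0, len(t.Rows))
	for _, r := range t.Rows {
		out = append(out, r.Cells)
	}
	return out, nil
}

func main() {
	rows, err := parseTable([]byte(sample))
	if err != nil {
		log.Fatal(err)
	}
	for _, r := range rows {
		fmt.Println(r)
	}
}
```

This only works while the markup stays valid XML: an unclosed tag or a `<tbody>` wrapper that the struct tags do not account for will either fail or silently yield no rows, which is the point at which an HTML parser is the safer choice.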
Site design / logo © 2023 Stack Exchange Inc; user contributions licensed under CC BY-SA . rev 2023.7.24.43543
Go API to convert HTML to XML
Use Cells Conversion REST API to create customized spreadsheet workflows in Go. This is a professional solution to convert HTML to XML and other document formats online using Go.
Convert a HTML file to XML in Go
Converting file formats from HTML to XML is a complex task. All HTML to XML format transitions are performed by our Go SDK while maintaining the source HTML spreadsheet’s main structural and logical content. Our Go library is a professional solution to convert HTML to XML files online. This Cloud SDK gives Go developers powerful functionality and perfect XML output.
Code example in Go using REST API to convert HTML to XML format
// For complete examples and data files, please go to https://github.com/aspose-cells-cloud/aspose-cells-cloud-go/
package main

import (
	"os"

	asposecellscloud "github.com/aspose-cells-cloud/aspose-cells-cloud-go/v22"
)

func main() {
	// Credentials are read from the environment.
	instance := asposecellscloud.NewCellsApiService(os.Getenv("ProductClientId"), os.Getenv("ProductClientSecret"))

	// Open the source HTML file.
	file, err := os.Open("Book1.html")
	if err != nil {
		return
	}
	defer file.Close()

	// Request conversion to XML; the response value is not used here.
	convertWorkbookOpts := new(asposecellscloud.CellsWorkbookPutConvertWorkbookOpts)
	convertWorkbookOpts.Format = "xml"
	value, _, err1 := instance.CellsWorkbookPutConvertWorkbook(file, convertWorkbookOpts)
	if err1 != nil {
		return
	}

	// Write the converted bytes to the destination file.
	file1, err2 := os.Create("Dest.xml")
	if err2 != nil {
		return
	}
	defer file1.Close()
	if _, err3 := file1.Write(value); err3 != nil {
		return
	}
}
How to use Go API to convert HTML to XML
- Create an account at Dashboard to get free API quota & authorization details
- Initialize CellsApi with Client Id, Client Secret, Base URL & API version
- Call CellsWorkbookPutConvertWorkbook method to get the resultant stream
xslt
go-xslt
Description
go-xslt is a Go module that performs basic XSLT 1.0 transformations via Libxslt.
Installation
You’ll need the development libraries for libxml2 and libxslt, along with those for liblzma and zlib. Install these via your package manager. For instance, if using apt then:
sudo apt install libxml2-dev libxslt1-dev liblzma-dev zlib1g-dev
This module can be installed with the go get command:
go get -u github.com/wamuir/go-xslt
Usage
// style is an XSLT 1.0 stylesheet, as []byte.
xs, err := xslt.NewStylesheet(style)
if err != nil {
	panic(err)
}
defer xs.Close()

// doc is an XML document to be transformed and res is the result of
// the XSL transformation, both as []byte.
res, err := xs.Transform(doc)
if err != nil