HTML to XML in Go

Unmarshal HTML nested in XML

I am receiving an XML file from a third party which has an HTML element within one of the XML tags. I cannot work out how to unmarshal this to get the href URL. XML example:

    type Href struct {
        Link string `xml:"href"`
    }

    type Link struct {
        URL []Href `xml:"a"`
    }

    type XmlFile struct {
        HTMLTag []Link `xml:"SOME_HTML"`
    }

    myFile := []byte(`   google `)
    var output XmlFile
    err := xml.Unmarshal(myFile, &output)
    fmt.Println(output) //

Welcome, @asnow! Your example, which I’m guessing you’ve kept simple on purpose, doesn’t really tell us what kind of HTML you’re expecting. HTML (as opposed to XHTML) has tag omission and other shorthand syntax which, when inserted as a string into XML, will make Go’s XML parser (or any other, for that matter) fail hard. But it could well be that you’re receiving HTML in XML serialization; it’s not 100% clear from your question. So please show us the actual HTML you’re receiving.

The example is pretty much exactly what I am receiving; the only difference is some other fields around it, and the tag is actually

3 Answers

    type aElement struct {
        Href string `xml:"href,attr"`
    }

    type content struct {
        A aElement `xml:"a"`
    }

    func main() {
        test := `google`
        var result content
        if err := xml.Unmarshal([]byte(test), &result); err != nil {
            log.Fatal(err)
        }
        fmt.Println(result)
    }
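The key difference from the code in the question is the `,attr` option in the struct tag. A minimal sketch of that difference follows; the payload (including the example.com URL) and the names `asElement`, `asAttr`, and `hrefs` are assumptions made for illustration, since the original markup is not shown above.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// payload is an assumed stand-in for the third-party document; the URL is
// a placeholder, not the value from the question.
const payload = `<SOME_HTML><a href="https://example.com/" target="_blank">google</a></SOME_HTML>`

// asElement looks for a child element <href> inside <a>, which does not exist.
type asElement struct {
	Href string `xml:"a>href"`
}

// asAttr reads the href attribute of the <a> element.
type asAttr struct {
	A struct {
		Href string `xml:"href,attr"`
	} `xml:"a"`
}

// hrefs decodes doc with both mappings and returns the two results.
func hrefs(doc string) (elem, attr string, err error) {
	var e asElement
	if err = xml.Unmarshal([]byte(doc), &e); err != nil {
		return "", "", err
	}
	var a asAttr
	if err = xml.Unmarshal([]byte(doc), &a); err != nil {
		return "", "", err
	}
	return e.Href, a.A.Href, nil
}

func main() {
	elem, attr, err := hrefs(payload)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("element mapping:   %q\n", elem) // stays empty
	fmt.Printf("attribute mapping: %q\n", attr)
}
```

Without `,attr`, encoding/xml matches the name against child elements, so the field is silently left empty rather than producing an error.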

This parses everything in the XML, assuming there could be more than one a tag in the HTML, or other tags (like a div).

If this is not needed, just replace XmlFile.Links with XmlFile.Link of type Link (not []Link).

    func main() {
        type Link struct {
            XMLName xml.Name `xml:"a"`
            URL     string   `xml:"href,attr"`
            Target  string   `xml:"target,attr"`
            Content string   `xml:",chardata"`
        }

        type Div struct {
            XMLName xml.Name `xml:"div"`
            Classes string   `xml:"class,attr"`
            Content string   `xml:",chardata"`
        }

        type XmlFile struct {
            XMLName xml.Name `xml:"SOME_HTML"`
            Links   []Link   `xml:"a"`
            Divs    []Div    `xml:"div"`
        }

        myFile := []byte(`  google facebook `)
        var output XmlFile
        if err := xml.Unmarshal(myFile, &output); err != nil {
            fmt.Println(err)
            return
        }
        fmt.Println(output)
    }

Edit: Added more tags in the xml to show how to parse different tag types.

You can parse the example you posted using a regular XML parser. There are, however, a whole lot of exceptions to the XML syntax which are commonly accepted as valid HTML.

The simplest example I can think of: all HTML interpreters I know of understand that <br> (an unclosed <br> tag) is the same as a self-closing <br/> tag.
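For input that is almost well-formed, the standard library's xml.Decoder also has a lenient mode (Strict set to false, with xml.HTMLAutoClose and xml.HTMLEntity) that tolerates unclosed void tags such as br. The sketch below contrasts the two modes; the fragment, its URL, and the names `doc` and `decode` are assumptions for illustration, and lenient mode covers only a few HTML quirks, so it is not a replacement for a real HTML parser.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// fragment is assumed input containing an unclosed <br> tag.
const fragment = `<SOME_HTML><p>line1<br>line2</p><a href="https://example.com/">google</a></SOME_HTML>`

// doc maps the href attribute of the first-level <a> element.
type doc struct {
	A struct {
		Href string `xml:"href,attr"`
	} `xml:"a"`
}

// decode parses input strictly, or leniently when lenient is true.
func decode(input string, lenient bool) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(input))
	if lenient {
		dec.Strict = false
		dec.AutoClose = xml.HTMLAutoClose // treats br, img, hr, ... as self-closing
		dec.Entity = xml.HTMLEntity       // accepts HTML entities such as &nbsp;
	}
	var d doc
	if err := dec.Decode(&d); err != nil {
		return "", err
	}
	return d.A.Href, nil
}

func main() {
	if _, err := decode(fragment, false); err != nil {
		fmt.Println("strict: ", err)
	}
	href, err := decode(fragment, true)
	fmt.Println("lenient:", href, err)
}
```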

If you don’t know how the HTML on the other end of the service is generated, you are better off using an HTML parser.

For example, there is the golang.org/x/net/html package, which provides several functions to parse HTML:

    func findFirstHref(n *html.Node, indent string) string {
        if n.Type == html.ElementNode {
            fmt.Println(" * scanning:" + indent + n.Data)
        }
        if n.Type == html.ElementNode && n.Data == "a" {
            for _, a := range n.Attr {
                if a.Key == "href" {
                    return a.Val
                }
            }
        }
        for c := n.FirstChild; c != nil; c = c.NextSibling {
            href := findFirstHref(c, indent+" ")
            if href != "" {
                return href
            }
        }
        return ""
    }

    func main() {
        doc1, err := html.Parse(strings.NewReader(sample1))
        if err != nil {
            fmt.Println(err)
        } else {
            fmt.Println("href in sample1:", findFirstHref(doc1, ""))
        }
        doc2, err := html.Parse(strings.NewReader(sample2))
        if err != nil {
            fmt.Println(err)
        } else {
            fmt.Println("href in sample2:", findFirstHref(doc2, ""))
        }
    }

    const (
        sample1 = `   google `
        // sample2 is an invalid XML document (it has unclosed "<br>" tags):
        sample2 = `

    line1
    line2 Some
    text

    `
    )

Parsing HTML files with Go

Is encoding/xml the best library for parsing HTML table files like this one, and are there any examples of how to do it?

    
    Test 1
    Type Region

    Type   Count    Percent
    T1     34,314   31.648%
    T2     25,820   23.814%
    T3     4,871    4.493%

    Type   Count    Percent
    T4     34,314   31.648%
    T5     11,187   10.318%
    T6     25,820   23.814%

Have you tried godoc.org/code.google.com/p/go.net/html — first Google result for «golang html parser» 😉

1 Answer

Strictly speaking, the only kind of HTML guaranteed to be parsed by a conforming XML parser is XHTML. Although XHTML was once expected to become the HTML standard, it never really took off, and these days it’s considered obsolete (in favor of the much-hyped «HTML5» and the ecosystem around it). The basic problem with HTML is that while it looks like XML, it has different rules. One glaring distinction is that <br> is perfectly legal HTML but is an unterminated element in XML (in the latter, it has to be spelled <br/>), and there are many more differences.

On the other hand, your particular example looks quite XML’ish to me, so if you can guarantee that your data, while being HTML, will always be well-formed XML at the same time, you can just use the encoding/xml package. Otherwise go for go.net/html (now golang.org/x/net/html), as suggested by @elithrar, or find some other package.

Go API to convert HTML to XML

Use Cells Conversion REST API to create customized spreadsheet workflows in Go. This is a professional solution to convert HTML to XML and other document formats online using Go.

Convert an HTML file to XML in Go

Converting file formats from HTML to XML is a complex task. All HTML to XML format transitions are performed by our Go SDK while maintaining the source HTML spreadsheet’s main structural and logical content. Our Go library is a professional solution to convert HTML to XML files online. This Cloud SDK gives Go developers powerful functionality and perfect XML output.

Code example in Go using REST API to convert HTML to XML format

    // For complete examples and data files, please go to https://github.com/aspose-cells-cloud/aspose-cells-cloud-go/
    package main

    import (
        "os"

        asposecellscloud "github.com/aspose-cells-cloud/aspose-cells-cloud-go/v22"
    )

    func main() {
        instance := asposecellscloud.NewCellsApiService(os.Getenv("ProductClientId"), os.Getenv("ProductClientSecret"))
        file, err := os.Open("Book1.html")
        if err != nil {
            return
        }
        convertWorkbookOpts := new(asposecellscloud.CellsWorkbookPutConvertWorkbookOpts)
        convertWorkbookOpts.Format = "xml"
        // The response value is unused, so it is discarded to keep the program compiling.
        value, _, err1 := instance.CellsWorkbookPutConvertWorkbook(file, convertWorkbookOpts)
        if err1 != nil {
            return
        }
        file1, err2 := os.Create("Dest.xml")
        if err2 != nil {
            return
        }
        if _, err3 := file1.Write(value); err3 != nil {
            return
        }
        file1.Close()
    }

How to use Go API to convert HTML to XML

  1. Create an account at Dashboard to get free API quota & authorization details
  2. Initialize CellsApi with Client Id, Client Secret, Base URL & API version
  3. Call CellsWorkbookPutConvertWorkbook method to get the resultant stream

go-xslt

Description

go-xslt is a Go module that performs basic XSLT 1.0 transformations via Libxslt.

Installation

You’ll need the development libraries for libxml2 and libxslt, along with those for liblzma and zlib. Install these via your package manager. For instance, if using apt then:

sudo apt install libxml2-dev libxslt1-dev liblzma-dev zlib1g-dev 

This module can be installed with the go get command:

go get -u github.com/wamuir/go-xslt 

Usage

    // style is an XSLT 1.0 stylesheet, as []byte.
    xs, err := xslt.NewStylesheet(style)
    if err != nil {
        panic(err)
    }
    defer xs.Close()

    // doc is an XML document to be transformed and res is the result of
    // the XSL transformation, both as []byte.
    res, err := xs.Transform(doc)
    if err != nil {
        panic(err)
    }
