Ruby HTML to XML

Contents

Ruby API to convert HTML to XML
Convert an HTML file to XML in Ruby
Code example in Ruby using REST API to convert HTML to XML format
How to use Ruby API to convert HTML to XML


Oga is an XML/HTML parser written in Ruby.


yorickpeterse/oga


NOTE: my spare time is limited, which means I am unable to dedicate a lot of time to Oga. If you’re interested in contributing to FOSS, please take a look at the open issues and submit a pull request to address them where possible.

Oga is an XML/HTML parser written in Ruby. It provides an easy-to-use API for parsing, modifying and querying documents (using XPath expressions). Oga does not require system libraries such as libxml, making it easier and faster to install on various platforms. To achieve better performance, Oga uses a small native extension (C for MRI/Rubinius, Java for JRuby).

Oga provides an API that allows you to safely parse and query documents in a multi-threaded environment, without having to worry about your applications blowing up.

Oga: A large two-person saw used for ripping large boards in the days before power saws. One person stood on a raised platform, with the board below him, and the other person stood underneath them.

Oga uses the version format MAJOR.MINOR (e.g. 2.1). An increase of the MAJOR version indicates that backwards incompatible changes were introduced. The MINOR version is only increased when changes are backwards compatible, regardless of whether those changes are bugfixes or new features. Up until version 1.0 the code should be considered unstable, meaning it can change (and break) at any given moment.

APIs explicitly tagged as private (e.g. using Ruby’s private keyword or YARD’s @api private tag) are not covered by these rules.
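
Under this scheme, a dependency constraint that pins MAJOR and lets MINOR float should only admit backwards compatible releases. The following is a minimal sketch using RubyGems' pessimistic operator; the version numbers are illustrative and are not a list of actual Oga releases.

```ruby
require 'rubygems'

# "~> 2.1" means ">= 2.1 and < 3.0": any MINOR bump is allowed,
# a MAJOR bump (which may break compatibility) is not.
requirement = Gem::Requirement.new('~> 2.1')

# Hypothetical candidate versions, chosen only to show both outcomes.
%w[2.1 2.7 3.0].each do |candidate|
  accepted = requirement.satisfied_by?(Gem::Version.new(candidate))

  puts "#{candidate}: #{accepted ? 'accepted' : 'rejected'}"
end
```

In a Gemfile the same constraint would be written as gem 'oga', '~> 2.1'.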

Parsing a simple string of XML:

Parsing XML using strict mode (disables automatic tag insertion):

    Oga.parse_xml('<people>foo</people>', :strict => true) # works fine
    Oga.parse_xml('<people>foo', :strict => true)          # throws an error

Parsing a simple string of HTML:

Parsing an IO handle pointing to XML (this also works when using Oga.parse_html):

    handle = File.open('path/to/file.xml')

    Oga.parse_xml(handle)

Parsing an IO handle using the pull parser:

    handle = File.open('path/to/file.xml')
    parser = Oga::XML::PullParser.new(handle)

    parser.parse do |node|
      parser.on(:text) do
        puts node.text
      end
    end

Using an Enumerator to download and parse an XML document on the fly:

    enum = Enumerator.new do |yielder|
      HTTPClient.get('http://some-website.com/some-big-file.xml') do |chunk|
        yielder << chunk
      end
    end

    document = Oga.parse_xml(enum)

Parse a string of XML using the SAX parser:

    class ElementNames
      attr_reader :names

      def initialize
        @names = []
      end

      def on_element(namespace, name, attrs = {})
        @names << name
      end
    end

    handler = ElementNames.new

    Oga.sax_parse_xml(handler, '<foo><bar></bar></foo>')

    handler.names # => ["foo", "bar"]

Querying a document using XPath:

    document = Oga.parse_xml <<-EOF
    <people>
      <person id="1">
        <name>Alice</name>
        <age>28</age>
      </person>
    </people>
    EOF

    # The "xpath" method returns an enumerable (Oga::XML::NodeSet) that you can
    # iterate over.
    document.xpath('people/person').each do |person|
      puts person.get('id') # => "1"

      # The "at_xpath" method returns a single node from a set, it's the same as
      # person.xpath('name').first.
      puts person.at_xpath('name').text # => "Alice"
    end

Querying the same document using CSS:

    document = Oga.parse_xml <<-EOF
    <people>
      <person id="1">
        <name>Alice</name>
        <age>28</age>
      </person>
    </people>
    EOF

    # The "css" method returns an enumerable (Oga::XML::NodeSet) that you can
    # iterate over.
    document.css('people person').each do |person|
      puts person.get('id') # => "1"

      # The "at_css" method returns a single node from a set, it's the same as
      # person.css('name').first.
      puts person.at_css('name').text # => "Alice"
    end

Modifying a document and serializing it back to XML:

    document = Oga.parse_xml('<people><person>Alice</person></people>')
    name     = document.at_xpath('people/person[1]/text()')

    name.text = 'Bob'

    document.to_xml # => "<people><person>Bob</person></people>"

Querying a document using a namespace:

    document = Oga.parse_xml('<root xmlns:x="foo"><x:div></x:div></root>')
    div      = document.xpath('root/x:div').first

    div.namespace # => Namespace(name: "x" uri: "foo")

Support for parsing XML and HTML(5):

- DOM parsing
- Stream/pull parsing
- SAX parsing

    Ruby      Required       Recommended
    MRI       >= 2.3.0       >= 2.6.0
    JRuby     >= 1.7         >= 1.7.12
    Rubinius  Not supported
    Maglev    Not supported
    Topaz     Not supported
    mruby     Not supported

Maglev and Topaz are not supported due to the lack of a C API (that I know of) and the lack of active development of these Ruby implementations. mruby is not supported because it's a very different implementation altogether.
To install Oga on MRI or Rubinius you'll need to have a working compiler such as gcc or clang. Oga's C extension can be compiled with both. JRuby does not require a compiler, as the native extension is compiled during the Gem building process and bundled inside the Gem itself.
Oga does not use any unsynchronized global mutable state. As a result you can parse/create documents concurrently without any problems. Modifying documents concurrently can lead to bugs, as these operations are not synchronized.
Some querying operations will cache data in instance variables, without synchronization. An example is Oga::XML::Element#namespace, which will cache an element's namespace after the first call.
In general it's recommended to not use the same document in multiple threads at the same time.
Oga fully supports parsing/registering XML namespaces as well as querying them using XPath. For example, take the following XML:

    <root xmlns="http://example.com">
      <bar>bar</bar>
    </root>

If one were to try and query the bar element (e.g. using XPath root/bar) they'd end up with an empty node set. This is due to defining an alternative default namespace. Instead you can query this element using the following XPath:

    *[local-name() = "root"]/*[local-name() = "bar"]

Alternatively, if you don't really care where the element is located you can use the following:

    descendant::*[local-name() = "bar"]

And if you want to specify an explicit namespace URI, you can use this:

    descendant::*[local-name() = "bar" and namespace-uri() = "http://example.com"]

Like Nokogiri, Oga provides a way to create "dynamic" namespaces. That is, Oga allows one to query the above document as follows:

    document = Oga.parse_xml('<root xmlns="http://example.com"><bar>bar</bar></root>')

    document.xpath('x:root/x:bar', namespaces: {'x' => 'http://example.com'})

Moreover, because Oga assigns the name "xmlns" to default namespaces you can use this in your XPath queries:

    document = Oga.parse_xml('<root xmlns="http://example.com"><bar>bar</bar></root>')

    document.xpath('xmlns:root/xmlns:bar')

When using this you can still restrict the query to the correct namespace URI:

    document.xpath('xmlns:root[namespace-uri() = "http://example.com"]/xmlns:bar')

 Oga fully supports HTML5 including the omission of certain tags. For example, the following is parsed just fine:
 This is effectively parsed into:
One exception Oga makes is that it does not automatically insert html, head and body tags. Automatically inserting these tags requires a distinction between documents and fragments, as a user might not always want these tags to be inserted if left out. This complicates the user-facing API as well as the parsing internals of Oga. As a result I have decided that Oga does not insert these tags when left out.
A more in-depth explanation can be found in #98.
 The documentation is best viewed on the documentation website.
 Why Another HTML/XML parser? 
Currently there are a few existing parsers out there, the most famous one being Nokogiri. Another parser that's becoming more popular these days is Ox. Ruby's standard library also comes with REXML.
The sad truth is that these existing libraries are problematic in their own ways. Nokogiri, for example, is extremely unstable on Rubinius. On MRI it works because of the non-concurrent nature of MRI; on JRuby it works because it's implemented in Java. Nokogiri also uses libxml2, which is a massive beast of a library, is not thread-safe and is problematic to install on certain platforms (apparently). I don't want to compile libxml2 every time I install Nokogiri either.
 To give an example about the issues with Nokogiri on Rubinius (or any other Ruby implementation that is not MRI or JRuby), take a look at these issues:
 Some of these have been fixed, some have not. The core problem remains: Nokogiri acts in a way that there can be a large number of places where it might break due to throwing around void pointers and what not and expecting that things magically work. Note that I have nothing against the people running these projects, I just heavily, heavily dislike the resulting codebase one has to deal with today.
 Ox looks very promising but it lacks a rather crucial feature: parsing HTML (without using a SAX API). It's also again a C extension making debugging more of a pain (at least for me).
I just want an XML/HTML parser that I can rely on stability-wise and that is written in Ruby so I can actually debug it. In theory it should also make it easier for other Ruby developers to contribute.
 Ruby API to convert HTML to XML
Use the Cells Conversion REST API to build customized spreadsheet workflows in Ruby. It converts HTML to XML and other document formats online from Ruby code.
Convert an HTML file to XML in Ruby
Converting files from HTML to XML is a complex task. The Ruby SDK performs the whole HTML to XML conversion while preserving the main structural and logical content of the source HTML spreadsheet, so Ruby developers can produce XML output through the Cloud SDK without implementing the conversion themselves.
 Code example in Ruby using REST API to convert HTML to XML format
    # For complete examples and data files, please go to https://github.com/aspose-cells-cloud/aspose-cells-cloud-ruby/
    describe 'cells_save_as_post_document_save_as test' do
      it "should work" do
        @instance = AsposeCellsCloud::CellsApi.new($client_id, $client_secret, "v3.0", "https://api.aspose.cloud/")
        name = "BOOK1.html"
        format = 'xml'
        # Read the whole source file and pass its contents along with the target format.
        @instance.cells_workbook_put_convert_workbook(::File.open(File.expand_path("data/" + name), "r") { |io| io.read(io.size) }, { :format => format })
      end
    end
 How to use Ruby API to convert HTML to XML
1. Create an account at Dashboard to get free API quota and authorization details.
2. Initialize CellsApi with the Client Id, Client Secret, Base URL and API version.
3. Call the cells_workbook_put_convert_workbook method to get the resultant stream.
 