Nokogiri (鋸) is a Ruby gem providing HTML, XML, SAX, and Reader parsers with XPath and CSS selector support.

Overview

Nokogiri

Nokogiri (鋸) makes it easy and painless to work with XML and HTML from Ruby. It provides a sensible, easy-to-understand API for reading, writing, modifying, and querying documents. It is fast and standards-compliant because it relies on native parsers like libxml2 (C) and Xerces (Java).

Guiding Principles

Some guiding principles Nokogiri tries to follow:

  • be secure-by-default by treating all documents as untrusted by default
  • be a thin-as-reasonable layer on top of the underlying parsers, and don't attempt to fix behavioral differences between the parsers

Features Overview

  • DOM Parser for XML and HTML4
  • SAX Parser for XML and HTML4
  • Push Parser for XML and HTML4
  • Document search via XPath 1.0
  • Document search via CSS3 selectors, with some jQuery-like extensions
  • XSD Schema validation
  • XSLT transformation
  • "Builder" DSL for XML and HTML documents


Support, Getting Help, and Reporting Issues

All official documentation is posted at https://nokogiri.org (the source for which is at https://github.com/sparklemotion/nokogiri.org/, and we welcome contributions).

Consider subscribing to Tidelift, which provides license assurances and timely security notifications for your open source dependencies, including Nokogiri. Tidelift subscriptions also help the Nokogiri maintainers fund our automated testing, which in turn allows us to ship releases, bugfixes, and security updates more often.

Reading

Your first stops for learning more about Nokogiri should be:

Ask For Help

There are a few ways to ask exploratory questions:

Please do not mail the maintainers at their personal addresses.

Report A Bug

The Nokogiri bug tracker is at https://github.com/sparklemotion/nokogiri/issues

Please use the "Bug Report" or "Installation Difficulties" templates.

Security and Vulnerability Reporting

Please report vulnerabilities at https://hackerone.com/nokogiri

Full information and description of our security policy is in SECURITY.md

Semantic Versioning Policy

Nokogiri follows Semantic Versioning (since 2017 or so).

We bump Major.Minor.Patch versions following this guidance:

Major: (we've never done this)

  • Significant backwards-incompatible changes to the public API that would require rewriting existing application code.
  • Some examples of backwards-incompatible changes we might someday consider for a Major release are at ROADMAP.md.

Minor:

Patch:

  • Bugfixes.
  • Security updates.
  • Updating packaged libraries for security-related reasons.
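
Since the policy above reserves backwards-incompatible changes for Major releases, one common way to depend on Nokogiri is a pessimistic version constraint. The sketch below uses RubyGems' own `Gem::Requirement` to show which versions a constraint like `~> 1.11` would accept. The version numbers other than 1.11.0 are illustrative and not a statement about which releases exist.

```ruby
require 'rubygems'

# A pessimistic constraint such as "~> 1.11" accepts later Minor and
# Patch versions in the 1.x series, but not a new Major version.
requirement = Gem::Requirement.new('~> 1.11')

# Illustrative version numbers only.
%w[1.11.0 1.11.7 1.12.0 2.0.0].each do |version|
  accepted = requirement.satisfied_by?(Gem::Version.new(version))
  puts "#{version}: #{accepted ? 'accepted' : 'rejected'}"
end
```

In a Gemfile, the same constraint would be written as `gem 'nokogiri', '~> 1.11'`.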

Installation

Requirements:

  • Ruby >= 2.5
  • JRuby >= 9.2.0.0

Native Gems: Faster, more reliable installation

"Native gems" contain pre-compiled libraries for a specific machine architecture. On supported platforms, this removes the need to compile the C extension and the packaged libraries, and the need for system dependencies to exist. The result is much faster and more reliable installation; slow or failed installs are, as you probably know, the biggest headaches for Nokogiri users.

Supported Platforms

As of v1.11.0, Nokogiri ships pre-compiled, "native" gems for the following platforms:

  • Linux: x86-linux and x86_64-linux (req: glibc >= 2.17), including musl platforms like Alpine
  • Darwin/MacOS: x86_64-darwin and arm64-darwin
  • Windows: x86-mingw32 and x64-mingw32
  • Java: any platform running JRuby 9.2 or higher

To determine whether your system supports one of these gems, look at the output of bundle platform or ruby -e 'puts Gem::Platform.local.to_s'.
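
The same check can be scripted. The sketch below compares a platform string against the prefixes in the list above. The prefix comparison is a rough heuristic assumed for this example, not the matching logic RubyGems or Nokogiri actually use, and it does not check the glibc version requirement. The helper name `native_gem_likely?` is hypothetical.

```ruby
require 'rubygems'

# Platform prefixes copied from the list above (as of v1.11.0).
NATIVE_PLATFORM_PREFIXES = %w[
  x86-linux x86_64-linux
  x86_64-darwin arm64-darwin
  x86-mingw32 x64-mingw32
].freeze

# Rough heuristic (an assumption of this sketch, not RubyGems' own
# platform matching): compare platform strings by prefix.
def native_gem_likely?(platform)
  return true if platform.include?('java') # JRuby uses the java platform gem

  NATIVE_PLATFORM_PREFIXES.any? { |prefix| platform.start_with?(prefix) }
end

local = Gem::Platform.local.to_s
puts "#{local}: native gem likely? #{native_gem_likely?(local)}"
```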

If you're on a supported platform, either gem install or bundle install should install a native gem without any additional action on your part. This installation should only take a few seconds, and your output should look something like:

$ gem install nokogiri
Fetching nokogiri-1.11.0-x86_64-linux.gem
Successfully installed nokogiri-1.11.0-x86_64-linux
1 gem installed

Other Installation Options

Because Nokogiri is a C extension, it requires that you have a C compiler toolchain, Ruby development header files, and some system dependencies installed.

The following may work for you if you have an appropriately-configured system:

gem install nokogiri

If you have any issues, please visit Installing Nokogiri for more complete instructions and troubleshooting.

How To Use Nokogiri

Nokogiri is a large library, and so it's challenging to briefly summarize it. We've tried to provide long, real-world examples at Tutorials.

Parsing and Querying

Here is example usage for parsing and querying a document:

#! /usr/bin/env ruby

require 'nokogiri'
require 'open-uri'

# Fetch and parse HTML document
doc = Nokogiri::HTML(URI.open('https://nokogiri.org/tutorials/installing_nokogiri.html'))

# Search for nodes by css
doc.css('nav ul.menu li a', 'article h2').each do |link|
  puts link.content
end

# Search for nodes by xpath
doc.xpath('//nav//ul//li/a', '//article//h2').each do |link|
  puts link.content
end

# Or mix and match
doc.search('nav ul.menu li a', '//article//h2').each do |link|
  puts link.content
end

Encoding

Strings are always stored as UTF-8 internally. Methods that return text values will always return UTF-8 encoded strings. Methods that return a string containing markup (like to_xml, to_html and inner_html) will return a string encoded like the source document.

WARNING

Some documents declare one encoding, but actually use a different one. In these cases, which encoding should the parser choose?

Data is just a stream of bytes. Humans add meaning to that stream. Any particular set of bytes could be valid characters in multiple encodings, so detecting encoding with 100% accuracy is not possible. libxml2 does its best, but it can't be right all the time.
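
To make the ambiguity concrete, the following sketch uses only Ruby's built-in `String` encoding support (no Nokogiri involved). It takes two bytes and asks which of a few candidate encodings consider them valid. The byte values and the list of candidate encodings are chosen just for illustration.

```ruby
# The two bytes 0xA4 0xA2 are valid in several encodings, with a
# different meaning in each.
bytes = "\xA4\xA2".b

%w[EUC-JP ISO-8859-1 Shift_JIS UTF-8].each do |name|
  # Reinterpret the same bytes under each candidate encoding.
  candidate = bytes.dup.force_encoding(name)
  if candidate.valid_encoding?
    puts "#{name}: valid, decodes to #{candidate.encode('UTF-8').inspect}"
  else
    puts "#{name}: not valid"
  end
end
```

More than one candidate accepts the bytes, so the bytes alone cannot tell a parser which encoding the author intended.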

If you want Nokogiri to handle the document encoding properly, your best bet is to explicitly set the encoding. Here is an example of explicitly setting the encoding to EUC-JP on the parser:

  doc = Nokogiri.XML('<foo><bar /></foo>', nil, 'EUC-JP')

Technical Overview

Guiding Principles

As noted above, two guiding principles of the software are:

  • be secure-by-default by treating all documents as untrusted by default
  • be a thin-as-reasonable layer on top of the underlying parsers, and don't attempt to fix behavioral differences between the parsers

Notably, despite all parsers being standards-compliant, there are behavioral inconsistencies between the parsers used in the CRuby and JRuby implementations, and Nokogiri does not and should not attempt to remove these inconsistencies. Instead, we surface these differences in the test suite when they are important/semantic; or we intentionally write tests to depend only on the important/semantic bits (omitting whitespace from regex matchers on results, for example).
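
As an illustration of the "omit whitespace" idea, here is a minimal sketch in plain Ruby. It is not taken from Nokogiri's test suite; the two output strings and the helper name `squish` are hypothetical, standing in for what two different parsers might serialize for the same document.

```ruby
# Two hypothetical serializations of the same document, as two
# different parsers might emit them.
output_a = "<root>\n  <child>text</child>\n</root>\n"
output_b = "<root><child>text</child></root>"

# Strip all whitespace before comparing, so only the markup and text
# content are considered. This also removes whitespace inside text,
# which is acceptable only when that whitespace is not significant.
def squish(markup)
  markup.gsub(/\s+/, '')
end

puts output_a == output_b                 # raw comparison
puts squish(output_a) == squish(output_b) # whitespace-insensitive comparison
```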

CRuby

The Ruby (a.k.a., CRuby, MRI, YARV) implementation is a C extension that depends on libxml2 and libxslt (which in turn depend on zlib and possibly libiconv).

These dependencies are met by default by Nokogiri's packaged versions of the libxml2 and libxslt source code, but a configuration option --use-system-libraries is provided to allow specification of alternative library locations. See Installing Nokogiri for full documentation.

We provide native gems by pre-compiling libxml2 and libxslt (and potentially zlib and libiconv) and packaging them into the gem file. In this case, no compilation is necessary at installation time, which leads to faster and more reliable installation.

See LICENSE-DEPENDENCIES.md for more information on which dependencies are provided in which native and source gems.

JRuby

The Java (a.k.a. JRuby) implementation is a Java extension that depends primarily on Xerces and NekoHTML for parsing, though additional dependencies are on isorelax, nekodtd, jing, serializer, xalan-j, and xml-apis.

These dependencies are provided by pre-compiled jar files packaged in the java platform gem.

See LICENSE-DEPENDENCIES.md for more information on which dependencies are provided in which native and source gems.

Contributing

See CONTRIBUTING.md for an intro guide to developing Nokogiri.

Code of Conduct

We've adopted the Contributor Covenant code of conduct, which you can read in full in CODE_OF_CONDUCT.md.

License

This project is licensed under the terms of the MIT license.

See this license at LICENSE.md.

Dependencies

Some additional libraries may be distributed with your version of Nokogiri. Please see LICENSE-DEPENDENCIES.md for a discussion of the variations as well as the licenses thereof.

Authors

  • Mike Dalessio
  • Aaron Patterson
  • Yoko Harada
  • Akinori MUSHA
  • John Shahid
  • Karol Bucek
  • Lars Kanis
  • Sergio Arbeo
  • Timothy Elliott
  • Nobuyoshi Nakada

Issues

  • Support Ruby 2.2 on Windows

    Currently (1.6.6.2) we only cross-compile for 1.9.3, 2.0.0 and 2.1.3.

    platform/windows 
    opened by flavorjones 210
  • libiconv is missing

    We're running Ruby 1.8.7 (through RVM) on OS X 10.6.6. Trying to install nokogiri 1.4.4 using bundler always results in the following error:

    Installing nokogiri (1.4.4) with native extensions /Users/administrator/.rvm/rubies/ruby-1.8.7-p302/lib/ruby/site_ruby/1.8/rubygems/installer.rb:533:in `build_extensions': ERROR: Failed to build gem native extension. (Gem::Installer::ExtensionBuildError)
    
    /Users/administrator/.rvm/rubies/ruby-1.8.7-p302/bin/ruby extconf.rb 
    checking for libxml/parser.h... yes
    checking for libxslt/xslt.h... yes
    checking for libexslt/exslt.h... yes
    checking for iconv_open() in iconv.h... no
    checking for iconv_open() in -liconv... no
    -----
    libiconv is missing.  please visit http://nokogiri.org/tutorials/installing_nokogiri.html for help with installing dependencies.
    -----
    *** extconf.rb failed ***
    […]
    

    Following the instructions here in the wiki as well as around the net we've used MacPorts to install libxml2 and libxslt, which results in the following situation:

    sudo port install libxml2 libxslt
    Password:
    Error: Cannot install libxml2 for the arch(s) 'x86_64' because
    Error: its dependency libiconv is only installed for the archs 'i386 ppc'.
    Error: Unable to execute port: architecture mismatch
    

    We've built libxml2 and libxslt from source and tried to install the gem both with the flags for the MacPorts install and with the flags for the source build, as described in the wiki, as well as a few variations found online. The result remains the same. We've even downloaded the gem, changed extconf.rb as suggested here: https://github.com/tenderlove/nokogiri/issues#issue/381, and compiled it locally, but the final outcome is the same.

    What are we missing? We're pretty much stuck with this situation.

    opened by polarblau 109
  • Support Ruby x64 on Windows

    With the Ruby 2.0 release, Ruby x64 is now available for Windows. DevKit has also been released with MinGW x64. Most typical gems build fine with it. http://rubyinstaller.org/downloads/

    opened by davispuh 108
  • Nokogiri error: Libiconv missing on mavericks

    I am trying to install nokogiri on Mavericks and I am unable to do so, even when libiconv is installed through brew:

    • Here is the log
    Installing nokogiri (1.6.2.1) Building nokogiri using packaged libraries.
    
    Gem::Installer::ExtensionBuildError: ERROR: Failed to build gem native extension.
    
        /usr/local/rvm/rubies/ruby-2.0.0-p0/bin/ruby extconf.rb 
    Building nokogiri using packaged libraries.
    checking for iconv.h... yes
    checking for iconv_open() in iconv.h... no
    checking for iconv_open() in -liconv... no
    checking for libiconv_open() in iconv.h... no
    checking for libiconv_open() in -liconv... no
    -----
    libiconv is missing.  please visit http://nokogiri.org/tutorials/installing_nokogiri.html for help with installing dependencies.
    -----
    *** extconf.rb failed ***
    Could not create Makefile due to some reason, probably lack of necessary
    libraries and/or headers.  Check the mkmf.log file for more details.  You may
    need configuration options.
    
    Provided configuration options:
        --with-opt-dir
        --without-opt-dir
        --with-opt-include
        --without-opt-include=${opt-dir}/include
        --with-opt-lib
        --without-opt-lib=${opt-dir}/lib
        --with-make-prog
        --without-make-prog
        --srcdir=.
        --curdir
        --ruby=/usr/local/rvm/rubies/ruby-2.0.0-p0/bin/ruby
        --help
        --clean
        --use-system-libraries
        --enable-static
        --disable-static
        --with-zlib-dir
        --without-zlib-dir
        --with-zlib-include
        --without-zlib-include=${zlib-dir}/include
        --with-zlib-lib
        --without-zlib-lib=${zlib-dir}/lib
        --enable-cross-build
        --disable-cross-build
    
    
    Gem files will remain installed in /Users/Jaffery/Desktop/helpspree/vendor/bundle/ruby/2.0.0/gems/nokogiri-1.6.2.1 for inspection.
    Results logged to /Users/Jaffery/Desktop/helpspree/vendor/bundle/ruby/2.0.0/gems/nokogiri-1.6.2.1/ext/nokogiri/gem_make.out
    
    An error occurred while installing nokogiri (1.6.2.1), and Bundler cannot continue.
    Make sure that `gem install nokogiri -v '1.6.2.1'` succeeds before bundling.
    
    opened by Jaffery5 60
  • Compile issue on OS X Mavericks 10.9.5

    Despite saying that it's using a special bundled libxml2, the process fails to find it. Here is the interesting excerpt from ~/.rvm/gems/[email protected]_service/extensions/x86_64-darwin-13/2.1.0-static/nokogiri-1.6.5/mkmf.log. Suspicious lines (there is no "travis" user on my system):

    ld: warning: directory not found for option '-L/Users/travis/.sm/pkg/active/lib'
    ld: library not found for -llibxml2
    
    conftest.c:15:27: error: too few arguments to function call, single argument 'cur' was not specified
    int t(void) { xmlParseDoc(); return 0; }
                  ~~~~~~~~~~~ ^
    /Users/bruce/.rvm/gems/[email protected]_service/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin13.1.0/libxml2/2.9.2/include/libxml2/libxml/parser.h:841:11: note: 'xmlParseDoc' declared here
    XMLPUBFUN xmlDocPtr XMLCALL
              ^
    
    have_library: checking for xmlParseDoc() in -llibxml2... -------------------- no
    
    "gcc -o conftest -I/Users/bruce/.rvm/rubies/ruby-2.1.2/include/ruby-2.1.0/x86_64-darwin13.0 -I/Users/bruce/.rvm/rubies/ruby-2.1.2/include/ruby-2.1.0/ruby/backward -I/Users/bruce/.rvm/rubies/ruby-2.1.2/include/ruby-2.1.0 -I. -I/Users/bruce/.rvm/gems/[email protected]_service/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin13.1.0/libxslt/1.1.28/include -I/Users/bruce/.rvm/gems/[email protected]_service/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin13.1.0/libxml2/2.9.2/include/libxml2 -I/Users/bruce/.rvm/gems/[email protected]_service/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin13.1.0/libxml2/2.9.2/include/libxml2 -D_XOPEN_SOURCE -D_DARWIN_C_SOURCE -D_DARWIN_UNLIMITED_SELECT -D_REENTRANT   -DNOKOGIRI_LIBXML2_PATH\=\"/Users/bruce/.rvm/gems/[email protected]_service/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin13.1.0/libxml2/2.9.2\" -DNOKOGIRI_LIBXML2_PATCHES\=\"0001-Revert-Missing-initialization-for-the-catalog-module.patch\ 0002-Fix-missing-entities-after-CVE-2014-3660-fix.patch\" -DNOKOGIRI_LIBXSLT_PATH\=\"/Users/bruce/.rvm/gems/[email protected]_service/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin13.1.0/libxslt/1.1.28\" -DNOKOGIRI_LIBXSLT_PATCHES\=\"0001-Adding-doc-update-related-to-1.1.28.patch\ 0002-Fix-a-couple-of-places-where-f-printf-parameters-wer.patch\ 0003-Initialize-pseudo-random-number-generator-with-curre.patch\ 0004-EXSLT-function-str-replace-is-broken-as-is.patch\ 0006-Fix-str-padding-to-work-with-UTF-8-strings.patch\ 0007-Separate-function-for-predicate-matching-in-patterns.patch\ 0008-Fix-direct-pattern-matching.patch\ 0009-Fix-certain-patterns-with-predicates.patch\ 0010-Fix-handling-of-UTF-8-strings-in-EXSLT-crypto-module.patch\ 0013-Memory-leak-in-xsltCompileIdKeyPattern-error-path.patch\ 0014-Fix-for-bug-436589.patch\ 0015-Fix-mkdir-for-mingw.patch\" -O3 -I/Users/travis/.sm/pkg/active/include -fPIC -mmacosx-version-min=10.6 -pipe  -Wall -Wcast-qual -Wwrite-strings -Wconversion -Wmissing-noreturn -Winline -DNOKOGIRI_USE_PACKAGED_LIBRARIES 
conftest.c  -L. -L/Users/bruce/.rvm/rubies/ruby-2.1.2/lib -L/Users/bruce/.rvm/gems/[email protected]_service/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin13.1.0/libxml2/2.9.2/lib -L/Users/bruce/.rvm/gems/[email protected]_service/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin13.1.0/libxslt/1.1.28/lib -L. -L/Users/travis/.sm/pkg/active/lib -fPIC -Bstatic -fstack-protector   -arch x86_64  /Users/bruce/.rvm/gems/[email protected]_service/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin13.1.0/libxslt/1.1.28/lib/libexslt.a -lm -liconv -lpthread -lz /Users/bruce/.rvm/gems/[email protected]_service/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin13.1.0/libxml2/2.9.2/lib/libxml2.a /Users/bruce/.rvm/gems/[email protected]_service/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin13.1.0/libxslt/1.1.28/lib/libxslt.a -lm -liconv -lpthread -lz /Users/bruce/.rvm/gems/[email protected]_service/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin13.1.0/libxml2/2.9.2/lib/libxml2.a -llzma -lruby-static -framework CoreFoundation -llibxml2 /Users/bruce/.rvm/gems/[email protected]_service/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin13.1.0/libxslt/1.1.28/lib/libexslt.a -lm -liconv -lpthread -lz /Users/bruce/.rvm/gems/[email protected]_service/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin13.1.0/libxml2/2.9.2/lib/libxml2.a /Users/bruce/.rvm/gems/[email protected]_service/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin13.1.0/libxslt/1.1.28/lib/libxslt.a -lm -liconv -lpthread -lz /Users/bruce/.rvm/gems/[email protected]_service/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin13.1.0/libxml2/2.9.2/lib/libxml2.a -llzma -lpthread -ldl -lobjc  "
    ld: warning: directory not found for option '-L/Users/travis/.sm/pkg/active/lib'
    ld: library not found for -llibxml2
    clang: error: linker command failed with exit code 1 (use -v to see invocation)
    checked program was:
    /* begin */
     1: #include "ruby.h"
     2: 
     3: #include <libxml/parser.h>
     4: 
     5: /*top*/
     6: extern int t(void);
     7: int main(int argc, char **argv)
     8: {
     9:   if (argc > 1000000) {
    10:     printf("%p", &t);
    11:   }
    12: 
    13:   return 0;
    14: }
    15: int t(void) { void ((*volatile p)()); p = (void ((*)()))xmlParseDoc; return 0; }
    /* end */
    
    "gcc -o conftest -I/Users/bruce/.rvm/rubies/ruby-2.1.2/include/ruby-2.1.0/x86_64-darwin13.0 -I/Users/bruce/.rvm/rubies/ruby-2.1.2/include/ruby-2.1.0/ruby/backward -I/Users/bruce/.rvm/rubies/ruby-2.1.2/include/ruby-2.1.0 -I. -I/Users/bruce/.rvm/gems/[email protected]_service/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin13.1.0/libxslt/1.1.28/include -I/Users/bruce/.rvm/gems/[email protected]_service/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin13.1.0/libxml2/2.9.2/include/libxml2 -I/Users/bruce/.rvm/gems/[email protected]_service/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin13.1.0/libxml2/2.9.2/include/libxml2 -D_XOPEN_SOURCE -D_DARWIN_C_SOURCE -D_DARWIN_UNLIMITED_SELECT -D_REENTRANT   -DNOKOGIRI_LIBXML2_PATH\=\"/Users/bruce/.rvm/gems/[email protected]_service/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin13.1.0/libxml2/2.9.2\" -DNOKOGIRI_LIBXML2_PATCHES\=\"0001-Revert-Missing-initialization-for-the-catalog-module.patch\ 0002-Fix-missing-entities-after-CVE-2014-3660-fix.patch\" -DNOKOGIRI_LIBXSLT_PATH\=\"/Users/bruce/.rvm/gems/[email protected]_service/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin13.1.0/libxslt/1.1.28\" -DNOKOGIRI_LIBXSLT_PATCHES\=\"0001-Adding-doc-update-related-to-1.1.28.patch\ 0002-Fix-a-couple-of-places-where-f-printf-parameters-wer.patch\ 0003-Initialize-pseudo-random-number-generator-with-curre.patch\ 0004-EXSLT-function-str-replace-is-broken-as-is.patch\ 0006-Fix-str-padding-to-work-with-UTF-8-strings.patch\ 0007-Separate-function-for-predicate-matching-in-patterns.patch\ 0008-Fix-direct-pattern-matching.patch\ 0009-Fix-certain-patterns-with-predicates.patch\ 0010-Fix-handling-of-UTF-8-strings-in-EXSLT-crypto-module.patch\ 0013-Memory-leak-in-xsltCompileIdKeyPattern-error-path.patch\ 0014-Fix-for-bug-436589.patch\ 0015-Fix-mkdir-for-mingw.patch\" -O3 -I/Users/travis/.sm/pkg/active/include -fPIC -mmacosx-version-min=10.6 -pipe  -Wall -Wcast-qual -Wwrite-strings -Wconversion -Wmissing-noreturn -Winline -DNOKOGIRI_USE_PACKAGED_LIBRARIES 
conftest.c  -L. -L/Users/bruce/.rvm/rubies/ruby-2.1.2/lib -L/Users/bruce/.rvm/gems/[email protected]_service/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin13.1.0/libxml2/2.9.2/lib -L/Users/bruce/.rvm/gems/[email protected]_service/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin13.1.0/libxslt/1.1.28/lib -L. -L/Users/travis/.sm/pkg/active/lib -fPIC -Bstatic -fstack-protector   -arch x86_64  /Users/bruce/.rvm/gems/[email protected]_service/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin13.1.0/libxslt/1.1.28/lib/libexslt.a -lm -liconv -lpthread -lz /Users/bruce/.rvm/gems/[email protected]_service/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin13.1.0/libxml2/2.9.2/lib/libxml2.a /Users/bruce/.rvm/gems/[email protected]_service/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin13.1.0/libxslt/1.1.28/lib/libxslt.a -lm -liconv -lpthread -lz /Users/bruce/.rvm/gems/[email protected]_service/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin13.1.0/libxml2/2.9.2/lib/libxml2.a -llzma -lruby-static -framework CoreFoundation -llibxml2 /Users/bruce/.rvm/gems/[email protected]_service/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin13.1.0/libxslt/1.1.28/lib/libexslt.a -lm -liconv -lpthread -lz /Users/bruce/.rvm/gems/[email protected]_service/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin13.1.0/libxml2/2.9.2/lib/libxml2.a /Users/bruce/.rvm/gems/[email protected]_service/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin13.1.0/libxslt/1.1.28/lib/libxslt.a -lm -liconv -lpthread -lz /Users/bruce/.rvm/gems/[email protected]_service/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin13.1.0/libxml2/2.9.2/lib/libxml2.a -llzma -lpthread -ldl -lobjc  "
    conftest.c:15:27: error: too few arguments to function call, single argument 'cur' was not specified
    int t(void) { xmlParseDoc(); return 0; }
                  ~~~~~~~~~~~ ^
    /Users/bruce/.rvm/gems/[email protected]_service/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin13.1.0/libxml2/2.9.2/include/libxml2/libxml/parser.h:841:11: note: 'xmlParseDoc' declared here
    XMLPUBFUN xmlDocPtr XMLCALL
              ^
    1 error generated.
    checked program was:
    /* begin */
     1: #include "ruby.h"
     2: 
     3: #include <libxml/parser.h>
     4: 
     5: /*top*/
     6: extern int t(void);
     7: int main(int argc, char **argv)
     8: {
     9:   if (argc > 1000000) {
    10:     printf("%p", &t);
    11:   }
    12: 
    13:   return 0;
    14: }
    15: int t(void) { xmlParseDoc(); return 0; }
    /* end */
    
    --------------------
    
    platform/osx 
    opened by baburdick 58
  • Nokogiri 1.6.8 Install Fails on Mac OS X with xz installed from Homebrew

    Action:

    brew install xz
    gem install nokogiri -v 1.6.8
    

    Expected:

    Nokogiri 1.6.8 installs correctly.

    Actual:

    Restoring gems to pristine condition...
    Building native extensions.  This could take a while...
    ERROR:  While executing gem ... (Gem::Ext::BuildError)
        ERROR: Failed to build gem native extension.
    
        current directory: /Users/austin/.gem/ruby/2.3.0/gems/nokogiri-1.6.8/ext/nokogiri
    /Users/austin/.rubies/ruby-2.3.0/bin/ruby -r ./siteconf20160607-71935-1m9be2b.rb extconf.rb
    Using pkg-config version 1.1.7
    checking if the C compiler accepts ... yes
    checking if the C compiler accepts -Wno-error=unused-command-line-argument-hard-error-in-future... no
    Building nokogiri using packaged libraries.
    Using mini_portile version 2.1.0
    checking for iconv.h... yes
    checking for gzdopen() in -lz... yes
    checking for iconv... yes
    ************************************************************************
    IMPORTANT NOTICE:
    
    Building Nokogiri with a packaged version of libxml2-2.9.4.
    
    Team Nokogiri will keep on doing their best to provide security
    updates in a timely manner, but if this is a concern for you and want
    to use the system library instead; abort this installation process and
    reinstall nokogiri as follows:
    
        gem install nokogiri -- --use-system-libraries
            [--with-xml2-config=/path/to/xml2-config]
            [--with-xslt-config=/path/to/xslt-config]
    
    If you are using Bundler, tell it to use the option:
    
        bundle config build.nokogiri --use-system-libraries
        bundle install
    
    Note, however, that nokogiri is not fully compatible with arbitrary
    versions of libxml2 provided by OS/package vendors.
    ************************************************************************
    Extracting libxml2-2.9.4.tar.gz into tmp/x86_64-apple-darwin15.3.0/ports/libxml2/2.9.4... OK
    Running 'configure' for libxml2 2.9.4... OK
    Running 'compile' for libxml2 2.9.4... ERROR, review '/Users/austin/.gem/ruby/2.3.0/gems/nokogiri-1.6.8/ext/nokogiri/tmp/x86_64-apple-darwin15.3.0/ports/libxml2/2.9.4/compile.log' to see what happened. Last lines are:
    ========================================================================
        unsigned short* in = (unsigned short*) inb;
                             ^~~~~~~~~~~~~~~~~~~~~
    encoding.c:815:27: warning: cast from 'unsigned char *' to 'unsigned short *' increases required alignment from 1 to 2 [-Wcast-align]
        unsigned short* out = (unsigned short*) outb;
                              ^~~~~~~~~~~~~~~~~~~~~~
    4 warnings generated.
      CC       error.lo
      CC       parserInternals.lo
      CC       parser.lo
      CC       tree.lo
      CC       hash.lo
      CC       list.lo
      CC       xmlIO.lo
    xmlIO.c:1450:52: error: use of undeclared identifier 'LZMA_OK'
        ret =  (__libxml2_xzclose((xzFile) context) == LZMA_OK ) ? 0 : -1;
                                                       ^
    1 error generated.
    make[2]: *** [xmlIO.lo] Error 1
    make[1]: *** [all-recursive] Error 1
    make: *** [all] Error 2
    ========================================================================
    *** extconf.rb failed ***
    Could not create Makefile due to some reason, probably lack of necessary
    libraries and/or headers.  Check the mkmf.log file for more details.  You may
    need configuration options.
    
    Provided configuration options:
        --with-opt-dir
        --without-opt-dir
        --with-opt-include
        --without-opt-include=${opt-dir}/include
        --with-opt-lib
        --without-opt-lib=${opt-dir}/lib
        --with-make-prog
        --without-make-prog
        --srcdir=.
        --curdir
        --ruby=/Users/austin/.rubies/ruby-2.3.0/bin/$(RUBY_BASE_NAME)
        --help
        --clean
        --use-system-libraries
        --enable-static
        --disable-static
        --with-zlib-dir
        --without-zlib-dir
        --with-zlib-include
        --without-zlib-include=${zlib-dir}/include
        --with-zlib-lib
        --without-zlib-lib=${zlib-dir}/lib
        --enable-cross-build
        --disable-cross-build
    /Users/austin/.gem/ruby/2.3.0/gems/mini_portile2-2.1.0/lib/mini_portile2/mini_portile.rb:366:in `block in execute': Failed to complete compile task (RuntimeError)
        from /Users/austin/.gem/ruby/2.3.0/gems/mini_portile2-2.1.0/lib/mini_portile2/mini_portile.rb:337:in `chdir'
        from /Users/austin/.gem/ruby/2.3.0/gems/mini_portile2-2.1.0/lib/mini_portile2/mini_portile.rb:337:in `execute'
        from /Users/austin/.gem/ruby/2.3.0/gems/mini_portile2-2.1.0/lib/mini_portile2/mini_portile.rb:111:in `compile'
        from /Users/austin/.gem/ruby/2.3.0/gems/mini_portile2-2.1.0/lib/mini_portile2/mini_portile.rb:150:in `cook'
        from extconf.rb:364:in `block (2 levels) in process_recipe'
        from extconf.rb:257:in `block in chdir_for_build'
        from extconf.rb:256:in `chdir'
        from extconf.rb:256:in `chdir_for_build'
        from extconf.rb:363:in `block in process_recipe'
        from extconf.rb:262:in `tap'
        from extconf.rb:262:in `process_recipe'
        from extconf.rb:555:in `<main>'
    
    To see why this extension failed to compile, please check the mkmf.log which can be found here:
    
      /Users/austin/.gem/ruby/2.3.0/extensions/x86_64-darwin-15/2.3.0-static/nokogiri-1.6.8/mkmf.log
    
    extconf failed, exit code 1
    
    Gem files will remain installed in /Users/austin/.gem/ruby/2.3.0/gems/nokogiri-1.6.8 for inspection.
    Results logged to /Users/austin/.gem/ruby/2.3.0/extensions/x86_64-darwin-15/2.3.0-static/nokogiri-1.6.8/gem_make.out
    gem pristine nokogiri -v 1.6.8  9.23s user 7.56s system 92% cpu 18.089 total
    

    When I do:

    brew uninstall xz
    gem install nokogiri -v 1.6.8
    

    Nokogiri 1.6.8 installs correctly.

    opened by halostatue 51
  • Nokogiri 1.6.4 won't compile

    Hi there,

    We had to roll back to nokogiri 1.6.3.1 with libxml2 (2.9.0 instead of 2.9.2).

    Log incoming.

    Ubuntu 12.04 LTS

    opened by scalp42 50
  • LoadError when deploying to JRuby/Torquebox on Windows

    • Ruby: JRuby 1.7.11
    • Rails: 4.0.5
    • Nokogiri: 1.6.2.1
    • OS: Windows 2008 Server (64bit)
    • Torquebox: 1.3.1
    • Java: 1.7.0_40-b43

    Starting the rails-app gives me:

    (LoadError) load error: nokogiri/nokogiri -- java.lang.NoClassDefFoundError: com/sun/org/apache/xpath/internal/VariableStack
    

    Older versions of Nokogiri (tried it with 1.5.11) work.

    platform/jruby 
    opened by fpauser 49
  • Seg Faults and Bus Errors with 1.2.3 but not with 1.2.2

    I haven't worked out a way to create a simple example of this, but when I tried to use webrat (which requires nokogiri) I found that my existing tests (about 10 test classes run from rake test:integration) when run together (not singly) caused a seg fault or a bus error (this was before I wrote any specific webrat tests).

    So as far as I understand Nokogiri shouldn't even be being called. And when I could deduce where the error occurred it was somewhere deep in the rails framework (but suspiciously in an html parsing context). I tried step-debugging the code, but this gave me an error in a different place.

    Running on OS X 10.5.6, Ruby 1.8.6

    My workaround was to downgrade nokogiri to 1.2.2

    Sorry this wasn't such a useful issue, but maybe you can give me some guidance as to what information could help narrow down the possible cause.

    topic/libxml-ruby 
    opened by timdiggins 45
  • OSX install problem (1.6.5, 1.6.6.2)

    ➔ uname -a
    Darwin johns-MBP 14.0.0 Darwin Kernel Version 14.0.0: Fri Sep 19 00:26:44 PDT 2014; root:xnu-2782.1.97~2/RELEASE_X86_64 x86_64
    ➔ rbenv version
    2.1.5 (set by /Users/john/.rbenv/version)
    ➔ gem install nokogiri -v '1.6.6.1'
    Fetching: nokogiri-1.6.6.1.gem (100%)
    Building native extensions.  This could take a while...
    ERROR:  Error installing nokogiri:
        ERROR: Failed to build gem native extension.
    
        /Users/john/.rbenv/versions/2.1.5/bin/ruby extconf.rb
    checking if the C compiler accepts ... yes
    checking if the C compiler accepts -Wno-error=unused-command-line-argument-hard-error-in-future... yes
    Building nokogiri using packaged libraries.
    checking for gzdopen() in -lz... yes
    checking for iconv... yes
    ************************************************************************
    IMPORTANT NOTICE:
    
    Building Nokogiri with a packaged version of libxml2-2.9.2
    with the following patches applied:
        - 0001-Revert-Missing-initialization-for-the-catalog-module.patch
        - 0002-Fix-missing-entities-after-CVE-2014-3660-fix.patch
    
    Team Nokogiri will keep on doing their best to provide security
    updates in a timely manner, but if this is a concern for you and want
    to use the system library instead; abort this installation process and
    reinstall nokogiri as follows:
    
        gem install nokogiri -- --use-system-libraries
            [--with-xml2-config=/path/to/xml2-config]
            [--with-xslt-config=/path/to/xslt-config]
    
    If you are using Bundler, tell it to use the option:
    
        bundle config build.nokogiri --use-system-libraries
        bundle install
    
    Note, however, that nokogiri is not fully compatible with arbitrary
    versions of libxml2 provided by OS/package vendors.
    ************************************************************************
    Extracting libxml2-2.9.2.tar.gz into tmp/x86_64-apple-darwin14.0.0/ports/libxml2/2.9.2... OK
    Running patch with /Users/john/.rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/nokogiri-1.6.6.1/ports/patches/libxml2/0001-Revert-Missing-initialization-for-the-catalog-module.patch...
    Running 'patch' for libxml2 2.9.2... OK
    Running patch with /Users/john/.rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/nokogiri-1.6.6.1/ports/patches/libxml2/0002-Fix-missing-entities-after-CVE-2014-3660-fix.patch...
    Running 'patch' for libxml2 2.9.2... OK
    Running 'configure' for libxml2 2.9.2... OK
    Running 'compile' for libxml2 2.9.2... OK
    Running 'install' for libxml2 2.9.2... OK
    Activating libxml2 2.9.2 (from /Users/john/.rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/nokogiri-1.6.6.1/ports/x86_64-apple-darwin14.0.0/libxml2/2.9.2)...
    ************************************************************************
    IMPORTANT NOTICE:
    
    Building Nokogiri with a packaged version of libxslt-1.1.28
    with the following patches applied:
        - 0001-Adding-doc-update-related-to-1.1.28.patch
        - 0002-Fix-a-couple-of-places-where-f-printf-parameters-wer.patch
        - 0003-Initialize-pseudo-random-number-generator-with-curre.patch
        - 0004-EXSLT-function-str-replace-is-broken-as-is.patch
        - 0006-Fix-str-padding-to-work-with-UTF-8-strings.patch
        - 0007-Separate-function-for-predicate-matching-in-patterns.patch
        - 0008-Fix-direct-pattern-matching.patch
        - 0009-Fix-certain-patterns-with-predicates.patch
        - 0010-Fix-handling-of-UTF-8-strings-in-EXSLT-crypto-module.patch
        - 0013-Memory-leak-in-xsltCompileIdKeyPattern-error-path.patch
        - 0014-Fix-for-bug-436589.patch
        - 0015-Fix-mkdir-for-mingw.patch
    
    Team Nokogiri will keep on doing their best to provide security
    updates in a timely manner, but if this is a concern for you and want
    to use the system library instead; abort this installation process and
    reinstall nokogiri as follows:
    
        gem install nokogiri -- --use-system-libraries
            [--with-xml2-config=/path/to/xml2-config]
            [--with-xslt-config=/path/to/xslt-config]
    
    If you are using Bundler, tell it to use the option:
    
        bundle config build.nokogiri --use-system-libraries
        bundle install
    ************************************************************************
    Extracting libxslt-1.1.28.tar.gz into tmp/x86_64-apple-darwin14.0.0/ports/libxslt/1.1.28... OK
    Running patch with /Users/john/.rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/nokogiri-1.6.6.1/ports/patches/libxslt/0001-Adding-doc-update-related-to-1.1.28.patch...
    Running 'patch' for libxslt 1.1.28... OK
    Running patch with /Users/john/.rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/nokogiri-1.6.6.1/ports/patches/libxslt/0002-Fix-a-couple-of-places-where-f-printf-parameters-wer.patch...
    Running 'patch' for libxslt 1.1.28... OK
    Running patch with /Users/john/.rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/nokogiri-1.6.6.1/ports/patches/libxslt/0003-Initialize-pseudo-random-number-generator-with-curre.patch...
    Running 'patch' for libxslt 1.1.28... OK
    Running patch with /Users/john/.rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/nokogiri-1.6.6.1/ports/patches/libxslt/0004-EXSLT-function-str-replace-is-broken-as-is.patch...
    Running 'patch' for libxslt 1.1.28... OK
    Running patch with /Users/john/.rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/nokogiri-1.6.6.1/ports/patches/libxslt/0006-Fix-str-padding-to-work-with-UTF-8-strings.patch...
    Running 'patch' for libxslt 1.1.28... OK
    Running patch with /Users/john/.rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/nokogiri-1.6.6.1/ports/patches/libxslt/0007-Separate-function-for-predicate-matching-in-patterns.patch...
    Running 'patch' for libxslt 1.1.28... OK
    Running patch with /Users/john/.rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/nokogiri-1.6.6.1/ports/patches/libxslt/0008-Fix-direct-pattern-matching.patch...
    Running 'patch' for libxslt 1.1.28... OK
    Running patch with /Users/john/.rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/nokogiri-1.6.6.1/ports/patches/libxslt/0009-Fix-certain-patterns-with-predicates.patch...
    Running 'patch' for libxslt 1.1.28... OK
    Running patch with /Users/john/.rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/nokogiri-1.6.6.1/ports/patches/libxslt/0010-Fix-handling-of-UTF-8-strings-in-EXSLT-crypto-module.patch...
    Running 'patch' for libxslt 1.1.28... OK
    Running patch with /Users/john/.rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/nokogiri-1.6.6.1/ports/patches/libxslt/0013-Memory-leak-in-xsltCompileIdKeyPattern-error-path.patch...
    Running 'patch' for libxslt 1.1.28... OK
    Running patch with /Users/john/.rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/nokogiri-1.6.6.1/ports/patches/libxslt/0014-Fix-for-bug-436589.patch...
    Running 'patch' for libxslt 1.1.28... OK
    Running patch with /Users/john/.rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/nokogiri-1.6.6.1/ports/patches/libxslt/0015-Fix-mkdir-for-mingw.patch...
    Running 'patch' for libxslt 1.1.28... OK
    Running 'configure' for libxslt 1.1.28... OK
    Running 'compile' for libxslt 1.1.28... OK
    Running 'install' for libxslt 1.1.28... OK
    Activating libxslt 1.1.28 (from /Users/john/.rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/nokogiri-1.6.6.1/ports/x86_64-apple-darwin14.0.0/libxslt/1.1.28)...
    checking for main() in -llzma... yes
    checking for xmlParseDoc() in libxml/parser.h... no
    checking for xmlParseDoc() in -lxml2... no
    checking for xmlParseDoc() in -llibxml2... no
    -----
    libxml2 is missing.  Please locate mkmf.log to investigate how it is failing.
    -----
    *** extconf.rb failed ***
    Could not create Makefile due to some reason, probably lack of necessary
    libraries and/or headers.  Check the mkmf.log file for more details.  You may
    need configuration options.
    
    Provided configuration options:
        --with-opt-dir
        --without-opt-dir
        --with-opt-include
        --without-opt-include=${opt-dir}/include
        --with-opt-lib
        --without-opt-lib=${opt-dir}/lib
        --with-make-prog
        --without-make-prog
        --srcdir=.
        --curdir
        --ruby=/Users/john/.rbenv/versions/2.1.5/bin/ruby
        --help
        --clean
        --use-system-libraries
        --enable-static
        --disable-static
        --with-zlib-dir
        --without-zlib-dir
        --with-zlib-include
        --without-zlib-include=${zlib-dir}/include
        --with-zlib-lib
        --without-zlib-lib=${zlib-dir}/lib
        --enable-cross-build
        --disable-cross-build
        --with-xml2lib
        --without-xml2lib
        --with-libxml2lib
        --without-libxml2lib
    
    extconf failed, exit code 1
    
    Gem files will remain installed in /Users/john/.rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/nokogiri-1.6.6.1 for inspection.
    Results logged to /Users/john/.rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/extensions/x86_64-darwin-14/2.1.0-static/nokogiri-1.6.6.1/gem_make.out
    
    platform/osx 
    opened by jjb 44
  • [bug] valgrind errors spotted in gumbo parser

    Please describe the bug

    A CI test run caught some memory errors in gumbo.

    The build log is at https://github.com/sparklemotion/nokogiri/pull/2272/checks?check_run_id=2862967128

    Here's the relevant snippet:

    # Running tests with run options --seed 25442:
    
    ==489== Invalid read of size 8
    ==489==    at 0x8A170F4: parse_args_mark (gumbo.c:307)
    ==489==    by 0x490A58B: gc_mark_stacked_objects (gc.c:4737)
    ==489==    by 0x490A58B: gc_mark_stacked_objects_all (gc.c:4777)
    ==489==    by 0x490A58B: gc_marks_rest (gc.c:5653)
    ==489==    by 0x490AE9F: gc_marks (gc.c:5713)
    ==489==    by 0x490AE9F: gc_start (gc.c:6493)
    ==489==    by 0x490AE9F: gc_start (gc.c:6417)
    ==489==    by 0x490B5E2: garbage_collect (gc.c:6413)
    ==489==    by 0x490B5E2: gc_start_internal (gc.c:6722)
    ==489==    by 0x4A42198: vm_call_cfunc_with_frame (vm_insnhelper.c:1918)
    ==489==    by 0x4A42198: vm_call_cfunc (vm_insnhelper.c:1934)
    ==489==    by 0x4A4CEED: vm_exec_core (insns.def:915)
    ==489==    by 0x4A51FDC: vm_exec (vm.c:1778)
    ==489==    by 0x4A58F70: invoke_block (vm.c:979)
    ==489==    by 0x4A58F70: invoke_iseq_block_from_c (vm.c:1031)
    ==489==    by 0x4A58F70: invoke_block_from_c_bh (vm.c:1049)
    ==489==    by 0x4A58F70: vm_yield (vm.c:1094)
    ==489==    by 0x4A58F70: rb_yield_0 (vm_eval.c:970)
    ==489==    by 0x4A58F70: rb_yield_1 (vm_eval.c:976)
    ==489==    by 0x4A58F70: rb_yield (vm_eval.c:986)
    ==489==    by 0x487477B: rb_ary_each (array.c:1837)
    ==489==    by 0x4A42198: vm_call_cfunc_with_frame (vm_insnhelper.c:1918)
    ==489==    by 0x4A42198: vm_call_cfunc (vm_insnhelper.c:1934)
    ==489==    by 0x4A4D962: vm_exec_core (insns.def:850)
    ==489==    by 0x4A51FDC: vm_exec (vm.c:1778)
    ==489==    by 0x4A58F70: invoke_block (vm.c:979)
    ==489==    by 0x4A58F70: invoke_iseq_block_from_c (vm.c:1031)
    ==489==    by 0x4A58F70: invoke_block_from_c_bh (vm.c:1049)
    ==489==    by 0x4A58F70: vm_yield (vm.c:1094)
    ==489==    by 0x4A58F70: rb_yield_0 (vm_eval.c:970)
    ==489==    by 0x4A58F70: rb_yield_1 (vm_eval.c:976)
    ==489==    by 0x4A58F70: rb_yield (vm_eval.c:986)
    ==489==    by 0x487477B: rb_ary_each (array.c:1837)
    ==489==    by 0x4A42198: vm_call_cfunc_with_frame (vm_insnhelper.c:1918)
    ==489==    by 0x4A42198: vm_call_cfunc (vm_insnhelper.c:1934)
    ==489==    by 0x4A4D962: vm_exec_core (insns.def:850)
    ==489==    by 0x4A51FDC: vm_exec (vm.c:1778)
    ==489==    by 0x4A584C4: invoke_block (vm.c:979)
    ==489==    by 0x4A584C4: invoke_iseq_block_from_c (vm.c:1031)
    ==489==    by 0x4A584C4: invoke_block_from_c_bh (vm.c:1049)
    ==489==    by 0x4A584C4: vm_yield_force_blockarg (vm.c:1110)
    ==489==    by 0x4A584C4: rb_yield_force_blockarg (vm_eval.c:1035)
    ==489==    by 0x4878FAB: rb_ary_collect (array.c:2758)
    ==489==    by 0x4A42198: vm_call_cfunc_with_frame (vm_insnhelper.c:1918)
    ==489==    by 0x4A42198: vm_call_cfunc (vm_insnhelper.c:1934)
    ==489==    by 0x4A53E8B: vm_call_method_each_type.part.135 (vm_insnhelper.c:2232)
    ==489==    by 0x4A5450A: vm_call_method_each_type (vm_insnhelper.c:2380)
    ==489==    by 0x4A5450A: vm_call_method (vm_insnhelper.c:2384)
    ==489==    by 0x4A5450A: vm_call_method (vm_insnhelper.c:2351)
    ==489==    by 0x4A4D962: vm_exec_core (insns.def:850)
    ==489==    by 0x4A51FDC: vm_exec (vm.c:1778)
    ==489==    by 0x4A52BAA: invoke_block (vm.c:979)
    ==489==    by 0x4A52BAA: invoke_iseq_block_from_c (vm.c:1031)
    ==489==    by 0x4A53889: invoke_block_from_c_proc (vm.c:1124)
    ==489==    by 0x4A53889: vm_invoke_proc (vm.c:1149)
    ==489==    by 0x4984B18: rb_proc_call (proc.c:887)
    ==489==    by 0x48F401C: exec_end_procs_chain (eval_jump.c:108)
    ==489==    by 0x48F401C: rb_exec_end_proc (eval_jump.c:125)
    ==489==    by 0x48F4171: ruby_finalize_0 (eval.c:124)
    ==489==    by 0x48F4497: ruby_cleanup (eval.c:182)
    ==489==    by 0x48F47C4: ruby_run_node (eval.c:303)
    ==489==    by 0x1090EA: main (main.c:42)
    ==489==  Address 0x1ffeffcce8 is on thread 1's stack
    ==489==  312 bytes below stack pointer
    ==489== 
    {
       <insert_a_suppression_name_here>
       Memcheck:Addr8
       fun:parse_args_mark
       fun:gc_mark_stacked_objects
       fun:gc_mark_stacked_objects_all
       fun:gc_marks_rest
       fun:gc_marks
       fun:gc_start
       fun:gc_start
       fun:garbage_collect
       fun:gc_start_internal
       fun:vm_call_cfunc_with_frame
       fun:vm_call_cfunc
       fun:vm_exec_core
       fun:vm_exec
       fun:invoke_block
       fun:invoke_iseq_block_from_c
       fun:invoke_block_from_c_bh
       fun:vm_yield
       fun:rb_yield_0
       fun:rb_yield_1
       fun:rb_yield
       fun:rb_ary_each
       fun:vm_call_cfunc_with_frame
       fun:vm_call_cfunc
       fun:vm_exec_core
       fun:vm_exec
       fun:invoke_block
       fun:invoke_iseq_block_from_c
       fun:invoke_block_from_c_bh
       fun:vm_yield
       fun:rb_yield_0
       fun:rb_yield_1
       fun:rb_yield
       fun:rb_ary_each
       fun:vm_call_cfunc_with_frame
       fun:vm_call_cfunc
       fun:vm_exec_core
       fun:vm_exec
       fun:invoke_block
       fun:invoke_iseq_block_from_c
       fun:invoke_block_from_c_bh
       fun:vm_yield_force_blockarg
       fun:rb_yield_force_blockarg
       fun:rb_ary_collect
       fun:vm_call_cfunc_with_frame
       fun:vm_call_cfunc
       fun:vm_call_method_each_type.part.135
       fun:vm_call_method_each_type
       fun:vm_call_method
       fun:vm_call_method
       fun:vm_exec_core
       fun:vm_exec
       fun:invoke_block
       fun:invoke_iseq_block_from_c
       fun:invoke_block_from_c_proc
       fun:vm_invoke_proc
       fun:rb_proc_call
       fun:exec_end_procs_chain
       fun:rb_exec_end_proc
       fun:ruby_finalize_0
       fun:ruby_cleanup
       fun:ruby_run_node
       fun:main
    }
    ==489== Invalid read of size 8
    ==489==    at 0x8A170FD: parse_args_mark (gumbo.c:308)
    ==489==    by 0x490A58B: gc_mark_stacked_objects (gc.c:4737)
    ==489==    by 0x490A58B: gc_mark_stacked_objects_all (gc.c:4777)
    ==489==    by 0x490A58B: gc_marks_rest (gc.c:5653)
    ==489==    by 0x490AE9F: gc_marks (gc.c:5713)
    ==489==    by 0x490AE9F: gc_start (gc.c:6493)
    ==489==    by 0x490AE9F: gc_start (gc.c:6417)
    ==489==    by 0x490B5E2: garbage_collect (gc.c:6413)
    ==489==    by 0x490B5E2: gc_start_internal (gc.c:6722)
    ==489==    by 0x4A42198: vm_call_cfunc_with_frame (vm_insnhelper.c:1918)
    ==489==    by 0x4A42198: vm_call_cfunc (vm_insnhelper.c:1934)
    ==489==    by 0x4A4CEED: vm_exec_core (insns.def:915)
    ==489==    by 0x4A51FDC: vm_exec (vm.c:1778)
    ==489==    by 0x4A58F70: invoke_block (vm.c:979)
    ==489==    by 0x4A58F70: invoke_iseq_block_from_c (vm.c:1031)
    ==489==    by 0x4A58F70: invoke_block_from_c_bh (vm.c:1049)
    ==489==    by 0x4A58F70: vm_yield (vm.c:1094)
    ==489==    by 0x4A58F70: rb_yield_0 (vm_eval.c:970)
    ==489==    by 0x4A58F70: rb_yield_1 (vm_eval.c:976)
    ==489==    by 0x4A58F70: rb_yield (vm_eval.c:986)
    ==489==    by 0x487477B: rb_ary_each (array.c:1837)
    ==489==    by 0x4A42198: vm_call_cfunc_with_frame (vm_insnhelper.c:1918)
    ==489==    by 0x4A42198: vm_call_cfunc (vm_insnhelper.c:1934)
    ==489==    by 0x4A4D962: vm_exec_core (insns.def:850)
    ==489==    by 0x4A51FDC: vm_exec (vm.c:1778)
    ==489==    by 0x4A58F70: invoke_block (vm.c:979)
    ==489==    by 0x4A58F70: invoke_iseq_block_from_c (vm.c:1031)
    ==489==    by 0x4A58F70: invoke_block_from_c_bh (vm.c:1049)
    ==489==    by 0x4A58F70: vm_yield (vm.c:1094)
    ==489==    by 0x4A58F70: rb_yield_0 (vm_eval.c:970)
    ==489==    by 0x4A58F70: rb_yield_1 (vm_eval.c:976)
    ==489==    by 0x4A58F70: rb_yield (vm_eval.c:986)
    ==489==    by 0x487477B: rb_ary_each (array.c:1837)
    ==489==    by 0x4A42198: vm_call_cfunc_with_frame (vm_insnhelper.c:1918)
    ==489==    by 0x4A42198: vm_call_cfunc (vm_insnhelper.c:1934)
    ==489==    by 0x4A4D962: vm_exec_core (insns.def:850)
    ==489==    by 0x4A51FDC: vm_exec (vm.c:1778)
    ==489==    by 0x4A584C4: invoke_block (vm.c:979)
    ==489==    by 0x4A584C4: invoke_iseq_block_from_c (vm.c:1031)
    ==489==    by 0x4A584C4: invoke_block_from_c_bh (vm.c:1049)
    ==489==    by 0x4A584C4: vm_yield_force_blockarg (vm.c:1110)
    ==489==    by 0x4A584C4: rb_yield_force_blockarg (vm_eval.c:1035)
    ==489==    by 0x4878FAB: rb_ary_collect (array.c:2758)
    ==489==    by 0x4A42198: vm_call_cfunc_with_frame (vm_insnhelper.c:1918)
    ==489==    by 0x4A42198: vm_call_cfunc (vm_insnhelper.c:1934)
    ==489==    by 0x4A53E8B: vm_call_method_each_type.part.135 (vm_insnhelper.c:2232)
    ==489==    by 0x4A5450A: vm_call_method_each_type (vm_insnhelper.c:2380)
    ==489==    by 0x4A5450A: vm_call_method (vm_insnhelper.c:2384)
    ==489==    by 0x4A5450A: vm_call_method (vm_insnhelper.c:2351)
    ==489==    by 0x4A4D962: vm_exec_core (insns.def:850)
    ==489==    by 0x4A51FDC: vm_exec (vm.c:1778)
    ==489==    by 0x4A52BAA: invoke_block (vm.c:979)
    ==489==    by 0x4A52BAA: invoke_iseq_block_from_c (vm.c:1031)
    ==489==    by 0x4A53889: invoke_block_from_c_proc (vm.c:1124)
    ==489==    by 0x4A53889: vm_invoke_proc (vm.c:1149)
    ==489==    by 0x4984B18: rb_proc_call (proc.c:887)
    ==489==    by 0x48F401C: exec_end_procs_chain (eval_jump.c:108)
    ==489==    by 0x48F401C: rb_exec_end_proc (eval_jump.c:125)
    ==489==    by 0x48F4171: ruby_finalize_0 (eval.c:124)
    ==489==    by 0x48F4497: ruby_cleanup (eval.c:182)
    ==489==    by 0x48F47C4: ruby_run_node (eval.c:303)
    ==489==    by 0x1090EA: main (main.c:42)
    ==489==  Address 0x1ffeffccf0 is on thread 1's stack
    ==489==  304 bytes below stack pointer
    ==489== 
    {
       <insert_a_suppression_name_here>
       Memcheck:Addr8
       fun:parse_args_mark
       fun:gc_mark_stacked_objects
       fun:gc_mark_stacked_objects_all
       fun:gc_marks_rest
       fun:gc_marks
       fun:gc_start
       fun:gc_start
       fun:garbage_collect
       fun:gc_start_internal
       fun:vm_call_cfunc_with_frame
       fun:vm_call_cfunc
       fun:vm_exec_core
       fun:vm_exec
       fun:invoke_block
       fun:invoke_iseq_block_from_c
       fun:invoke_block_from_c_bh
       fun:vm_yield
       fun:rb_yield_0
       fun:rb_yield_1
       fun:rb_yield
       fun:rb_ary_each
       fun:vm_call_cfunc_with_frame
       fun:vm_call_cfunc
       fun:vm_exec_core
       fun:vm_exec
       fun:invoke_block
       fun:invoke_iseq_block_from_c
       fun:invoke_block_from_c_bh
       fun:vm_yield
       fun:rb_yield_0
       fun:rb_yield_1
       fun:rb_yield
       fun:rb_ary_each
       fun:vm_call_cfunc_with_frame
       fun:vm_call_cfunc
       fun:vm_exec_core
       fun:vm_exec
       fun:invoke_block
       fun:invoke_iseq_block_from_c
       fun:invoke_block_from_c_bh
       fun:vm_yield_force_blockarg
       fun:rb_yield_force_blockarg
       fun:rb_ary_collect
       fun:vm_call_cfunc_with_frame
       fun:vm_call_cfunc
       fun:vm_call_method_each_type.part.135
       fun:vm_call_method_each_type
       fun:vm_call_method
       fun:vm_call_method
       fun:vm_exec_core
       fun:vm_exec
       fun:invoke_block
       fun:invoke_iseq_block_from_c
       fun:invoke_block_from_c_proc
       fun:vm_invoke_proc
       fun:rb_proc_call
       fun:exec_end_procs_chain
       fun:rb_exec_end_proc
       fun:ruby_finalize_0
       fun:ruby_cleanup
       fun:ruby_run_node
       fun:main
    }
    
    topic/gumbo topic/memory 
    opened by flavorjones 1
  • Generate CI container images from Actions

    Currently many of the images used by CI are generated manually (by me on my development machine) via rake docker. They are stored in Docker Hub.

    What I'd like to have is:

    • [ ] generate OCI images from GitHub Actions
    • [ ] push to ghcr.io and stop pushing to Docker Hub
    • [ ] update CI pipelines to pull images from ghcr.io
    • [ ] do something to shut down or suspend the Docker Hub project/repo

    I've already done this work on a small scale for https://github.com/flavorjones/calendar-assistant (see https://github.com/flavorjones/calendar-assistant/blob/495df704834c2490fbfef2a423d167609d7f8280/.github/workflows/build-test-image.yml), so this should be straightforward work rather than exploration.

    topic/ci 
    opened by flavorjones 0
  • RFC: Pattern Matching

    Pattern Matching

    If Nokogiri implemented a pattern matching interface for Nokogiri::XML::Node and Nokogiri::XML::NodeSet, what should it look like?

    If you have some code you're using to deconstruct Nokogiri objects, please post a comment pointing us to what you've done, and explain a bit about your use case and why that approach is valuable, if you can.

    If you have use cases for which XPath or CSS queries are awkward or slow, please post a comment explaining what you'd like to be able to do.

    Resources

    @baweaver has been very thoughtful on the topic of general pattern matching support. He's written "Pattern Matching Interfaces in Ruby" (a Google Docs document) to start a conversation around the basic principles of what "good" looks like in a pattern matching interface.

    He's also suggested something like this at https://twitter.com/keystonelemur/status/1357424192904454145:

    # Pattern to match a link: <a/>  
    node in { name: 'a' }
    
    # Pattern for a link with a class: <a class="foo"/>
    node in { name: 'a', class: 'foo' }
    
    # Pattern to select that link: <a class="foo">{.}</a>
    node => { name: 'a', class: 'foo', children: }
    
    # Pattern to select the target URL: <a class="foo" href="{.}"/>
    node => { name: 'a', href: }
    
    # Pattern for multiple links: <a class="foo" href="{.}"/>+
    nodes => [*, { name: 'a', href: link_one }, { name: 'a', href: link_two }, *]
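
    One way hash patterns like the ones above could be backed is Ruby's `deconstruct_keys` protocol. The following is only a minimal sketch under stated assumptions: `SketchNode` and `link_target` are hypothetical names invented for illustration, and nothing here is Nokogiri's API or a proposal from the issue. It exposes the element name, the children, and each attribute under its own symbol key; if an attribute is itself called `name` or `children`, the element-level value wins in this sketch.

```ruby
# Hypothetical stand-in for an element node (NOT a Nokogiri class).
class SketchNode
  attr_reader :name, :attributes, :children

  def initialize(name, attributes = {}, children = [])
    @name = name
    @attributes = attributes
    @children = children
  end

  # Ruby calls this for hash patterns in `case/in`. The argument lists the
  # requested keys (or nil); this sketch ignores it and returns everything.
  def deconstruct_keys(_keys)
    attributes.transform_keys(&:to_sym).merge(name: name, children: children)
  end
end

# Returns the href of a link node, or nil when the node is not a link
# or has no href attribute.
def link_target(node)
  case node
  in { name: 'a', href: }
    href
  else
    nil
  end
end

link = SketchNode.new('a', { 'href' => 'https://nokogiri.org', 'class' => 'foo' })
puts link_target(link)
```

    Returning the full hash regardless of the requested keys keeps the sketch short; a real implementation might build only the keys it is asked for.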
    
    meta/discussion topic/rfc 
    opened by flavorjones 1
  • [help] Nokogiri::XML::Reader does not recover from parsing errors

    What problem are you trying to solve?

    We are parsing an XML file containing financial data. The vendor producing this file does not appropriately escape special characters such as &.

    We use the Nokogiri::XML::Reader.from_io(file) invocation to stream-parse these files. When we try to parse a file that has unescaped special characters, the reader aborts with the error FATAL: EntityRef: expecting ';'

    I was expecting that the Nokogiri::XML::ParseOptions::RECOVER option would help us recover from this error. It does seem to help when invoking an in-memory XML parse, but it does not help when using the Reader. Is there a way Nokogiri can help us recover from these errors while also giving us streaming parsing?

    #! /usr/bin/env ruby
    
    require 'nokogiri'
    require 'stringio'
    
    offending_content = '<content>Chapter 1 & 2</content>'
    
    # Errors!
    begin
      Nokogiri.XML(offending_content, nil, 'UTF-8', Nokogiri::XML::ParseOptions::STRICT)
    rescue Nokogiri::XML::SyntaxError
      puts 'syntax error!'
    end
    
    # Recovers.
    Nokogiri.XML(offending_content, nil, 'UTF-8', Nokogiri::XML::ParseOptions::RECOVER)
    puts 'recovered!'
    # []
    
    # I would think that this could recover, but it does not
    begin
      Nokogiri::XML::Reader.from_io(StringIO.new(offending_content), nil, 'UTF-8', Nokogiri::XML::ParseOptions::RECOVER).each {}
    rescue Nokogiri::XML::SyntaxError
      puts 'syntax error from io!'
    end 
    

    This produces

    syntax error!
    recovered!
    syntax error from io!
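
    Independently of whether the Reader can be made to recover, one possible workaround is to escape bare ampersands before the text reaches the parser. This is a sketch only, not something suggested in the issue: `escape_bare_ampersands` is a hypothetical helper, it assumes the input contains no CDATA sections or comments (where a literal & is legitimate), and it leaves anything that looks like an entity reference untouched even if that entity is undefined.

```ruby
# Hypothetical helper: replace every "&" that does not start something
# shaped like an entity or character reference with "&amp;".
BARE_AMPERSAND = /&(?!(?:#\d+|#x\h+|[A-Za-z_:][\w.:-]*);)/

def escape_bare_ampersands(text)
  text.gsub(BARE_AMPERSAND, '&amp;')
end

puts escape_bare_ampersands('<content>Chapter 1 & 2</content>')
```

    For streaming input the same substitution would have to be applied per line or per chunk, taking care that a reference is not split across a chunk boundary.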
    

    Environment

    $ bundle exec nokogiri -v
    # Nokogiri (1.11.3)
        ---
        warnings: []
        nokogiri:
          version: 1.11.3
          cppflags:
          - "-I/Users/mattmcf/.rbenv/versions/2.5.8/lib/ruby/gems/2.5.0/gems/nokogiri-1.11.3-x86_64-darwin/ext/nokogiri"
          - "-I/Users/mattmcf/.rbenv/versions/2.5.8/lib/ruby/gems/2.5.0/gems/nokogiri-1.11.3-x86_64-darwin/ext/nokogiri/include"
          - "-I/Users/mattmcf/.rbenv/versions/2.5.8/lib/ruby/gems/2.5.0/gems/nokogiri-1.11.3-x86_64-darwin/ext/nokogiri/include/libxml2"
          ldflags: []
        ruby:
          version: 2.5.8
          platform: x86_64-darwin20
          gem_platform: x86_64-darwin-20
          description: ruby 2.5.8p224 (2020-03-31 revision 67882) [x86_64-darwin20]
          engine: ruby
        libxml:
          source: packaged
          precompiled: true
          patches:
          - 0001-Revert-Do-not-URI-escape-in-server-side-includes.patch
          - 0002-Remove-script-macro-support.patch
          - 0003-Update-entities-to-remove-handling-of-ssi.patch
          - 0004-libxml2.la-is-in-top_builddir.patch
          - 0005-Fix-infinite-loop-in-xmlStringLenDecodeEntities.patch
          - 0006-htmlParseComment-treat-as-if-it-closed-the-comment.patch
          - 0007-use-new-htmlParseLookupCommentEnd-to-find-comment-en.patch
          - '0008-use-glibc-strlen.patch'
          - '0009-avoid-isnan-isinf.patch'
          - 0010-parser.c-shrink-the-input-buffer-when-appropriate.patch
          - 0011-update-automake-files-for-arm64.patch
          libxml2_path: "/Users/mattmcf/.rbenv/versions/2.5.8/lib/ruby/gems/2.5.0/gems/nokogiri-1.11.3-x86_64-darwin/ext/nokogiri"
          iconv_enabled: true
          compiled: 2.9.10
          loaded: 2.9.10
        libxslt:
          source: packaged
          precompiled: true
          patches:
          - 0001-update-automake-files-for-arm64.patch
          compiled: 1.1.34
          loaded: 1.1.34
        other_libraries:
          zlib: 1.2.11
          libiconv: '1.15'
    
    needs/research topic/error-handling 
    opened by mattmcf 7
  • [feature request] HTML5 parser for JRuby implementation

    This issue is a placeholder for collaboration with the JRuby community to find a way to provide HTML5-compliant parsing for Nokogiri's JRuby implementation.

    #2204 provides an HTML5 parser for the CRuby implementation by leveraging the Gumbo parser, implemented in C, and a C extension that is tightly coupled to libxml2. As a result, the Nokogiri::HTML5 module will not be immediately available on JRuby, which uses Xerces in place of libxml2.

    The Nokogiri maintainers feel this is important, and we hope to work on it in the future. If you're interested in helping with HTML5 support on JRuby, please comment on this issue or ping the maintainers on the mailing list or the Discord channel.

    help wanted platform/jruby 
    opened by flavorjones 1
  • JRuby XML::Reader memory performance is poor

    Hi,

    In the context of a Rails application, I have to process huge XML documents that are "flat". I mean, they could just have been CSV documents instead of XML, but the source provides only XML.

    While it appears to work well in MRI, with JRuby the memory consumption is very high, and at some point the process gets stuck (out of memory).

    The following stupid script mimics the problem I face:

    require 'nokogiri'
    require 'pathname'

    p = Pathname.new('big.xml')
    n = 10_000_000
    ping = -> (msg) { puts "#{Time.now}: #{msg}" }
    
    p.open('w') { |f|
        f.puts "<foos>"
        n.times{ f.puts "  <foo>Hello World</foo>" }
        f.puts "</foos>"
    }
    
    ping['before']
    c = 0
    Nokogiri::XML.Reader(p.open).each do |node|
        ping[c] if c % 1_000_000 == 0
        c += 1
    end
    ping['after']
    

    The documentation is somewhat ambiguous on how XML::Reader works. It is easy to understand "The Reader parser is good for when you need the speed of a SAX parser, but do not want to write a Document handler." as meaning "this is a SAX parser with a thin interface on top to make it easier than dealing with SAX yourself".

    However, the first node returned by XML::Reader has the whole document as inner_xml, so I am wondering if XML::Reader is really SAX.

    What we need in a document that looks like

    <foos>
      <foo>...</foo>
      <foo>...</foo>
      <foo>...</foo>
      ...
      <foo>...</foo>
    </foos>
    

    is to iterate just on the entries. What is the recommendation in such a case?

    Thanks a lot for Nokogiri

    help wanted platform/jruby 
    opened by akimd 7
  • explore: optimize `Node#at_css` and `#at_xpath`

    Currently, #at_css and #at_xpath execute the entire XPath query with multiple results, create the NodeSet, and wrap each result as a Ruby object before discarding all but the first result.

    It should be possible to optimize this, both at the XPath layer and while marshalling results.

    At the XPath layer, let's play with variations of (original-query)[1]

    At the marshalling layer, let's discard the NodeSet and just return the single Ruby object.
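
    The `(original-query)[1]` variation mentioned above can be sketched as a plain string rewrite. `first_only_xpath` is a hypothetical helper name, not part of Nokogiri. The parentheses matter: in XPath 1.0, `//div/p[1]` selects every p that is the first p child of its parent div, whereas `(//div/p)[1]` selects only the first node of the whole result in document order.

```ruby
# Hypothetical helper: wrap an XPath 1.0 expression so that only the first
# node of its overall result (in document order) is selected.
def first_only_xpath(query)
  "(#{query})[1]"
end

puts first_only_xpath('//div/p')
puts first_only_xpath('//a | //b')
```

    Whether this rewrite actually lets the underlying XPath engine stop early is exactly what the exploration would need to measure.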

    topic/performance 
    opened by flavorjones 0
  • [bug] [jruby] Java ClassCastException exposed; not wrapped in Ruby Exception

    Actual Behavior

    jruby-9.2.7.0 :001 > require 'nokogiri'
     => true 
    jruby-9.2.7.0 :002 > Nokogiri::HTML.parse("<p>").css("p")["class"]
    Traceback (most recent call last):
                <...snip>
            3: from nokogiri.XmlNodeSet$INVOKER$i$slice.call(XmlNodeSet$INVOKER$i$slice.gen)
            2: from nokogiri.XmlNodeSet.slice(XmlNodeSet.java:356)
            1: from nokogiri.XmlNodeSet.rangeBeginLength(XmlNodeSet.java:314)
    Java::JavaLang::ClassCastException (org.jruby.RubyString cannot be cast to org.jruby.RubyRange)
    jruby-9.2.7.0 :003 > 
    

    I'm aware the correct syntax is Nokogiri::HTML.parse("<p>").css("p").first["class"]. This bug report is about an error in the error :-)

    Expected behavior

    If there is an exception, it should be a Ruby exception, not a Java exception, as shown on MRI:

    2.6.2 :001 > require 'nokogiri'
     => true 
    2.6.2 :002 > Nokogiri::HTML.parse("<p>").css("p")["class"]
    Traceback (most recent call last):
                <...snip>
            2: from (irb):2
            1: from (irb):2:in `[]'
    TypeError (no implicit conversion of String into Integer)
    2.6.2 :003 > 
    
    

    Environment

    # Nokogiri (1.11.2)
        ---
        warnings: []
        nokogiri:
          version: 1.11.2
        ruby:
          version: 2.5.3
          platform: java
          gem_platform: universal-java-1.8
          description: jruby 9.2.7.0 (2.5.3) 2019-04-09 8a269e3 Java HotSpot(TM) 64-Bit Server
            VM 25.201-b09 on 1.8.0_201-b09 +jit [linux-x86_64]
          engine: jruby
          jruby: 9.2.7.0
        other_libraries:
          xerces: Xerces-J 2.12.0
          nekohtml: NekoHTML 1.9.21
    
    platform/jruby 
    opened by byteit101 1
  • Upgrade to libiconv 1.16

    libiconv v1.16 was released in 2019. Nokogiri v1.8.0 through v1.11.2 have been using libiconv v1.15.

    Looking at the Changelog, there doesn't seem to be an urgent reason to upgrade, but we should generally try to stay current.

    Note that libiconv is only used for precompiled libraries on these platforms:

    • macOS ("x86_64-darwin" and "arm64-darwin")
    • Windows ("x86-mingw32" and "x64-mingw32")

    See LICENSE-DEPENDENCIES.md for more information.

    Punchlist:

    • [ ] update the dependencies.yml file
    • [ ] test a Windows native gem built with it
    • [ ] test a macOS native gem built with it

    See b788241 for prior update to v1.15, to see what changes.

    help wanted vendored/iconv 
    opened by flavorjones 0
  • Placeholder: user-facing information for Nokogumbo users who need help upgrading/migrating

    Placeholder: user-facing information for Nokogumbo users who need help upgrading/migrating

    Related to #2204

    This is being left blank until a Nokogiri release with HTML5 is shipped. This issue will be referenced in a Nokogumbo warning message.

    opened by flavorjones 0
Releases(v1.11.7)
  • v1.11.7(Jun 2, 2021)

    1.11.7 / 2021-06-02

    • [CRuby] Backporting an upstream fix to XPath recursion depth limits which impacted some users of complex XPath queries. This issue is present in libxml 2.9.11 and 2.9.12. [#2257]

    Checksums

    SHA256:

    4976a9c9e796527d51dc6c311b9bd93a0233f6a7962a0f569aa5c782461836ef  nokogiri-1.11.7.gem
    9d69f57f6c024d86e358a8aef7a273f574721e48a6b2e1426cca007827325413  nokogiri-1.11.7-java.gem
    6017dee25feb80292b04554cc1bf8a0a2ede3b6c3daeac811902157bbc6a3bdc  nokogiri-1.11.7-x64-mingw32.gem
    38892350c1e695eab9bd77483300d681c32a22714d0e2d04d10a4c343b424bdd  nokogiri-1.11.7-x86-mingw32.gem
    1d15603cd878fa2b710a3ba3028a99d9dd0c14b75711faebf9fb6ff40bac3880  nokogiri-1.11.7-x86-linux.gem
    7ad9741e7a2fee1ffb4a4b2e20b00e87992c9efd969f557ca3b83fb2653b9bfc  nokogiri-1.11.7-x86_64-linux.gem
    c93d66d9413ea7c37d30f95e2c54606fec638e556d454e08124d9a33b7fa82c8  nokogiri-1.11.7-arm64-darwin.gem
    8761d9c7baacb26546869ed56dbc78d3eb3cabf49b85d91b1cd827cd6e94fb25  nokogiri-1.11.7-x86_64-darwin.gem
    
    Source code(tar.gz)
    Source code(zip)
  • v1.11.6(May 26, 2021)

    1.11.6 / 2021-05-26

    Fixed

    • [CRuby] DocumentFragment#path now does proper error-checking to handle behavior introduced in libxml > 2.9.10. In v1.11.4 and v1.11.5, calling DocumentFragment#path could result in a segfault.
    Source code(tar.gz)
    Source code(zip)
  • v1.11.5(May 19, 2021)

    1.11.5 / 2021-05-19

    Fixed

    [Windows CRuby] Work around segfault at process exit on Windows when using libxml2 system DLLs.

    libxml 2.9.12 introduced new behavior to avoid memory leaks when unloading libxml2 shared libraries (see libxml/!66). Early testing caught this segfault on non-Windows platforms (see #2059 and [email protected]) but it was incompletely fixed and is still an issue on Windows platforms that are using system DLLs.

    We work around this by configuring libxml2 in this situation to use its default memory management functions. Note that if Nokogiri is not on Windows, or is not using shared system libraries, it will continue to configure libxml2 to use Ruby's memory management functions. Nokogiri::VERSION_INFO["libxml"]["memory_management"] will allow you to verify when the default memory management functions are being used. [#2241]

    Added

    Nokogiri::VERSION_INFO["libxml"] now contains the key "memory_management" to declare whether libxml2 is using its default memory management functions, or whether it uses the memory management functions from ruby. See above for more details.

    Source code(tar.gz)
    Source code(zip)
  • v1.11.4(May 14, 2021)

    1.11.4 / 2021-05-14

    Security

    [CRuby] Vendored libxml2 upgraded to v2.9.12 which addresses:

    Note that two additional CVEs were addressed upstream but are not relevant to this release. CVE-2021-3516 via xmllint is not present in Nokogiri, and CVE-2020-7595 has been patched in Nokogiri since v1.10.8 (see #1992).

    Please see nokogiri/GHSA-7rrm-v45f-jp64 or #2233 for a more complete analysis of these CVEs and patches.

    Dependencies

    • [CRuby] vendored libxml2 is updated from 2.9.10 to 2.9.12. (Note that 2.9.11 was skipped because it was superseded by 2.9.12 a few hours after its release.)
    Source code(tar.gz)
    Source code(zip)
  • v1.11.3(Apr 7, 2021)

    1.11.3 / 2021-04-07

    Fixed

    • [CRuby] Passing non-Node objects to Document#root= now raises an ArgumentError exception. Previously this likely segfaulted. [#1900]
    • [JRuby] Passing non-Node objects to Document#root= now raises an ArgumentError exception. Previously this raised a TypeError exception.
    • [CRuby] arm64/aarch64 systems (like Apple's M1) can now compile libxml2 and libxslt from source (though we continue to strongly advise users to install the native gems for the best possible experience)
    Source code(tar.gz)
    Source code(zip)
  • v1.11.2(Mar 11, 2021)

    1.11.2 / 2021-03-11

    Fixed

    • [CRuby] NodeSet may now safely contain Node objects from multiple documents. Previously the GC lifecycle of the parent Document objects could lead to nodes being GCed while still in scope. [#1952]
    • [CRuby] Patch libxml2 to avoid "huge input lookup" errors on large CDATA elements. (See upstream GNOME/libxml2#200 and GNOME/libxml2!100.) [#2132].
    • [CRuby+Windows] Enable Nokogumbo (and other downstream gems) to compile and link against nokogiri.so by including LDFLAGS in Nokogiri::VERSION_INFO. [#2167]
    • [CRuby] {XML,HTML}::Document.parse now invokes #initialize exactly once. Previously #initialize was invoked twice on each object.
    • [JRuby] {XML,HTML}::Document.parse now invokes #initialize exactly once. Previously #initialize was not called, which was a problem for subclassing such as done by Loofah.

    Improved

    • Reduce the number of object allocations needed when parsing an HTML::DocumentFragment. [#2087] (Thanks, @ashmaroli!)
    • [JRuby] Update the algorithm used to calculate Node#line to be wrong less-often. The underlying parser, Xerces, does not track line numbers, and so we've always used a hacky solution for this method. [#1223, #2177]
    • Introduce --enable-system-libraries and --disable-system-libraries flags to extconf.rb. These flags provide the same functionality as --use-system-libraries and the NOKOGIRI_USE_SYSTEM_LIBRARIES environment variable, but are more idiomatic. [#2193] (Thanks, @eregon!)
    • [TruffleRuby] --disable-static is now the default on TruffleRuby when the packaged libraries are used. This is more flexible and compiles faster. (Note, though, that the default on TR is still to use system libraries.) [#2191, #2193] (Thanks, @eregon!)

    Changed

    • Nokogiri::XML::XPath is now a Module (previously it was a Class). It has been acting solely as a Module since v1.0.0. See 8461c74.
    Source code(tar.gz)
    Source code(zip)
  • v1.11.1(Jan 6, 2021)

    v1.11.1 / 2021-01-06

    Fixed

    • [CRuby] If libxml-ruby is loaded before nokogiri, the SAX and Push parsers no longer call libxml-ruby's handlers. Instead, they defensively override the libxml2 global handler before parsing. [#2168]

    SHA-256 Checksums of published gems

    a41091292992cb99be1b53927e1de4abe5912742ded956b0ba3383ce4f29711c  nokogiri-1.11.1-arm64-darwin.gem
    d44fccb8475394eb71f29dfa7bb3ac32ee50795972c4557ffe54122ce486479d  nokogiri-1.11.1-java.gem
    f760285e3db732ee0d6e06370f89407f656d5181a55329271760e82658b4c3fc  nokogiri-1.11.1-x64-mingw32.gem
    dd48343bc4628936d371ba7256c4f74513b6fa642e553ad7401ce0d9b8d26e1f  nokogiri-1.11.1-x86-linux.gem
    7f49138821d714fe2c5d040dda4af24199ae207960bf6aad4a61483f896bb046  nokogiri-1.11.1-x86-mingw32.gem
    5c26111f7f26831508cc5234e273afd93f43fbbfd0dcae5394490038b88d28e7  nokogiri-1.11.1-x86_64-darwin.gem
    c3617c0680af1dd9fda5c0fd7d72a0da68b422c0c0b4cebcd7c45ff5082ea6d2  nokogiri-1.11.1-x86_64-linux.gem
    42c2a54dd3ef03ef2543177bee3b5308313214e99f0d1aa85f984324329e5caa  nokogiri-1.11.1.gem
    
    Source code(tar.gz)
    Source code(zip)
  • v1.11.0(Jan 3, 2021)

    v1.11.0 / 2021-01-03

    Notes

    Faster, more reliable installation: Native Gems for Linux and OSX/Darwin

    "Native gems" contain pre-compiled libraries for a specific machine architecture. On supported platforms, this removes the need to compile the C extension and the packaged libraries. The result is a much faster and more reliable installation, which addresses what are, as you probably know, the biggest headaches for Nokogiri users.

    We've been shipping native Windows gems since 2009, but starting in v1.11.0 we are also shipping native gems for these platforms:

    • Linux: x86-linux and x86_64-linux -- including musl platforms like alpine
    • OSX/Darwin: x86_64-darwin and arm64-darwin

    We'd appreciate your thoughts and feedback on this work at #2075.

    Dependencies

    Ruby

    This release introduces support for Ruby 2.7 and 3.0 in the precompiled native gems.

    This release ends support for:

    Gems

    • Explicitly add racc as a runtime dependency. [#1988] (Thanks, @voxik!)
    • [MRI] Upgrade mini_portile2 dependency from ~> 2.4.0 to ~> 2.5.0 [#2005] (Thanks, @alejandroperea!)

    Security

    See note below about CVE-2020-26247 in the "Changed" subsection entitled "XML::Schema input is now "untrusted" by default".

    Added

    • Add Node methods for manipulating "keyword attributes" (for example, class and rel): #kwattr_values, #kwattr_add, #kwattr_append, and #kwattr_remove. [#2000]
    • Add support for CSS queries a:has(> b), a:has(~ b), and a:has(+ b). [#688] (Thanks, @jonathanhefner!)
    • Add Node#value? to better match expected semantics of a Hash-like object. [#1838, #1840] (Thanks, @MatzFan!)
    • [CRuby] Add Nokogiri::XML::Node#line= for use by downstream libs like nokogumbo. [#1918] (Thanks, @stevecheckoway!)
    • nokogiri.gemspec is back after a 10-year hiatus. We still prefer you use the official releases, but master is pretty stable these days, and YOLO.

    Performance

    • [CRuby] The CSS ~= operator and class selector . are about 2x faster. [#2137, #2135]
    • [CRuby] Patch libxml2 to call strlen from xmlStrlen rather than the naive implementation, because strlen is generally optimized for the architecture. [#2144] (Thanks, @ilyazub!)
    • Improve performance of some namespace operations. [#1916] (Thanks, @ashmaroli!)
    • Remove unnecessary array allocations from Node serialization methods [#1911] (Thanks, @ashmaroli!)
    • Avoid creation of unnecessary zero-length String objects. [#1970] (Thanks, @ashmaroli!)
    • Always compile libxml2 and libxslt with '-O2' [#2022, #2100] (Thanks, @ilyazub!)
    • [JRuby] Lots of code cleanup and performance improvements. [#1934] (Thanks, @kares!)
    • [CRuby] RelaxNG.from_document no longer leaks memory. [#2114]

    Improved

    • [CRuby] Handle incorrectly-closed HTML comments as WHATWG recommends for browsers. [#2058] (Thanks to HackerOne user mayflower for reporting this!)
    • {HTML,XML}::Document#parse now accepts Pathname objects. Previously this worked only if the referenced file was less than 4096 bytes long; longer files resulted in undefined behavior because the read method would be repeatedly invoked. [#1821, #2110] (Thanks, @doriantaylor and @phokz!)
    • [CRuby] Nokogumbo builds faster because it can now use header files provided by Nokogiri. [#1788] (Thanks, @stevecheckoway!)
    • Add frozen_string_literal: true magic comment to all lib files. [#1745] (Thanks, @oniofchaos!)
    • [JRuby] Clean up deprecated calls into JRuby. [#2027] (Thanks, @headius!)

    Fixed

    • HTML Parsing in "strict" mode (i.e., the RECOVER parse option not set) now correctly raises a XML::SyntaxError exception. Previously the value of the RECOVER bit was being ignored by CRuby and was misinterpreted by JRuby. [#2130]
    • The CSS ~= operator now correctly handles non-space whitespace in the class attribute. commit e45dedd
    • The switch to turn off the CSS-to-XPath cache is now thread-local, rather than being shared mutable state. [#1935]
    • The Node methods add_previous_sibling, previous=, before, add_next_sibling, next=, after, replace, and swap now correctly use their parent as the context node for parsing markup. These methods now also raise a RuntimeError if they are called on a node with no parent. [nokogumbo#160]
    • [JRuby] XML::Schema XSD validation errors are captured in XML::Schema#errors. These errors were previously ignored.
    • [JRuby] Standardize reading from IO like objects, including StringIO. [#1888, #1897]
    • [JRuby] Fix how custom XPath function namespaces are inferred to be less naive. [#1890, #2148]
    • [JRuby] Clarify exception message when custom XPath functions can't be resolved.
    • [JRuby] Comparison of Node to Document with Node#<=> now matches CRuby/libxml2 behavior.
    • [CRuby] Syntax errors are now correctly captured in Document#errors for short HTML documents. Previously the SAX parser used for encoding detection was clobbering libxml2's global error handler.
    • [CRuby] Fixed installation on AIX with respect to vasprintf. [#1908]
    • [CRuby] On some platforms, avoid symbol name collision with glibc's canonicalize. [#2105]
    • [Windows Visual C++] Fixed compiler warnings and errors. [#2061, #2068]
    • [CRuby] Fixed Nokogumbo integration which broke in the v1.11.0 release candidates. [#1788] (Thanks, @stevecheckoway!)
    • [JRuby] Fixed document encoding regression in v1.11.0 release candidates. [#2080, #2083] (Thanks, @thbar!)

    Removed

    • The internal method Nokogiri::CSS::Parser.cache_on= has been removed. Use .set_cache if you need to muck with the cache internals.
    • The class method Nokogiri::CSS::Parser.parse has been removed. This was originally deprecated in 2009 in 13db61b. Use Nokogiri::CSS.parse instead.

    Changed

    XML::Schema input is now "untrusted" by default

    Address CVE-2020-26247.

    In Nokogiri versions <= 1.11.0.rc3, XML Schemas parsed by Nokogiri::XML::Schema were trusted by default, allowing external resources to be accessed over the network, potentially enabling XXE or SSRF attacks.

    This behavior is counter to the security policy intended by Nokogiri maintainers, which is to treat all input as untrusted by default whenever possible.

    Please note that this security fix was pushed into a new minor version, 1.11.x, rather than a patch release to the 1.10.x branch, because it is a breaking change for some schemas and the risk was assessed to be "Low Severity".

    More information and instructions for enabling "trusted input" behavior in v1.11.0.rc4 and later are available at the public advisory.

    HTML parser now obeys the strict or norecover parsing option

    (Also noted above in the "Fixed" section) HTML Parsing in "strict" mode (i.e., the RECOVER parse option not set) now correctly raises a XML::SyntaxError exception. Previously the value of the RECOVER bit was being ignored by CRuby and was misinterpreted by JRuby.

    If you're using the default parser options, you will be unaffected by this fix. If you're passing strict or norecover to your HTML parser call, you may be surprised to see that the parser now fails to recover and raises a XML::SyntaxError exception. Given the number of HTML documents on the internet that libxml2 would consider to be ill-formed, this is probably not what you want, and you can omit setting that parse option to restore the behavior that you have been relying upon.

    Apologies to anyone inconvenienced by this breaking bugfix being present in a minor release, but I felt it was appropriate to introduce this fix because it's straightforward to fix any code that has been relying on this buggy behavior.

    VersionInfo, the output of nokogiri -v, and related constants

    This release changes the metadata provided in Nokogiri::VersionInfo which also affects the output of nokogiri -v. Some related constants have also been changed. If you're using VersionInfo programmatically, or relying on constants related to underlying library versions, please read the detailed changes for Nokogiri::VersionInfo at #2139 and accept our apologies for the inconvenience.

    SHA-256 Checksums of published gems

    17ed2567bf76319075b4a6a7258d1a4c9e2661fca933b03e037d79ae2b9910d0  nokogiri-1.11.0.gem
    2f0149c735b0672c49171b18467ce25fd323a8e608c9e6b76e2b2fa28e7f66ee  nokogiri-1.11.0-java.gem
    2f249be8cc705f9e899c07225fcbe18f4f7dea220a59eb5fa82461979991082e  nokogiri-1.11.0-x64-mingw32.gem
    9e219401dc3f93abf09166d12ed99c8310fcaf8c56a99d64ff93d8b5f0604e91  nokogiri-1.11.0-x86-mingw32.gem
    bda2a9c9debf51da7011830c7f2dc5771c122ebcf0fc2dd2c4ba4fc95b5c38f2  nokogiri-1.11.0-x86-linux.gem
    d500c3202e2514b32f4b02049d9193aa825ae3e9442c9cad2d235446c3e17d8d  nokogiri-1.11.0-x86_64-linux.gem
    3a613188e3b76d593b04e0ddcc46f44c288b13f80b32ce83957356f50e22f9ee  nokogiri-1.11.0-arm64-darwin.gem
    b8f9b826d09494b20b30ecd048f5eb2827dccd85b77abeb8baf1f610e5ed28ed  nokogiri-1.11.0-x86_64-darwin.gem
    
    Source code(tar.gz)
    Source code(zip)
  • v1.11.0.rc4(Dec 29, 2020)

    v1.11.0.rc4 / 2020-12-29

    Latest is v1.11.0.rc4 (2020-12-29). To try out release candidates, use gem install nokogiri --prerelease or gem install nokogiri -v1.11.0.rc4

    If you're using bundler, try updating your Gemfile with:

    gem "nokogiri", "~> 1.11.0.rc4"
    

    Delta since v1.11.0.rc3:

    Notes

    • Added precompiled native gem support for Darwin (OSX) platform arm64-darwin

    Dependencies

    Ruby

    Gems

    • Explicitly add racc as a runtime dependency. [#1988] (Thanks, @voxik!)

    Security

    See note below about CVE-2020-26247 in the "Changed" subsection entitled "XML::Schema input is now "untrusted" by default".

    Performance

    • [CRuby] The CSS ~= operator and class selector . are about 2x faster. [#2137, #2135]
    • [CRuby] Patch libxml2 to call strlen from xmlStrlen rather than the naive implementation, because strlen is generally optimized for the architecture. [#2144] (Thanks, @ilyazub!)
    • Always compile libxml2 and libxslt with '-O2' [#2022, #2100] (Thanks, @ilyazub!)
    • [CRuby] RelaxNG.from_document no longer leaks memory. [#2114]

    Improved

    • [CRuby] Handle incorrectly-closed HTML comments as WHATWG recommends for browsers. [#2058] (Thanks to HackerOne user mayflower for reporting this!)
    • {HTML,XML}::Document#parse now accepts Pathname objects. Previously this worked only if the referenced file was less than 4096 bytes long; longer files resulted in undefined behavior because the read method would be repeatedly invoked. [#1821, #2110] (Thanks, @doriantaylor and @phokz!)
    • [CRuby] Nokogumbo builds faster because it can now use header files provided by Nokogiri. [#1788] (Thanks, @stevecheckoway!)
    • [JRuby] Clean up deprecated calls into JRuby. [#2027] (Thanks, @headius!)

    Fixed

    • HTML Parsing in "strict" mode (i.e., the RECOVER parse option not set) now correctly raises a XML::SyntaxError exception. Previously the value of the RECOVER bit was being ignored by CRuby and was misinterpreted by JRuby. [#2130]
    • The CSS ~= operator now correctly handles non-space whitespace in the class attribute. commit e45dedd
    • The Node methods add_previous_sibling, previous=, before, add_next_sibling, next=, after, replace, and swap now correctly use their parent as the context node for parsing markup. These methods now also raise a RuntimeError if they are called on a node with no parent. [nokogumbo#160]
    • [JRuby] XML::Schema XSD validation errors are captured in XML::Schema#errors. These errors were previously ignored.
    • [JRuby] Fix how custom XPath function namespaces are inferred to be less naive. [#1890, #2148]
    • [JRuby] Clarify exception message when custom XPath functions can't be resolved.
    • [JRuby] Comparison of Node to Document with Node#<=> now matches CRuby/libxml2 behavior.
    • [CRuby] Syntax errors are now correctly captured in Document#errors for short HTML documents. Previously the SAX parser used for encoding detection was clobbering libxml2's global error handler.
    • [CRuby] On some platforms, avoid symbol name collision with glibc's canonicalize. [#2105]
    • [CRuby] Fixed Nokogumbo integration which broke in the v1.11.0 release candidates. [#1788] (Thanks, @stevecheckoway!)
    • [JRuby] Fixed document encoding regression in v1.11.0 release candidates. [#2080, #2083] (Thanks, @thbar!)

    Changed

    XML::Schema input is now "untrusted" by default

    Address CVE-2020-26247.

    In Nokogiri versions <= 1.11.0.rc3, XML Schemas parsed by Nokogiri::XML::Schema were trusted by default, allowing external resources to be accessed over the network, potentially enabling XXE or SSRF attacks.

    This behavior is counter to the security policy intended by Nokogiri maintainers, which is to treat all input as untrusted by default whenever possible.

    Please note that this security fix was pushed into a new minor version, 1.11.x, rather than a patch release to the 1.10.x branch, because it is a breaking change for some schemas and the risk was assessed to be "Low Severity".

    More information and instructions for enabling "trusted input" behavior in v1.11.0.rc4 and later are available at the public advisory.

    HTML parser now obeys the strict or norecover parsing option

    (Also noted above in the "Fixed" section) HTML Parsing in "strict" mode (i.e., the RECOVER parse option not set) now correctly raises a XML::SyntaxError exception. Previously the value of the RECOVER bit was being ignored by CRuby and was misinterpreted by JRuby.

    If you're using the default parser options, you will be unaffected by this fix. If you're passing strict or norecover to your HTML parser call, you may be surprised to see that the parser now fails to recover and raises a XML::SyntaxError exception. Given the number of HTML documents on the internet that libxml2 would consider to be ill-formed, this is probably not what you want, and you can omit setting that parse option to restore the behavior that you have been relying upon.

    Apologies to anyone inconvenienced by this breaking bugfix being present in a minor release, but I felt it was appropriate to introduce this fix because it's straightforward to fix any code that has been relying on this buggy behavior.

    VersionInfo, the output of nokogiri -v, and related constants

    This release changes the metadata provided in Nokogiri::VersionInfo which also affects the output of nokogiri -v. Some related constants have also been changed. If you're using VersionInfo programmatically, or relying on constants related to underlying library versions, please read the detailed changes for Nokogiri::VersionInfo at #2139 and accept our apologies for the inconvenience.

    Source code(tar.gz)
    Source code(zip)
  • v1.11.0.rc3(Sep 8, 2020)

    v1.11.0.rc3 / 2020-09-08

    To try out release candidates, use gem install nokogiri --prerelease or gem install nokogiri -v1.11.0.rc3

    If you're using bundler, try updating your Gemfile with:

    gem "nokogiri", "~> 1.11.0.rc3"
    

    Delta since v1.11.0.rc2:

    Notes

    Added precompiled native gem support for OSX/Darwin platform x86_64-darwin19.

    Fixed

    • [Windows Visual C++] Fixed compiler warnings and errors. [#2061, #2068]
    Source code(tar.gz)
    Source code(zip)
  • v1.10.10(Jul 6, 2020)

    1.10.10 / 2020-07-06

    Features

    • [MRI] Cross-built Windows gems now support Ruby 2.7 [#2029]. Note that prior to this release, the v1.11.x prereleases provided this support.
    Source code(tar.gz)
    Source code(zip)
  • v1.11.0.rc2(Apr 1, 2020)

    v1.11.0.rc2 / 2020-04-01

    To try out release candidates, use gem install nokogiri --prerelease. Latest is v1.11.0.rc2.

    Delta since v1.11.0.rc1:

    Notes

    Note that the linux-native gems for v1.11.0.rc2 and later support musl systems (e.g., alpine).

    Dependencies

    • [MRI] Upgrade mini_portile2 dependency from ~> 2.4.0 to ~> 2.5.0 [#2005] (Thanks, @alejandroperea!)

    Added

    • Add Node methods for manipulating keyword attributes (like class and rel): #kwattr_values, #kwattr_add, #kwattr_append, and #kwattr_remove. [#2000]

    Fixed

    • The switch to turn off the CSS-to-XPath cache is now thread-local, rather than being shared mutable state. [#1935]

    Removed

    • The internal method Nokogiri::CSS::Parser.cache_on= has been removed. Use .set_cache if you need to muck with the cache internals.
    • The method Nokogiri::CSS::Parser.parse has been removed. This was originally deprecated in 2009 in 13db61b.
    Source code(tar.gz)
    Source code(zip)
  • v1.10.9(Mar 1, 2020)

    1.10.9 / 2020-03-01

    Fixed

    • [MRI] Raise an exception when Nokogiri detects a specific libxml2 edge case involving blank Schema nodes wrapped by Ruby objects that would cause a segfault. Currently no fix is available upstream, so we're preventing a dangerous operation and informing users to code around it if possible. [#1985, #2001]
    • [JRuby] Change NodeSet#to_a to return a RubyArray instead of Object, for compilation under JRuby 9.2.9 and later. [#1968, #1969] (Thanks, @headius!)
    Source code(tar.gz)
    Source code(zip)
  • v1.10.8(Feb 10, 2020)

    1.10.8 / 2020-02-10

    Security

    [MRI] Pulled in upstream patch from libxml that addresses CVE-2020-7595. Full details are available in #1992. Note that this patch is not yet (as of 2020-02-10) in an upstream release of libxml.

    Source code(tar.gz)
    Source code(zip)
  • v1.11.0.rc1(Feb 2, 2020)

    v1.11.0.rc1 / 2020-02-02

    To try out release candidates, use gem install nokogiri --prerelease.

    Notes

    Experiment: Pre-Compiled Native Linux Gems

    With the v1.11.0 release candidates, we are experimenting with shipping pre-compiled native Linux gems for the x86-linux and x86_64-linux platforms.

    If this works properly for you, it will speed up installation time on Linux.

    If this doesn't work for you, please drop us a note at #1983; we may reach out to you for more information on your distro and configuration.

    Either way, we'd appreciate some feedback at #1983.

    Dependencies

    This release introduces support for:

    • Ruby 2.7, including the precompiled native binary gems for Windows.

    This release ends support for:

    Added

    • Add support for CSS queries "a:has(> b)", "a:has(~ b)", and "a:has(+ b)". [#688] (Thanks, @jonathanhefner!)
    • Add Node#value? to better match expected semantics of a Hash-like object. [#1838, #1840] (Thanks, @MatzFan!)
    • [MRI] Add Nokogiri::XML::Node#line= for use by downstream libs like nokogumbo. [#1918] (Thanks, @stevecheckoway!)

    Improved

    • Add frozen_string_literal: true magic comment to all lib files. [#1745] (Thanks, @oniofchaos!)
    • Improve performance of some namespace operations. [#1916] (Thanks, @ashmaroli!)
    • Remove unnecessary array allocations from Node serialization methods [#1911] (Thanks, @ashmaroli!)
    • Avoid creation of unnecessary zero-length String objects. [#1970] (Thanks, @ashmaroli!)
    • [JRuby] Lots of code cleanup and performance improvements. [#1934] (Thanks, @kares!)

    Fixed

    • [JRuby] Standardize reading from IO like objects, including StringIO. [#1888, #1897]
    • [JRuby] Change NodeSet#to_a to return a RubyArray instead of Object, for compilation under JRuby 9.2.9 and later. [#1968, #1969] (Thanks, @headius!)

    Changed

    VersionInfo and the output of nokogiri -v

    This release changes the information provided in Nokogiri::VersionInfo, see #1482 and #1974 for background. Note that the output of nokogiri -v will also reflect these changes.

    Nokogiri::VersionInfo will no longer contain the following keys (previously these were set only when vendored libraries were being used):

    • libxml/libxml2_path
    • libxml/libxslt_path

    Nokogiri::VersionInfo now contains version metadata for libxslt:

    • libxslt/source (either "packaged" or "system", similar to libxml/source)
    • libxslt/compiled (the version of libxslt compiled at installation time, similar to libxml/compiled)
    • libxslt/loaded (the version of libxslt loaded at runtime, similar to libxml/loaded)
    • libxslt/patches moved from libxml/libxslt_patches

    Nokogiri::VersionInfo key libxml/libxml2_patches has been renamed to libxml/patches

    These C macros will no longer be defined:

    • NOKOGIRI_LIBXML2_PATH
    • NOKOGIRI_LIBXSLT_PATH

    These global variables will no longer be defined:

    • NOKOGIRI_LIBXML2_PATH
    • NOKOGIRI_LIBXSLT_PATH

    These constants have been renamed:

    • Nokogiri::LIBXML_VERSION is now Nokogiri::LIBXML_COMPILED_VERSION
    • Nokogiri::LIBXML_PARSER_VERSION is now Nokogiri::LIBXML_LOADED_VERSION

    These methods have been renamed and the return type changed from String to Gem::Version:

    • VersionInfo#loaded_parser_version is now #loaded_libxml_version
    • VersionInfo#compiled_parser_version is now #compiled_libxml_version

    Nokogiri.uses_libxml? now accepts an optional requirement string which is interpreted as a Gem::Requirement and tested against the loaded libxml2 version (the value in VersionInfo key libxml/loaded). This greatly simplifies much of the version-dependent branching logic in both the implementation and the tests.

    To sum these changes up, the output from CRuby when using vendored libraries was something like:

    # Nokogiri (1.10.7)
        ---
        warnings: []
        nokogiri: 1.10.7
        ruby:
          version: 2.7.0
          platform: x86_64-linux
          description: ruby 2.7.0p0 (2019-12-25 revision 647ee6f091) [x86_64-linux]
          engine: ruby
        libxml:
          binding: extension
          source: packaged
          libxml2_path: "/home/flavorjones/.rvm/gems/ruby-2.7.0/gems/nokogiri-1.10.7/ports/x86_64-pc-linux-gnu/libxml2/2.9.10"
          libxslt_path: "/home/flavorjones/.rvm/gems/ruby-2.7.0/gems/nokogiri-1.10.7/ports/x86_64-pc-linux-gnu/libxslt/1.1.34"
          libxml2_patches:
          - 0001-Revert-Do-not-URI-escape-in-server-side-includes.patch
          - 0002-Remove-script-macro-support.patch
          - 0003-Update-entities-to-remove-handling-of-ssi.patch
          - 0004-libxml2.la-is-in-top_builddir.patch
          libxslt_patches: []
          compiled: 2.9.10
          loaded: 2.9.10
    

    but now looks like:

    # Nokogiri (1.11.0)
        ---
        warnings: []
        nokogiri: 1.11.0
        ruby:
          version: 2.7.0
          platform: x86_64-linux
          description: ruby 2.7.0p0 (2019-12-25 revision 647ee6f091) [x86_64-linux]
          engine: ruby
        libxml:
          source: packaged
          patches:
          - 0001-Revert-Do-not-URI-escape-in-server-side-includes.patch
          - 0002-Remove-script-macro-support.patch
          - 0003-Update-entities-to-remove-handling-of-ssi.patch
          - 0004-libxml2.la-is-in-top_builddir.patch
          compiled: 2.9.10
          loaded: 2.9.10
        libxslt:
          source: packaged
          patches: []
          compiled: 1.1.34
          loaded: 1.1.34
    

    and the output from using system libraries now looks like:

    # Nokogiri (1.11.0)
        ---
        warnings: []
        nokogiri: 1.11.0
        ruby:
          version: 2.7.0
          platform: x86_64-linux
          description: ruby 2.7.0p0 (2019-12-25 revision 647ee6f091) [x86_64-linux]
          engine: ruby
        libxml:
          source: system
          compiled: 2.9.4
          loaded: 2.9.4
        libxslt:
          source: system
          compiled: 1.1.29
          loaded: 1.1.29
    
    Source code(tar.gz)
    Source code(zip)
  • v1.10.7(Dec 4, 2019)

  • v1.10.6(Dec 3, 2019)

  • v1.10.5(Oct 31, 2019)

    1.10.5 / 2019-10-31

    Dependencies

    • [MRI] vendored libxml2 is updated from 2.9.9 to 2.9.10
    • [MRI] vendored libxslt is updated from 1.1.33 to 1.1.34
  • v1.10.0.rc1(Aug 20, 2019)

  • v1.10.4(Aug 11, 2019)

    1.10.4 / 2019-08-11

    Security

    Address CVE-2019-5477 (#1915)

    A command injection vulnerability in Nokogiri v1.10.3 and earlier allows commands to be executed in a subprocess by Ruby's Kernel.open method. Processes are vulnerable only if the undocumented method Nokogiri::CSS::Tokenizer#load_file is being passed untrusted user input.

    This vulnerability appears in code generated by the Rexical gem versions v1.0.6 and earlier. Rexical is used by Nokogiri to generate lexical scanner code for parsing CSS queries. The underlying vulnerability was addressed in Rexical v1.0.7 and Nokogiri upgraded to this version of Rexical in Nokogiri v1.10.4.

    This CVE's public notice is https://github.com/sparklemotion/nokogiri/issues/1915
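
The class of bug described above comes from `Kernel.open` treating a string that starts with `|` as a command to run in a subprocess. The sketch below shows the general mitigation of opening untrusted paths with `File.open`, which only ever treats its argument as a file name. It is an illustration under that assumption, not the actual Rexical patch, and `read_untrusted_path` is a hypothetical helper.

```ruby
# Hypothetical helper, not part of Nokogiri or Rexical.
# File.open never spawns a subprocess: a leading "|" is just part of
# the file name, so a command-injection payload fails as a missing file.
def read_untrusted_path(path)
  File.open(path, "r") { |file| file.read }
end

payload = "| echo pwned"

begin
  read_untrusted_path(payload)
rescue SystemCallError => e
  # No command ran; the payload was looked up as a literal path.
  puts "refused: #{e.class}"
end
```

The dangerous call is deliberately not executed here. Passing the same payload to `Kernel.open` on an affected Ruby version would run `echo pwned` instead of raising.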

  • v1.10.3(Apr 22, 2019)

    1.10.3 / 2019-04-22

    Security Notes

    [MRI] Pulled in upstream patch from libxslt that addresses CVE-2019-11068. Full details are available in #1892. Note that this patch is not yet (as of 2019-04-22) in an upstream release of libxslt.

  • v1.10.2(Mar 25, 2019)

    1.10.2 / 2019-03-24

    Security

    • [MRI] Remove support from vendored libxml2 for future script macros. [#1871]
    • [MRI] Remove support from vendored libxml2 for server-side includes within attributes. [#1877]

    Bug fixes

    • [JRuby] Fix node ownership in duplicated documents. [#1060]
    • [JRuby] Rethrow exceptions caught by Java SAX handler. [#1847, #1872] (Thanks, @adjam!)
  • v1.10.1(Jan 13, 2019)

    1.10.1 / 2019-01-13

    Features

    • [MRI] During installation, handle Xcode 10's new library path. [#1801, #1851] (Thanks, @mlj and @deepj!)
    • Avoid unnecessary creation of Procs in many methods. [#1776] (Thanks, @chopraanmol1!)

    Bug fixes

    • CSS selector :has() now correctly matches against any descendant. Previously this selector matched against only direct children. [#350] (Thanks, @Phrogz!)
    • NodeSet#attr now returns nil if it's empty. Previously this raised a NoMethodError.
    • [MRI] XPath errors are no longer suppressed during XSLT::Stylesheet#transform. Previously these errors were suppressed which led to silent failures and a subsequent segfault. [#1802]
  • v1.10.0(Jan 4, 2019)

  • v1.9.1(Dec 18, 2018)

    1.9.1 / 2018-12-17

    Bug fixes

    • Fix a bug introduced in v1.9.0 where XML::DocumentFragment#dup no longer returned an instance of the callee's class, instead always returning an XML::DocumentFragment. This notably broke any subclass of XML::DocumentFragment including HTML::DocumentFragment as well as the Loofah gem's Loofah::HTML::DocumentFragment. [#1846]
  • v1.9.0(Dec 17, 2018)

    1.9.0 / 2018-12-17

    Security Notes

    • [JRuby] Upgrade Xerces dependency from 2.11.0 to 2.12.0 to address upstream vulnerability CVE-2012-0881 [#1831] (Thanks @grajagandev for reporting.)

    Notable non-functional changes

    • Decrease installation size by removing many unneeded files (e.g., /test) from the packaged gems. [#1719] (Thanks, @stevecrozz!)

    Features

    • XML::Attr#value= allows HTML node attribute values to be set to either a blank string or an empty boolean attribute. [#1800]
    • Introduce XML::Node#wrap which does what XML::NodeSet#wrap has always done, but for a single node. [#1531] (Thanks, @ethirajsrinivasan!)
    • [MRI] Improve installation experience on macOS High Sierra (Darwin). [#1812, #1813] (Thanks, @gpakosz and @nurse!)
    • [MRI] Node#dup supports copying a node directly to a new document. See the method documentation for details.
    • [MRI] DocumentFragment#dup is now more memory-efficient, avoiding making unnecessary copies. [#1063]
    • [JRuby] NodeSet has been rewritten to improve performance! [#1795]

    Bug fixes

    • NodeSet#each now returns self instead of zero. [#1822] (Thanks, @olehif!)
    • [MRI] Address a memory leak when using XML::Builder to create nodes with namespaces. [#1810]
    • [MRI] Address a memory leak when unparenting a DTD. [#1784] (Thanks, @stevecheckoway!)
    • [MRI] Use RbConfig::CONFIG instead of ::MAKEFILE_CONFIG to fix installations that use Makefile macros. [#1820] (Thanks, @nobu!)
    • [JRuby] Decrease large memory usage when making nested XPath queries. [#1749]
    • [JRuby] Fix failing tests on JRuby 9.2.x
    • [JRuby] Fix default namespaces in nodes reparented into a different document [#1774]
    • [JRuby] Fix support for Java 9. [#1759] (Thanks, @Taywee!)

    Dependencies

    • [MRI] Upgrade mini_portile2 dependency from ~> 2.3.0 to ~> 2.4.0
  • v1.9.0.rc1(Dec 10, 2018)

    1.9.0.rc1 / 2018-12-10

    Security Notes

    • [JRuby] Upgrade Xerces dependency from 2.11.0 to 2.12.0 to address upstream vulnerability CVE-2012-0881 [#1831] (Thanks @grajagandev for reporting.)

    Features

    • XML::Attr#value= allows HTML node attribute values to be set to either a blank string or an empty boolean attribute. [#1800]
    • Introduce XML::Node#wrap which does what XML::NodeSet#wrap has always done, but for a single node. [#1531] (Thanks, @ethirajsrinivasan!)
    • [MRI] Improve installation experience on macOS High Sierra (Darwin). [#1812, #1813] (Thanks, @gpakosz and @nurse!)
    • [MRI] Node#dup supports copying a node directly to a new document. See the method documentation for details.
    • [MRI] DocumentFragment#dup is now more memory-efficient, avoiding making unnecessary copies. [#1063]
    • [JRuby] NodeSet has been rewritten to improve performance! [#1795]

    Bug fixes

    • NodeSet#each now returns self instead of zero. [#1822] (Thanks, @olehif!)
    • [MRI] Address a memory leak when using XML::Builder to create nodes with namespaces. [#1810]
    • [MRI] Address a memory leak when unparenting a DTD. [#1784] (Thanks, @stevecheckoway!)
    • [MRI] Decrease large memory usage when making nested XPath queries. [#1749]
    • [MRI] Use RbConfig::CONFIG instead of ::MAKEFILE_CONFIG to fix installations that use Makefile macros. [#1820] (Thanks, @nobu!)
    • [JRuby] Fix failing tests on JRuby 9.2.x
    • [JRuby] Fix default namespaces in nodes reparented into a different document [#1774]
    • [JRuby] Fix support for Java 9. [#1759] (Thanks, @Taywee!)

    Dependencies

    • [MRI] Upgrade mini_portile2 dependency from ~> 2.3.0 to ~> 2.4.0