read_xml.response() method (#242).
Fix R CMD check failure.
submit_request() now checks for empty form-field-types to select the correct submit fields (@rentrop, #159)
Fixes to follow_link() and back() to correctly manage session history.
If you’re using xml2 1.0.0, html_node() will now return a “missing node”.
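As a rough illustration of the "missing node" behaviour, the sketch below assumes rvest with xml2 1.0.0 or later is installed; the HTML string and variable names are made up for the example.

```r
library(rvest)

# A document that contains no <table> element (hypothetical input)
doc <- read_html("<html><body><p>No tables here</p></body></html>")

# With xml2 >= 1.0.0 this returns a missing node rather than failing
missing <- html_node(doc, "table")

# Extractors applied to a missing node return NA
html_text(missing)
```

This means a pipeline can test the result with is.na() instead of wrapping the selector call in error handling.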
Parse rowspans and colspans by repeating the cell value: left to right for colspan, top to bottom for rowspan (#111).
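A small sketch of how the span filling looks from the user's side, assuming a version of rvest that includes this change; the table markup is invented for the example.

```r
library(rvest)

# Hypothetical table: one cell spans two columns, another spans two rows
doc <- read_html("
  <table>
    <tr><th>a</th><th>b</th><th>c</th></tr>
    <tr><td colspan='2'>x</td><td>y</td></tr>
    <tr><td rowspan='2'>p</td><td>q</td><td>r</td></tr>
    <tr><td>s</td><td>t</td></tr>
  </table>")

tbl <- html_table(html_node(doc, "table"))

# The colspan cell 'x' is repeated across columns a and b;
# the rowspan cell 'p' is repeated down column a.
tbl
```

Repeating the value keeps every row the same width, so the result stays rectangular.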
Updated a few examples and demos where the website structure has changed.
Made compatible with both xml2 0.1.2 and 1.0.0.
Fix invalid link for SSA example.
Parse <option> elements that don’t have a value attribute (#85).
Remove all remaining uses of html() in favor of read_html() (@jimhester, #113).
rvest has been rewritten to take advantage of the new xml2 package. xml2 provides a fresh binding to libxml2, avoiding many of the work-arounds previously needed for the XML package. Now rvest depends on the xml2 package, so all the xml functions are available, and rvest adds a thin wrapper for html.
A number of functions have changed names. The old versions still work, but are deprecated and will be removed in rvest 0.4.0.
html_tag() -> html_name()
html() -> read_html()
html_node() now throws an error if there are no matches, and a warning if there’s more than one match. I think this should make it more likely to fail clearly when the structure of the page changes.
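For code being migrated to the new names, a minimal sketch (assuming a reasonably recent rvest; the HTML string is invented for the example) might look like this:

```r
library(rvest)

# read_html() replaces html(); it also accepts a literal HTML string
doc <- read_html("<html><body><p id='intro'>Hello</p></body></html>")

# Exactly one <p> matches, so html_node() returns it without complaint
node <- html_node(doc, "p")

# html_name() replaces html_tag()
html_name(node)
```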
xml_structure() has been moved to xml2. New html_structure() (also in xml2) highlights id and class attributes (#78).
submit_form() now works with forms that use GET (#66).
submit_request() (and hence submit_form()) is now case-insensitive, and so will find <input type=SUBMIT> as well as <input type="submit">.
submit_request() (and hence submit_form()) recognizes forms with <input type="image"> as a valid form submission button.
html() and xml() pass ... on to httr::GET() so you can more finely control the request (#48).
Add xml support: parse with xml(), then work with the result using xml_node(), xml_attr(), xml_attrs(), xml_text() and xml_tag() (#24).
xml_structure(): new function that displays the structure (i.e. tag and attribute names) of an xml/html object (#10).
follow_link() now accepts css and xpath selectors. (#38, #41, #42)
html() does a better job of dealing with encodings (passing the problem on to XML::parseHTML()) instead of trying to do it itself (#25, #50).
html_attr() returns the default value when the input is NULL (#49).
Add missing html_node() method for session.
html_nodes() now returns an empty list if no elements are found (#31).
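A short sketch of the no-match case, assuming a current rvest install (where the empty result is a zero-length nodeset rather than a plain list); the HTML string is made up for the example.

```r
library(rvest)

# Hypothetical document with no <table> elements
doc <- read_html("<html><body><p>No tables here</p></body></html>")

# No matches: the result is empty instead of an error
tables <- html_nodes(doc, "table")
length(tables)
```

Checking length() before further processing is one way to handle pages whose structure has changed.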
submit_form() converts relative paths to absolute URLs (#52). It also deals better with 0-length inputs (#29).