hu_read_html.Rd
Use a JavaScript-enabled browser context to read and render HTML from a URL.
Usage

```r
hu_read_html(
  url,
  emulate = c("best", "chrome", "firefox", "ie"),
  ret = c("html_document", "text"),
  js_delay = 2000L,
  timeout = 30000L,
  ignore_ssl_errors = TRUE,
  enable_dnt = FALSE,
  download_images = FALSE,
  options = c("RECOVER", "NOERROR", "NOBLANKS")
)
```
Arguments

| Argument | Description |
|---|---|
| `url` | URL to retrieve |
| `emulate` | browser to emulate; one of `"best"`, `"chrome"`, `"firefox"`, or `"ie"` (default: `"best"`) |
| `ret` | what to return; either `"html_document"` (the default, a parsed xml2 document) or `"text"` (the rendered HTML source as a character string) |
| `js_delay` | time (ms) to let loaded JavaScript execute; default is 2 seconds (2000 ms) |
| `timeout` | overall timeout (ms); default is 30 seconds (30000 ms) |
| `ignore_ssl_errors` | should SSL/TLS errors be ignored? The default (`TRUE`) ignores them |
| `enable_dnt` | enable the "Do Not Track" header? Default: `FALSE` |
| `download_images` | download images as the page is loaded? Since this function is a high-level wrapper designed to do a quick read of HTML, it is recommended that you leave this at the default (`FALSE`) |
| `options` | parse options passed along when building the xml2 document |
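To make the knobs above concrete, here is a sketch of a call overriding several defaults; the URL is a hypothetical placeholder, and the values are illustrative only:

```r
doc <- hu_read_html(
  "https://example.com/js-heavy-page",
  emulate  = "chrome",   # render as Chrome rather than the "best" default
  js_delay = 5000L,      # give slow JavaScript 5 seconds to finish
  timeout  = 60000L,     # allow a full minute before giving up
  enable_dnt = TRUE      # send the "Do Not Track" header
)
```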
Value

An xml2 `html_document`/`xml_document` if `ret == "html_document"`, otherwise the HTML document text generated by HtmlUnit.
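As a sketch of the two return modes, requesting the raw text instead of a parsed document (the URL here is a hypothetical placeholder):

```r
# ret = "text" returns a character string of the post-JavaScript HTML source
src <- hu_read_html("https://example.com", ret = "text")
substr(src, 1, 200)  # peek at the start of the rendered markup
```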
For the code in the examples, this is the site that is being scraped: https://hrbrmstr.github.io/htmlunitjars/index.html. Note that it has a table of values, but the table is rendered via JavaScript, so it is invisible to a plain (non-rendering) HTML fetch.
Examples

```r
if (FALSE) {
  test_url <- "https://hrbrmstr.github.io/htmlunitjars/index.html"
  hu_read_html(test_url)
}
```
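Because the default return value is an xml2 document, it can be handed directly to rvest selectors. A sketch, assuming the rvest package is installed and the example page's JavaScript-built table uses standard `<table>` markup:

```r
library(rvest)  # assumed installed; provides html_element()/html_table()

test_url <- "https://hrbrmstr.github.io/htmlunitjars/index.html"

# Render the page (JavaScript included), then pull out the table that a
# plain xml2::read_html() would miss because it is built client-side.
pg  <- hu_read_html(test_url)
tbl <- html_table(html_element(pg, "table"))
tbl
```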