Scraping the web is fraught with peril. URLs die, networks get disrupted, and the best-laid plans for building a corpus from links can quickly go awry. Use this function to mitigate some of the pain of retrieving web resources.

Usage

safeGET(url = NULL, config = list(), timeout = httr::timeout(5), ...,
  handle = NULL)

Arguments

url

The URL of the page to retrieve.

config

Additional configuration settings such as HTTP authentication (authenticate), additional headers (add_headers), cookies (set_cookies), etc. See config for full details and a list of helpers.

timeout

A call to httr::timeout(). The default timeout is 5 seconds.

...

Further named parameters, such as query, path, etc., passed on to modify_url. Unnamed parameters will be combined with config.

handle

The handle to use with this request. If not supplied, it will be retrieved and reused from the handle_pool based on the scheme, hostname, and port of the url. By default httr reuses a handle across requests to the same scheme/host/port combination. This substantially reduces connection time and ensures that cookies are maintained over multiple requests to the same host. See handle_pool for more details.
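
As a quick illustration, these arguments might be combined in a single call like so (the URL and header value here are placeholders, not defaults):

safeGET(
  "https://example.com/articles",
  config = httr::add_headers(`User-Agent` = "my-corpus-builder"),
  timeout = httr::timeout(10),
  query = list(page = 1)
)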

Details

This is a thin wrapper around httr::GET() built with purrr::safely() that returns either an httr response object or NULL if there was an error. If you need the reason for the error (e.g., "Could not resolve host ..."), you should write your own wrapper.
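
A minimal sketch of what such a wrapper could look like, assuming the error is discarded and only the result is kept (the actual implementation may differ):

safe_get <- purrr::safely(httr::GET)

safeGET <- function(url = NULL, config = list(), timeout = httr::timeout(5),
                    ..., handle = NULL) {
  # safely() returns list(result = , error = ); keep only the result,
  # which is NULL whenever the request errored
  res <- safe_get(url = url, config = config, timeout, ..., handle = handle)
  res$result
}

Because failures come back as NULL, they are easy to check for or filter out when looping over a list of links (the URL below is just an example):

resp <- safeGET("https://httpbin.org/status/200")
if (!is.null(resp)) httr::status_code(resp)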