Scraping the web is fraught with peril. URLs die, networks get disrupted,
and the best-laid plans for building a corpus from links can quickly go awry.
Use this function to mitigate some of the pain of retrieving web resources.
safePOST(url = NULL, config = list(), timeout = httr::timeout(5),
..., body = NULL, encode = c("multipart", "form", "json", "raw"),
handle = NULL)
Arguments
url |
the url of the page to retrieve |
config |
Additional configuration settings such as http
authentication (authenticate), additional headers
(add_headers), cookies (set_cookies) etc.
See config for full details and list of helpers. |
timeout |
a call to httr::timeout(). Default timeout is 5 seconds. |
... |
Further named parameters, such as query, path, etc.,
passed on to modify_url. Unnamed parameters will be combined
with config. |
body |
One of the following:
FALSE: No body. This is typically not used with POST,
PUT, or PATCH, but can be useful if you need to send a
bodyless request (like GET) with VERB().
NULL: An empty body.
"": A length 0 body.
upload_file("path/"): The contents of a file. The mime
type will be guessed from the extension, or can be supplied explicitly
as the second argument to upload_file().
A character or raw vector: sent as is in body. Use
content_type to tell the server what sort of data
you are sending.
A named list: See details for encode.
|
encode |
If the body is a named list, how should it be encoded? Can be
one of form (application/x-www-form-urlencoded), multipart
(multipart/form-data), or json (application/json).
For "multipart", list elements can be strings or objects created by
upload_file. For "form", elements are coerced to strings
and escaped; use I() to prevent double-escaping. For "json",
parameters are automatically "unboxed" (i.e. length 1 vectors are
converted to scalars). To preserve a length 1 vector as a vector,
wrap it in I(). For "raw", the body must be either a character or raw
vector, and you'll need to set the content_type() yourself. |
handle |
The handle to use with this request. If not
supplied, will be retrieved and reused from the handle_pool
based on the scheme, hostname and port of the url. By default httr
automatically reuses the same http connection (aka handle) for multiple
requests to the same scheme/host/port combo. This substantially reduces
connection time, and ensures that cookies are maintained over multiple
requests to the same host. See handle_pool for more
details. |
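The "unboxing" rule described under encode can be seen without sending a request. The sketch below calls jsonlite::toJSON() directly, on the assumption that httr serialises "json" bodies through jsonlite with auto_unbox = TRUE; the field names id and tags are made up for illustration.

```r
library(jsonlite)

# A named list as it might be passed to `body` with encode = "json".
# `id` is a plain length-1 vector; `tags` is wrapped in I() to keep it a vector.
body <- list(id = 1, tags = I("a"))

# Assumption: this mirrors how httr encodes a JSON body (auto_unbox = TRUE).
json <- jsonlite::toJSON(body, auto_unbox = TRUE)

print(json)
# {"id":1,"tags":["a"]}
```

Without I(), tags would be sent as the scalar "a" rather than the one-element array ["a"], which matters for APIs that expect an array.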
Details
This is a thin wrapper for httr::POST() using purrr::safely() that will
either return a httr response object or NULL if there was an error.
If you need the reason for the error (e.g. Could not resolve host...)
you should write your own wrapper.
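A minimal sketch of what such wrappers might look like follows. It is not the package's actual source: safe_post_sketch() and post_with_reason() are hypothetical names, and the host nonexistent.invalid is a placeholder chosen because it should never resolve.

```r
library(httr)
library(purrr)

# Hypothetical approximation of the behaviour described above:
# return the response on success, NULL on any error.
safe_post_sketch <- function(url, ..., timeout = httr::timeout(5)) {
  purrr::safely(httr::POST)(url = url, timeout, ...)$result
}

# Hypothetical variant that keeps the error so the reason can be inspected.
# Returns a list with `result` (response or NULL) and `error` (condition or NULL).
post_with_reason <- function(url, ..., timeout = httr::timeout(5)) {
  res <- purrr::safely(httr::POST)(url = url, timeout, ...)
  if (!is.null(res$error)) {
    message("POST failed for ", url, ": ", conditionMessage(res$error))
  }
  res
}

# Placeholder URL; the request is expected to fail.
out <- post_with_reason("http://nonexistent.invalid/")
is.null(out$result)
```

Returning both parts of the purrr::safely() result lets the caller log or branch on conditionMessage(out$error) instead of only seeing NULL.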