OCR text and handwritten forms via http://captricity.com/.
To get the current development version from GitHub:
install.packages("devtools")
devtools::install_github("soodoku/captr")
library(captr)
Start by getting an application token and setting it using:
# Not a real token
set_token("6dbee39a047c4de2b576b966")
Then, create a batch using:
batch <- create_batch("wisc_ads")
batch$id
Next, upload image(s) to the batch:
path <- system.file("extdata/wisc_ads", package = "captr")
files <- dir(path, full.names = TRUE)
upimage <- lapply(files, upload_image, batch_id = batch$id)
names(upimage[[5]])
Once you have created a batch, you need to go online and create a template that tells Captricity what data to pull from where. (Captricity requires a template for each job, and it appears that templates can only be built online.)
For instance, the template for this project looked like this:
Template
Once you have a template, go to the inbox and click on process batch, which brings up the potential templates. Pick the template you want and click OK.
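Going back to the upload step: if uploading one file signals an error, the whole `lapply` call stops. A minimal sketch of a more forgiving wrapper follows. `upload_all` and its `uploader` argument are hypothetical names, not part of captr, and `fake_upload` only stands in for `upload_image` so that the example runs without a token.

```r
# Hypothetical helper (not part of captr): apply `uploader` to each file,
# recording failures as NULL instead of stopping at the first error.
upload_all <- function(files, uploader, ...) {
  results <- lapply(files, function(f) {
    tryCatch(
      uploader(f, ...),
      error = function(e) {
        warning("upload failed for ", f, ": ", conditionMessage(e),
                call. = FALSE)
        NULL
      }
    )
  })
  names(results) <- basename(files)
  results
}

# Stub standing in for upload_image(); it rejects anything that does not
# look like a PNG or JPEG file.
fake_upload <- function(path, batch_id) {
  if (!grepl("\\.(png|jpe?g)$", path, ignore.case = TRUE)) stop("not an image")
  list(file = path, batch_id = batch_id)
}

res <- suppressWarnings(
  upload_all(c("a.png", "notes.txt", "b.jpg"), fake_upload, batch_id = 1)
)

# Names of the files whose upload failed.
failed <- names(res)[vapply(res, is.null, logical(1))]
```

With captr loaded, the equivalent call would presumably be `upload_all(files, upload_image, batch_id = batch$id)`, mirroring the `lapply` call used earlier.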
Next, check whether the batch is ready to be processed:
tester <- test_readiness(batch_id=batch$id)
tester$errors
You may also want to find out how much processing the batch will cost:
price <- batch_price(batch_id=batch$id)
price$total_user_cost_in_cents
Next, submit the batch for processing. At this point, the batch changes to a job.
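Before submitting, it may be worth checking the readiness and price results together. The sketch below assumes that `errors` is empty when the batch is ready and that `total_user_cost_in_cents` is a single number. `ok_to_submit`, `format_cost`, and `max_cents` are hypothetical names, and the two stub lists are stand-ins for real API responses.

```r
# Hypothetical helper (not part of captr): TRUE only if the readiness check
# reported no errors and the quoted price is within the given budget.
ok_to_submit <- function(tester, price, max_cents) {
  no_errors <- length(tester$errors) == 0
  cost <- as.numeric(price$total_user_cost_in_cents)
  within_budget <- length(cost) == 1 && !is.na(cost) && cost <= max_cents
  no_errors && within_budget
}

# Convert an amount in cents to a string in whole currency units.
format_cost <- function(cents) sprintf("%.2f", as.numeric(cents) / 100)

# Stand-ins for the results of test_readiness() and batch_price().
tester_stub <- list(errors = list())
price_stub <- list(total_user_cost_in_cents = 1250)

cost_label <- format_cost(price_stub$total_user_cost_in_cents)
within <- ok_to_submit(tester_stub, price_stub, max_cents = 2000)
over <- ok_to_submit(tester_stub, price_stub, max_cents = 1000)
```

With real results, the call would be `ok_to_submit(tester, price, max_cents = ...)`, with the submit call run only when it returns `TRUE`.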
submit <- submit_batch(batch_id=batch$id)
submit$related_job_id
To track progress of a job, use:
progress <- track_progress(submit$related_job_id)
progress$percent_completed
List all forms (instance sets) associated with a job:
list_instances <- list_instance_sets(job_id=submit$related_job_id)
list_instances$id
If you want to download data from a particular form, use list_instance_sets to get the form (instance set) id and run:
res1 <- get_instance_set(instance_set_id=list_instances$id[1])
res1$best_estimate
Get a CSV of all your results from a job:
get_all(job_id=submit$related_job_id)
Unfortunately, Captricity doesn’t do a particularly good job of extracting the text. For instance, Captricity considers getting text from these fields ‘impossible’:
Impossible
You can check out the final csv here.
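The progress-tracking step lends itself to a polling loop, so that results are fetched only once the job has finished. The following is a sketch, not captr functionality: `wait_for_job` is a hypothetical helper, and it assumes that `percent_completed` is a number between 0 and 100. The stub returned by `make_fake_progress` replaces `track_progress` so that the example runs offline.

```r
# Hypothetical helper (not part of captr): call `fetch(job_id)` until the
# reported percent_completed reaches 100, or give up after `max_tries`.
wait_for_job <- function(job_id, fetch, interval = 30, max_tries = 20) {
  for (i in seq_len(max_tries)) {
    pct <- as.numeric(fetch(job_id)$percent_completed)
    message(sprintf("try %d: %.0f%% complete", i, pct))
    if (pct >= 100) return(invisible(pct))
    # Wait between polls, but not after the final attempt.
    if (i < max_tries) Sys.sleep(interval)
  }
  stop("job ", job_id, " did not finish after ", max_tries, " tries")
}

# Stub standing in for track_progress(): each call reports 25 percentage
# points more progress than the previous one.
make_fake_progress <- function() {
  calls <- 0
  function(job_id) {
    calls <<- calls + 1
    list(percent_completed = min(100, 25 * calls))
  }
}

final <- suppressMessages(
  wait_for_job("demo", make_fake_progress(), interval = 0)
)
```

With captr loaded, this would presumably be called as `wait_for_job(submit$related_job_id, track_progress)`, followed by `get_all(job_id = submit$related_job_id)` once it returns.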