I worked today with the scans given to me at Bromley Library.
My workflow involves a program called Paperpile. I can upload any PDF and add full bibliographic fields. It works as a Chrome extension for quick adding, and I can add websites from the toolbar too. The bibliography of selected items will export to Google Docs. Everything I upload is also stored in Google Drive, so there’s lots of storage. On PDFs where Paperpile can recognize the printed lines of text, I can highlight. On all documents, I can add comment bubbles in various colors.
The main page is a list of everything I’ve added, and I can color-code and tag each item.
To stitch together the PDFs (the library had scanned each page separately), I used a website called PDFmergy. It worked great.
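For anyone who would rather do the stitching on their own computer than through a website, here is a minimal sketch in Python. It is not how PDFmergy works, just one possible approach. It assumes the third-party `pypdf` library is installed and that each scanned page is its own file with a page number in the name (for example `page_1.pdf`, `page_2.pdf`); the function names and file names here are made up for illustration.

```python
import re
from pathlib import Path


def natural_key(name):
    """Split a file name into text and number chunks so 'page_10' sorts after 'page_2'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def ordered_pages(folder):
    """Return the PDF files in `folder`, sorted into page order by file name."""
    return sorted(Path(folder).glob("*.pdf"), key=lambda p: natural_key(p.name))


def merge_pages(folder, output_path):
    """Append each single-page scan, in order, to one output PDF."""
    # Assumption: pypdf is installed (pip install pypdf).
    from pypdf import PdfWriter

    writer = PdfWriter()
    for page_file in ordered_pages(folder):
        writer.append(str(page_file))
    writer.write(str(output_path))
    writer.close()
```

Calling something like `merge_pages("scans/one_article", "one_article.pdf")` (hypothetical paths) would write the combined file. The output is best saved outside the folder of page scans, so that a later run doesn't pick it up as if it were another page.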
This all sounds great, but some of the scanned PDFs are light and hard to read. Paperpile doesn’t do OCR, and when the lines of text on a scan aren’t straight, I can’t highlight them. So to pull quotes for my comment bubbles, I need an OCR’d version. It turns out that if I open the file in Google Docs, it does its damnedest, and it does pretty well.
So now the question. All of these pieces by Wells were published long ago in various journals, but none are available online. At Bromley, I had to sign a form saying I wouldn’t publish anything unpublished. But can I make the OCR’d text of these documents available to everyone? Technologically, it’s easy to do.