
Interesting NRHP pdf anomaly



In the course of reviewing a Contributing Buildings Waymark today, I had occasion to download a Nomination Form PDF to check the WM for accuracy. (Click on the link at your peril - it's 479 pages long [it later became clear that page 298 was my target].) After waiting the time one would usually allot to downloading a 479-page PDF, I found myself still waiting for it to finish. Seemingly, it never did. More out of frustration than anything, I clicked and dragged across the page that was visible.

Whoooaa!! The outlines of Invisible Text were highlighted, which I copied, such as this:

900a OMB No. 1024-0018
(8-86)
United States Department of the Interior
National Park Service
NATIONAL REGISTER OF HISTORIC PLACES
CONTINUATION SHEET
Section ___ Page __
SUPPLEMENTARY LISTING RECORD
NRIS Reference Number: 06000372 Date Listed: 5/12/2006
Port of San Francisco
Embarcadero Historic District 

 

Meaning that the document had, indeed, downloaded, BUT the text within had either become the same colour as the (white) background OR had otherwise managed to make itself invisible. I believe that I encountered a similar situation many years ago and even managed to circumvent it by some means other than copying and pasting, but I can no longer recall further details of that episode. Mebbe it was just a short passage and I copied it. Remember, this is a 479-page document and I had no idea where in the document I needed to go, so a copy and paste wasn't in the cards this time.

 

Has anyone else here experienced a similar episode? Is there a (sensible) workaround?
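One possible workaround, sketched here in Python under stated assumptions: since the invisible text can be selected, the file appears to carry a text layer, so the text of each page could be extracted and searched for a phrase to find the target page without scrolling through all 479. The function name `find_pages` is hypothetical. The list of per-page strings is assumed to come from a PDF library; for example, pypdf's `PdfReader(path).pages` with `page.extract_text()` could supply it, but that step is not shown or tested here.

```python
def find_pages(page_texts, needle):
    """Return the 1-based numbers of pages whose text contains `needle`.

    Matching ignores case and collapses runs of whitespace, because
    extracted PDF text often has odd line breaks and spacing.
    `page_texts` is assumed to hold one string per page, in page order.
    """
    target = " ".join(needle.split()).lower()
    hits = []
    for number, text in enumerate(page_texts, start=1):
        normalized = " ".join(text.split()).lower()
        if target in normalized:
            hits.append(number)
    return hits


# Tiny stand-in for extracted page text (not the real document).
sample_pages = [
    "NATIONAL REGISTER OF HISTORIC PLACES\nCONTINUATION SHEET",
    "Port of San Francisco\nEmbarcadero  Historic District",
]
print(find_pages(sample_pages, "embarcadero historic district"))
```

Whether this works on a given file depends on the quality of its text layer; a scan with no text layer at all would need OCR first.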

In the past I have been able to work around non-copyable, yet visible, text by saving a page, editing out the offending script, then reloading it as a local file. This time I haven't checked the source to see if that could be done, suspecting that this was a whole 'nother beast. ... zzz ... ... Mebbe I should do that one of these days... .... zzz ... when I can find the time ... zzz ... ... zzz ...

Keith

Edited by ScroogieII

Seems they are using PDF 1.4 and obviously scanned all the document pages without using OCR software for conversion. So there are a LOT of tiny graphics to display, which take a lot of time to calculate in the browsers own PDF-viewer. So I downloaded the 15 MB file and viewed it in the Acrobat software - and the information was visible nearly at once.
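For anyone who wants to check the version of a downloaded file themselves, here is a minimal Python sketch. It relies on the fact that a PDF file begins with a header of the form `%PDF-x.y`. The function names are hypothetical, and note the caveat that a file's document catalog may declare a later version than the header does, which this sketch does not look at.

```python
import re


def pdf_header_version(data: bytes):
    """Return the version from a '%PDF-x.y' header, or None if absent.

    Only the first 1024 bytes are examined, since some files carry
    a little junk before the header.
    """
    match = re.search(rb"%PDF-(\d\.\d)", data[:1024])
    return match.group(1).decode("ascii") if match else None


def pdf_version_of_file(path):
    """Read just the start of the file at `path` and report its header version."""
    with open(path, "rb") as handle:
        return pdf_header_version(handle.read(1024))


# Example with an in-memory header rather than a real file.
print(pdf_header_version(b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"))
```

Calling `pdf_version_of_file` on the saved nomination form would show what its header claims.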

27 minutes ago, FamilieFrohne said:

Seems they are using PDF 1.4 and obviously scanned all the document pages without using OCR software for conversion. So there are a LOT of tiny graphics to display, which take a lot of time to calculate in the browsers own PDF-viewer. So I downloaded the 15 MB file and viewed it in the Acrobat software - and the information was visible nearly at once.

 

And, of course, that's another way.

So, Erik, when you say that they "take a lot of time to calculate in the browser'[sic]s own PDF-viewer", does that mean that the browser builders decided to not include that capability OR that I could just ... zzz... go to bed ...zzz... and get up in the morning ...zzz...zzz... and find it has been calculated and presented on my screen ...zzz ...?

 

EDIT: Hang on - it was visible nearly at once in the Acrobat reader BUT it takes eons to accomplish the same task in a web browser? Why the discrepancy?

EDIT: Did I just ask the same question twice? I guess the cat's right. She's been telling me for hours that it's time to go to bed. Peculiar. You're getting up as I'm going to bed. ... zzz ... zzz ... zzz ... zzz ,,, zzz ... zzz ...

Edited by ScroogieII
3 hours ago, ScroogieII said:

EDIT: Hang on - it was visible nearly at once in the Acrobat reader BUT it takes eons to accomplish the same task in a web browser? Why the discrepancy?

Acrobat is an application optimized to render PDF files quickly for the user. It starts rendering the first page as soon as it has read that page's contents and displays it as soon as it is ready. In the meantime, the rendering descriptions of the other pages are just stored in memory and are only rendered when the user scrolls down or jumps to a specific page. You could say it is "view on demand".
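The "view on demand" idea can be illustrated with a small Python sketch. This is not how Acrobat is actually implemented; it only shows the general pattern of keeping page descriptions in memory and rendering a page the first time it is requested. The class name `LazyPageViewer` and the `render` callback are hypothetical.

```python
class LazyPageViewer:
    """Keep page descriptions in memory; render a page only when it is shown."""

    def __init__(self, page_descriptions, render):
        self._descriptions = list(page_descriptions)
        self._render = render      # function: description -> rendered page
        self._cache = {}           # page number -> already rendered result
        self.render_count = 0      # how many pages have actually been rendered

    def show(self, page_number):
        """Return the rendered page (1-based), rendering it on first use only."""
        if page_number not in self._cache:
            description = self._descriptions[page_number - 1]
            self._cache[page_number] = self._render(description)
            self.render_count += 1
        return self._cache[page_number]


# A 479-page stand-in document; str.upper plays the role of an expensive renderer.
viewer = LazyPageViewer((f"page {n}" for n in range(1, 480)), str.upper)
viewer.show(298)
print(viewer.render_count)
```

Jumping straight to page 298 costs one render, not 298 of them, which is the behaviour described above.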

 

A browser has to use a different approach: it has to read the complete file contents first, and then it has to translate the file content into an internal rendering language before it starts displaying anything. For small files this process will be fast enough - but with larger files it can become a problem.

 

Also, most browsers adhere to PDF specification 1.7, as this was accepted by the ISO committee in 2008. The document above is PDF version 1.4, so there might also be some issues with the different versions. As far as I know, the Acrobat application supports all known PDF versions.

 

Hope this explanation helps a little bit

Yours

 

Erik.


I've not had to download the Nomination Form in question as the Waymarker had provided a copy from the state archives.

That one, though, also had serious quirkiness: although the text was visible, it wasn't directly copyable. Only characters, without spaces or formatting of any kind, could be copied. I eventually resorted to Copyfish to extract a quote to show to the Waymarker. (I had to decline the WM as the structure submitted was non-contributing.) If that hadn't worked, my next option was to download it and pour it into a reader.

 

BTW - results were the same in both Chrome and Firefox.

 

Past experience with readers vs browsers does agree completely with your explanation, however, so THANKS, Erik, for that!

Keith

Edited by ScroogieII
