USPTO Patent Images

Len Lydik
09/28/2004 04:16 pm
Hi. I'm attempting to use Offline Explorer Pro to download a batch of patent TIFF images from the USPTO web site. I'm finding it difficult to get a URL that I can work with.

A sample URL to an image is: http://patimg1.uspto.gov/.DImg?Docid=US000000035&PageNum=1&IDKey=3C9FF5283B6E&ImgFormat=tif

A sample URL to a patent record is: http://patft.uspto.gov/netacgi/nph-Parser?patentnumber=5123456


Basically, I want to download several thousand patent images from the 1800s. Speed is not an issue; this can take several days if need be.

Any help you can give me would be appreciated. Thanks!
Len Lydik
09/28/2004 04:40 pm
Me again. Firstly, these patents are public domain documents, so this isn't stealing.

Basically, I would like Offline Explorer Pro to insert the patent numbers into this URL:

http://patft.uspto.gov/netacgi/nph-Parser?patentnumber={PATENT NUMBER HERE}

Where the patent numbers fall within the following series:

X1 - X11,280

and

1 - 3,930,270

and then save the TIFF images from each of these patent pages.

How do I do that?
09/29/2004 07:02 am
You can try these settings if you only want the tif files on your Computer:

-----------------------
URLs:
http://patft.uspto.gov/netacgi/nph-Parser?patentnumber=5123456
DeleteAfterParsing=.piw*,patft.uspto.gov
(You can delete the "DeleteAfterParsing=..." line if you want to keep the Text files)

File Filters:
Text:
Load using URL filters settings

Images:
Load using URL filters settings
(enable only the tif and tiff extension)

Other:
Load using URL filters settings


URL Filters:

Server:
Custom servers configuration
View included server keywords:
patft.uspto.gov
patimg2.uspto.gov

Filename:
Custom filenames configuration
View included filenames keywords:
http://patimg2.uspto.gov/.piw?docid
=pn/
dimg?docid


Advanced:

URL Substitutes...:
URL:
Format=tif

Replace:
Format=tif

With:
Format=tif.tif

(Click on "Add" and *uncheck* the rule; OK
Note: "Format=tif" is case sensitive!)
---------------------


> Basically, I would like Offline Explorer Pro to insert the patent numbers
> into this URL:
> http://patft.uspto.gov/netacgi/nph-Parser?patentnumber={PATENT NUMBER HERE}
> Where the patent numbers fall within the following series:
>
> X1 - X11,280

These are not Patent numbers? Or am I wrong?
I don't know what you want.
You can use X{:1..11} in the Address field to represent:
X1
X2
.
.
X11

or use a comma separated list if you have different values, for example:
{:X1,X2,X3,X8,abc}

(And: You can place more than one URL in the Addresses field!)

> 1 - 3,930,270

Are you searching for Patent No. 1930270 - 3930270?
Then you have to place the following address in the URLs field:
http://patft.uspto.gov/netacgi/nph-Parser?patentnumber={:1930270..3930270}

You can find more information on this topic in the Help file:
Advanced features... Using URL Macros
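
To preview what a macro like the ones above will produce before starting a long download, here is a minimal Python sketch. It is not OEP's own code: the function names are made up for illustration, and it assumes that {:a..b} expands to every integer from a to b inclusive and that {:x,y,z} expands to each listed value. Check the Help file for the exact rules.

```python
import re
from itertools import product

# Matches one macro such as {:1..11} or {:X1,X2,abc} (syntax as quoted in this thread).
MACRO = re.compile(r"\{:([^{}]*)\}")

def expand_macro_body(body):
    """Return the values for one macro body: an 'a..b' range or a comma list."""
    m = re.fullmatch(r"(\d+)\.\.(\d+)", body)
    if m:
        start, end = int(m.group(1)), int(m.group(2))
        # Assumption: the range is inclusive at both ends.
        return [str(n) for n in range(start, end + 1)]
    return body.split(",")

def expand_url(template):
    """Yield every URL obtained by expanding all macros in the template."""
    parts = MACRO.split(template)
    literals = parts[0::2]                          # text between macros
    choices = [expand_macro_body(b) for b in parts[1::2]]
    for combo in product(*choices):
        out = [literals[0]]
        for value, literal in zip(combo, literals[1:]):
            out.append(value)
            out.append(literal)
        yield "".join(out)

if __name__ == "__main__":
    base = "http://patft.uspto.gov/netacgi/nph-Parser?patentnumber="
    for url in expand_url(base + "X{:1..3}"):
        print(url)
```

Note that a range whose start is larger than its end produces no URLs at all in this sketch, so it is worth checking both numbers before starting.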

HTH
Len Lydik
10/04/2004 01:56 pm
Thank you, HTH! I've reconfigured and am running now. So far, no images have been saved.

QUESTION: Did you try these settings and successfully download patent images?


Yes, X1, X2, etc. are patent numbers. They refer to the "X Patents" which constitute the first patents issued before the 1836 fire that destroyed them all.
10/04/2004 04:33 pm
> QUESTION: Did you try these settings and successfully download patent images?

Yes, OEP downloads the images belonging to Patent 5123456 (18 tif files).
To see what's wrong with your settings:
Select your project; click on the "Copy" button; paste the project settings into a new message.

> Yes, X1, X2, etc. are patent numbers.

OK, thanks for the information.
Len Lydik
10/04/2004 05:24 pm
I don't know what was wrong, but I created a new project with the specs you provided and it works. Thank you!
10/04/2004 06:21 pm
> I don't know what was wrong, but I created a new project with the specs you provided and it works.

Great!

> Thank you!

No problem. Best of luck with the rest of the download!
Len Lydik
10/14/2004 11:53 am
For some reason, I can't get OEP to get any images off of http://patimg1.uspto.gov/.

But I have no problem getting them off http://patimg2.uspto.gov/


Help...
10/14/2004 08:36 pm
> For some reason, I can't get OEP to get any images off of http://patimg1.uspto.gov/.
>
> But I have no problem getting them off http://patimg2.uspto.gov/

This is expected if you haven't changed the project settings mentioned above. I didn't know that the server patimg1.uspto.gov exists and that you have to download some files from it.

So you have to change the project settings:

------------------
URL Filters:

Server:
Custom servers configuration
View included server keywords:
patft.uspto.gov
patimg*.uspto.gov

Filename:
Custom filenames configuration
View included filenames keywords:
http://patimg*.uspto.gov/.piw?docid
=pn/
dimg?docid
------------------
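
To see why the original server list missed patimg1 while the wildcard version catches it, here is a small Python sketch. It only illustrates shell-style wildcard matching on host names; the function name is hypothetical, and how OEP itself evaluates server keywords is an assumption here.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

OLD_KEYWORDS = ["patft.uspto.gov", "patimg2.uspto.gov"]
NEW_KEYWORDS = ["patft.uspto.gov", "patimg*.uspto.gov"]

def server_allowed(url, keywords):
    """Return True if the URL's host matches any keyword ('*' matches any text)."""
    host = urlsplit(url).hostname or ""
    return any(fnmatchcase(host, keyword) for keyword in keywords)

if __name__ == "__main__":
    url = "http://patimg1.uspto.gov/.DImg?Docid=US000000035&PageNum=1"
    print(server_allowed(url, OLD_KEYWORDS))
    print(server_allowed(url, NEW_KEYWORDS))
```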

I would recommend making some other adjustments if you don't want to download all the tif files again.
In:

http://patimg2.uspto.gov/.DImg?Docid=US005123456&PageNum=14&IDKey=C21A184E999A&ImgFormat=tif.tif

"&IDKey=C21A184E999A" represents a sort of a Session-ID. If you continue your download, you will get the same tif files with other IDKeys.

To avoid this: Please try the following (without even touching your existing project or downloads):

- In OEP create a new folder (Ctrl+Alt+N)
Folder Name: Patents_2
Folder download directory:
Click on "Enable"
Change the path (create a new *empty* directory, e.g. C:\Download\Patents_2)

- Copy the directories with the files which you have previously downloaded into the new directory, for example:

Copy the folders:

C:\Download\patft.uspto.gov
C:\Download\patimg2.uspto.gov

to:

C:\Download\Patents_2\patft.uspto.gov
C:\Download\Patents_2\patimg2.uspto.gov

Now you have to rename the files in these directories with a renaming tool in order to remove the &IDKey* part of the file names:

It should be a simple task; for example, you could do this with Total Commander (File Manager):
- Go to: C:\Download\Patents_2
- Hit Ctrl+B
- Hit *
- Hit Ctrl+M (Multi-Rename Tool)

Search for:
&IDKey=*&

Replace with:
&

Click on "Start"
Ignore the "duplicates" warning (if you already have duplicates in your folder).
Click on "Close"

Now delete the duplicates:
Press Alt+F7 (Search)
Search for: IDKey
Start search
Click on "Feed to listbox"
Hit *
Hit F8 (Delete)
Now the duplicates should be deleted
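
If you would rather script the rename and cleanup than do it by hand, the steps above can be sketched in Python as follows. This is only a sketch under assumptions: it assumes the saved file names still contain the literal text "&IDKey=...&", and the function names are made up for illustration. Run it on the copied Patents_2 directory, never on your only copy.

```python
import re
from pathlib import Path

# Same idea as replacing "&IDKey=*&" with "&": drop the key, keep the next "&".
IDKEY = re.compile(r"&IDKey=[^&]*(?=&)")

def strip_idkey(name):
    """Return the file name without its &IDKey=... part."""
    return IDKEY.sub("", name)

def dedupe_tree(root):
    """Rename files under root to drop the IDKey part.

    If the cleaned name already exists, the file is treated as a duplicate
    and deleted. Returns (renamed_count, deleted_count).
    """
    renamed = deleted = 0
    # sorted() takes a snapshot first, so renaming does not disturb the walk.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        new_name = strip_idkey(path.name)
        if new_name == path.name:
            continue
        target = path.with_name(new_name)
        if target.exists():
            path.unlink()
            deleted += 1
        else:
            path.rename(target)
            renamed += 1
    return renamed, deleted

if __name__ == "__main__":
    print(dedupe_tree(r"C:\Download\Patents_2"))
```

The script treats two files as duplicates purely because their names match once the IDKey is removed; it does not compare file contents.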


Create a new project in the "Patents_2" folder with the following settings:

-----

Do not download existing files

URLs:
http://patft.uspto.gov/netacgi/nph-Parser?patentnumber=5123456
DeleteAfterParsing=.piw*,patft.uspto.gov
(You can delete the "DeleteAfterParsing=..." line if you want to keep the Text files)

File Filters:
Text:
Load using URL filters settings

Images:
Load using URL filters settings
(enable only the tif and tiff extension)

Other:
Load using URL filters settings


URL Filters:

Server:
Custom servers configuration
View included server keywords:
patft.uspto.gov
patimg*.uspto.gov

Filename:
Custom filenames configuration
View included filenames keywords:
http://patimg*.uspto.gov/.piw?docid
=pn/
dimg?docid


Advanced:

Substitute

- 1. Substitute -

URL:
&IDKey=*&ImgFormat=tif

Replace:
&IDKey=*&ImgFormat=tif

With:
&ImgFormat=tif.tif

(Click on "Add" and *uncheck* the rule; OK
Note: "Format=tif" is case sensitive!)

- 2. Substitute -

URL:
&IDKey=*&HomeUrl=*

Replace:
&IDKey=*&HomeUrl=*

With:
(leave empty)

(Click on "Add" and *uncheck* the rule)
-----
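
To check what the two Substitute rules do to a URL, here is a rough Python sketch. It assumes that "*" in a rule matches any run of characters and that the rules are applied in order; OEP's real matching may differ. The function names are hypothetical, and the .piw address in the example is made up for illustration.

```python
import re

# (pattern, replacement) pairs copied from the two Substitute rules above.
RULES = [
    ("&IDKey=*&ImgFormat=tif", "&ImgFormat=tif.tif"),
    ("&IDKey=*&HomeUrl=*", ""),
]

def wildcard_to_regex(pattern):
    """Turn a '*' wildcard pattern into a case-sensitive regular expression."""
    return re.compile(".*".join(re.escape(part) for part in pattern.split("*")))

def apply_substitutes(url, rules=RULES):
    """Apply each rule once, in order, and return the rewritten URL."""
    for pattern, replacement in rules:
        regex = wildcard_to_regex(pattern)
        # A function replacement keeps the replacement text literal.
        url = regex.sub(lambda match: replacement, url, count=1)
    return url

if __name__ == "__main__":
    image = ("http://patimg2.uspto.gov/.DImg?Docid=US005123456"
             "&PageNum=14&IDKey=C21A184E999A&ImgFormat=tif")
    # Hypothetical .piw address, for illustration only.
    page = ("http://patimg2.uspto.gov/.piw?docid=US005123456"
            "&PageNum=1&IDKey=C21A184E999A&HomeUrl=http://patft.uspto.gov/")
    print(apply_substitutes(image))
    print(apply_substitutes(page))
```

With the IDKey removed, the same image always maps to the same address, which is what lets "Do not download existing files" skip it.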

Please check for yourself that everything works.

*Very important:*
If you want to stop the download *DO NOT* click on the Stop button. Instead use:

Download...
Suspend to file

To continue the download, click on:
Download...
Resume from file

I would recommend removing the Stop button from the toolbar and placing the "Suspend to file" and "Resume from file" buttons there (at least temporarily) to avoid inadvertently clicking on the Stop button.

HTH
Len Lydik
10/15/2004 12:00 pm
Thanks for your help. Looks like it's working.