A sample URL to an image is: http://patimg1.uspto.gov/.DImg?Docid=US000000035&PageNum=1&IDKey=3C9FF5283B6E&ImgFormat=tif
A sample URL to a patent record is: http://patft.uspto.gov/netacgi/nph-Parser?patentnumber=5123456
Basically, I want to download several thousand patent images from the 1800s. Speed is not an issue; this can take several days if need be.
Any help you can give me would be appreciated. Thanks!
Basically, I would like Offline Explorer Pro to insert the patent numbers into this URL:
http://patft.uspto.gov/netacgi/nph-Parser?patentnumber={PATENT NUMBER HERE}
Where the patent numbers fall within the following series:
X1 - X11,280
and
1 - 3,930,270
and then save the TIFF images from each of these patent pages.
How do I do that?
-----------------------
URLs:
http://patft.uspto.gov/netacgi/nph-Parser?patentnumber=5123456
DeleteAfterParsing=.piw*,patft.uspto.gov
(You can delete the "DeleteAfterParsing=..." line if you want to keep the Text files)
File Filters:
Text:
Load using URL filters settings
Images:
Load using URL filters settings
(enable only the tif and tiff extension)
Other:
Load using URL filters settings
URL Filters:
Server:
Custom servers configuration
View included server keywords:
patft.uspto.gov
patimg2.uspto.gov
Filename:
Custom filenames configuration
View included filenames keywords:
http://patimg2.uspto.gov/.piw?docid
=pn/
dimg?docid
Advanced:
URL Substitutes...:
URL:
Format=tif
Replace:
Format=tif
With:
Format=tif.tif
(Click on "Add" and *uncheck* the rule; OK
Note: "Format=tif" is case sensitive!)
---------------------
> Basically, I would like Offline Explorer Pro to insert the patent numbers
> into this URL:
> http://patft.uspto.gov/netacgi/nph-Parser?patentnumber={PATENT NUMBER HERE}
> Where the patent numbers fall within the following series:
>
> X1 - X11,280
Are these really patent numbers? Or am I wrong?
I don't know exactly what you want.
You can use X{:1..11} in the Address field to represent:
X1
X2
.
.
X11
or use a comma separated list if you have different values, for example:
{:X1,X2,X3,X8,abc}
(And: You can place more than one URL in the Addresses field!)
> 1 - 3,930,270
Are you searching for Patent No. 1930270 - 3930270?
Then you have to place the following Address in the URLs field:
http://patft.uspto.gov/netacgi/nph-Parser?patentnumber={:1930270..3930270}
You can find more information on this topic in the Help file:
Advanced features... Using URL Macros
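In case it helps to see what such a range macro stands for, here is a rough Python sketch that builds the same list of record URLs. It only illustrates the expansion and is not how OEP implements it; the function name patent_urls and the prefix parameter are made up for this example.

```python
from itertools import islice

# Record URL from the posts above, without the patent number.
BASE = "http://patft.uspto.gov/netacgi/nph-Parser?patentnumber="


def patent_urls(first, last, prefix=""):
    """Yield one record URL per patent number from first to last, inclusive.

    prefix is a hypothetical parameter for series such as the X patents.
    """
    for number in range(first, last + 1):
        yield f"{BASE}{prefix}{number}"


# The two series asked about: X1 - X11,280 and 1 - 3,930,270.
x_patents = patent_urls(1, 11280, prefix="X")
numbered = patent_urls(1, 3930270)

# Generators are lazy, so only the URLs actually requested are built.
for url in islice(x_patents, 3):
    print(url)
```

Because the URLs are generated lazily, even the full 1 - 3,930,270 series does not need to be held in memory at once.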
HTH
QUESTION: Did you try these settings and successfully download patent images?
Yes, X1, X2, etc. are patent numbers. They refer to the "X Patents" which constitute the first patents issued before the 1836 fire that destroyed them all.
Yes, OEP downloads the images belonging to Patent 5123456 (18 tif files).
To see what's wrong with your settings:
Mark your project; click on the "Copy" button; paste the project settings in a new message.
> Yes, X1, X2, etc. are patent numbers.
OK, thanks for the information.
Great!
> Thank you!
No problem. Best luck with the rest of the download!
But I have no problem getting them off http://patimg2.uspto.gov/
Help...
>
> But I have no problem getting them off http://patimg2.uspto.gov/
That is to be expected if you haven't changed the project settings mentioned above. I didn't know that the server patimg1.uspto.gov exists and that you have to download some files from it.
So you have to change the project settings:
------------------
URL Filters:
Server:
Custom servers configuration
View included server keywords:
patft.uspto.gov
patimg*.uspto.gov
Filename:
Custom filenames configuration
View included filenames keywords:
http://patimg*.uspto.gov/.piw?docid
=pn/
dimg?docid
------------------
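To illustrate what the wildcard in the server keyword is meant to achieve, here is a small Python sketch. It assumes the keyword behaves like a shell-style wildcard matched against the whole host name; OEP's real keyword matching may differ (for example, it may be substring based). The function name server_allowed is made up for this example.

```python
from fnmatch import fnmatchcase

# Server keywords from the settings above.
SERVER_KEYWORDS = ["patft.uspto.gov", "patimg*.uspto.gov"]


def server_allowed(host):
    """Return True if the host matches any keyword (assumed shell-style wildcard)."""
    return any(fnmatchcase(host, pattern) for pattern in SERVER_KEYWORDS)


for host in ("patimg1.uspto.gov", "patimg2.uspto.gov", "www.uspto.gov"):
    print(host, server_allowed(host))
```

With the old keyword patimg2.uspto.gov, the host patimg1.uspto.gov would not match, which is consistent with images on that server being skipped.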
I would recommend making some other adjustments if you don't want to download all the tif files again:
In:
http://patimg2.uspto.gov/.DImg?Docid=US005123456&PageNum=14&IDKey=C21A184E999A&ImgFormat=tif.tif
"&IDKey=C21A184E999A" represents a sort of a Session-ID. If you continue your download, you will get the same tif files with other IDKeys.
To avoid this: Please try the following (without even touching your existing project or downloads):
- In OEP create a new folder (Ctrl+Alt+N)
Folder Name: Patents_2
Folder download directory:
Click on "Enable"
Change the path (create a new *empty* directory, e.g. C:\Download\Patents_2 )
- Copy the directories with the files which you have previously downloaded into the new directory, for example:
Copy the folders:
C:\Download\patft.uspto.gov
C:\Download\patimg2.uspto.gov
to:
C:\Download\Patents_2\patft.uspto.gov
C:\Download\Patents_2\patimg2.uspto.gov
Now you have to rename the files in these directories with a renaming tool in order to delete the &IDKey* part of the file names:
It should be a simple task; for example, you could do this with Total Commander (File Manager):
- Go to: C:\Download\Patents_2
- Hit Ctrl+B
- Hit *
- Hit Ctrl+M (Multi-Rename Tool)
Search for:
&IDKey=*&
Replace with:
&
Click on "Start"
Ignore the "duplicates" warning (if you already have duplicates in your folder).
Click on "Close"
Now delete the duplicates:
Hit Alt+F7 (Search)
Search for: IDKey
Start search
Click on "Feed to listbox"
Hit *
Hit F8 (Delete)
Now the duplicates should be deleted
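If you don't have Total Commander, the same rename-and-delete steps could be sketched in Python roughly as follows. This is an untested-on-your-data sketch, so try it on a copy first. It assumes the saved file names contain the "&IDKey=...&" part literally, as in the URL above; the names strip_idkey and dedupe_tree are made up for this example.

```python
import re
from pathlib import Path

# Matches the session-like part, e.g. "&IDKey=C21A184E999A&" (case sensitive).
IDKEY = re.compile(r"&IDKey=[^&]*&")


def strip_idkey(name):
    """Return the name with the '&IDKey=...&' part collapsed to a single '&'."""
    return IDKEY.sub("&", name)


def dedupe_tree(root):
    """Rename files under root to drop the IDKey part; delete resulting duplicates.

    Returns a (renamed, removed) pair of counts.
    """
    renamed = removed = 0
    # sorted() takes a snapshot, so renaming while looping is safe.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        target = path.with_name(strip_idkey(path.name))
        if target == path:
            continue  # no IDKey in this file name
        if target.exists():
            path.unlink()  # same page already kept under the clean name
            removed += 1
        else:
            path.rename(target)
            renamed += 1
    return renamed, removed


# Example call (path taken from the steps above):
# print(dedupe_tree(r"C:\Download\Patents_2"))
```

Unlike the two-pass procedure above, this does the renaming and the duplicate removal in one pass: the first copy of each page gets the clean name, and later copies with other IDKeys are deleted.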
Create a new project in the "Patents_2" folder with the following settings:
-----
Do not download existing files
URLs:
http://patft.uspto.gov/netacgi/nph-Parser?patentnumber=5123456
DeleteAfterParsing=.piw*,patft.uspto.gov
(You can delete the "DeleteAfterParsing=..." line if you want to keep the Text files)
File Filters:
Text:
Load using URL filters settings
Images:
Load using URL filters settings
(enable only the tif and tiff extension)
Other:
Load using URL filters settings
URL Filters:
Server:
Custom servers configuration
View included server keywords:
patft.uspto.gov
patimg*.uspto.gov
Filename:
Custom filenames configuration
View included filenames keywords:
http://patimg*.uspto.gov/.piw?docid
=pn/
dimg?docid
Advanced:
Substitute
- 1. Substitute -
URL:
&IDKey=*&ImgFormat=tif
Replace:
&IDKey=*&ImgFormat=tif
With:
&ImgFormat=tif.tif
(Click on "Add" and *uncheck* the rule; OK
Note: "Format=tif" is case sensitive!)
- 2. Substitute -
URL:
&IDKey=*&HomeUrl=*
Replace:
&IDKey=*&HomeUrl=*
With:
(Click on "Add" and *uncheck* the rule)
-----
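To show the intended effect of the two substitute rules, here is a Python sketch that applies equivalent regular expressions to a URL. It assumes "*" in the rules means "any run of characters", which is my reading and not confirmed OEP behaviour; the function name substitute is made up, and the .piw URL in the comments is a hypothetical example, not a real address.

```python
import re

# Rule 1: drop the IDKey and give the image a .tif extension.
# Rule 2: drop the IDKey and everything from HomeUrl onwards.
# Python regexes are case sensitive by default, like the rules above.
RULES = [
    (re.compile(r"&IDKey=.*?&ImgFormat=tif"), "&ImgFormat=tif.tif"),
    (re.compile(r"&IDKey=.*?&HomeUrl=.*"), ""),
]


def substitute(url):
    """Apply both rules in order and return the rewritten URL."""
    for pattern, replacement in RULES:
        url = pattern.sub(replacement, url)
    return url


# Image URL from the first post in this thread.
image = ("http://patimg1.uspto.gov/.DImg?Docid=US000000035"
         "&PageNum=1&IDKey=3C9FF5283B6E&ImgFormat=tif")
print(substitute(image))

# Hypothetical .piw URL, only to exercise rule 2.
page = "http://patimg2.uspto.gov/.piw?Docid=05123456&IDKey=ABC&HomeUrl=x"
print(substitute(page))
```

The point of the rules is that two URLs for the same page that differ only in their IDKey end up as the same address, so "Do not download existing files" can recognise pages you already have.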
Please check for yourself that everything works as expected.
*Very important:*
If you want to stop the download *DO NOT* click on the Stop button. Instead use:
Download...
Suspend to file
To continue the download, click on:
Download...
Resume from file
I would recommend removing the Stop button from the toolbar and placing the "Suspend to file" and "Resume from file" buttons in the toolbar (at least temporarily) in order to avoid clicking on the Stop button by accident.
HTH