PDF files and parsing issues
|Akram||06/01/2013 02:23 pm|
This is a continuation of our previous conversation.
I will summarize the problem.
With the new version the website downloads like a charm, with all of the PDF files.
I tried to reduce the number of files parsed from the PDFs, so I played with the URL substitutes to get all the PDF files from the HTML pages.
The links that I get from the PDFs are almost all bad.
1. To improve parsing of PDF files:
Parsing errors:
Parser error 2: Out of memory URL: http://ia700304.us.archive.org/11/items/adakm/atlasda.pdf
Parser error 2: Out of memory URL: http://ia601502.us.archive.org/17/items/WAQ79051/79051.pdf
Links that don't exist in the downloaded PDF are created
give this url
2. What does this mean?
Reget is not supported. URL: http://ia601509.us.archive.org/9/items/waq2801/02_2802.pdf
3. Is it possible to enable .primary only for certain types of files?
The project becomes very big when .primary files are created for PDF files.
4. When I disable parsing for certain types of files, does this prevent .primary files from being created or not?
5. I will send you by email
the project settings and the queue of files that were downloaded but are requeued
(I sorted them by type).
I am sorry, I tried to make a small project but couldn't.
6. With my URL substitute (filename):
but other URL substitutes give me errors, i.e. all the links in the page aaa are bad
this one is good
this one is bad
Maybe we get this because OEE does not support a file and a directory with the same name (same link).
7. Could you add support for links that are both a file and a folder, along these lines:
links to the filename are kept intact;
for the directory, add .dir or something similar.
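To make the idea in point 7 concrete, here is a minimal sketch in Python. It is only an illustration of the naming scheme, not how OE or OEE works internally: the function name `local_paths`, the `.dir` suffix default, and the `example.com` URLs are all assumptions made up for this example, and the host part of the URL is ignored to keep it short.

```python
from urllib.parse import urlsplit


def local_paths(urls, dir_suffix=".dir"):
    """Map each URL to a relative local path (hypothetical helper).

    If one URL path is also a parent folder of another URL, the folder
    is renamed with dir_suffix so the file can keep its original name.
    """
    # Split every URL path into segments; the host is ignored here.
    paths = [urlsplit(u).path.strip("/").split("/") for u in urls]
    # Every full path is treated as a file that must be saved.
    files = {tuple(p) for p in paths}

    result = {}
    for url, parts in zip(urls, paths):
        out = []
        for i, part in enumerate(parts[:-1]):
            # A parent segment that is also a saved file would collide,
            # so the directory (not the file) gets the suffix.
            if tuple(parts[: i + 1]) in files:
                part += dir_suffix
            out.append(part)
        out.append(parts[-1])  # the file name itself is kept intact
        result[url] = "/".join(out)
    return result


if __name__ == "__main__":
    demo = local_paths([
        "http://example.com/docs/aaa",
        "http://example.com/docs/aaa/page.html",
    ])
    for url, path in demo.items():
        print(url, "->", path)
```

With this scheme only the folder is renamed, so existing links to the file `aaa` would not need to be rewritten.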
|Oleg Chernavin||06/01/2013 03:21 pm|
1. I improved it further:
2. It means a connection was broken halfway. OE then tries to resume the download rather than redownload the file from the beginning, but not all servers support this method.
3. Sorry, no.
4. Yes, primary files should not be created in this case.
5. Perhaps today's fixes will deal with that.
6, 7. I need more details and particular examples for that.
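Regarding point 2, resuming over HTTP is normally done with a `Range` request header, and a server that supports it answers with status 206 (Partial Content) instead of 200. Below is a minimal sketch of that logic in Python. It is a generic illustration and not OE's actual code; the helper names `resume_headers` and `write_mode` are hypothetical.

```python
def resume_headers(bytes_on_disk):
    """Build the request header asking for the rest of a partial file.

    bytes_on_disk is how much of the file was already saved before
    the connection broke.
    """
    return {"Range": f"bytes={bytes_on_disk}-"}


def write_mode(status_code):
    """Choose how to open the local file based on the server's reply.

    206 Partial Content: the server honoured the Range header, so the
    new data is appended to what is already on disk.
    Any other status (typically 200 OK): the server ignored the Range
    header and sends the whole file, so it is written from the start.
    """
    return "ab" if status_code == 206 else "wb"


if __name__ == "__main__":
    print(resume_headers(1048576))
    print(write_mode(206), write_mode(200))
```

When the server replies with 200 to such a request, a client has no choice but to start the file over, which is the situation a "not supported" message for resuming would describe.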
|Akram||06/02/2013 07:24 am|
1. and 5. I will try it later.
2. I get it.
But 4. will help me a lot :-)
6. and 7. I will try to write a good explanation in a few days.
Thank you very much.