r1 - 18 Feb 2006 - 05:21:21 - ArneJohannessen

Creating a Web Archive with iCab and accessing it later.

This tutorial is derived from a series of posts to the iCab list, mainly by Sander Tekelenburg and Arne Johannessen.

- Load the page you want to be the Web Archive's starting point. This description used http://cp.c-ij.com/english/photoretouch/index.html; open it or the URL of your choice.

- Bring up the contextual menu on the page (on text or empty space, not on an image or other object) and select Page-/Download...

- In the dialog that comes up, make the appropriate decisions:

  • Destination: Web Archive (ESSENTIAL: choose it from the Destination pop-up. If you don't do this, it will not work.)
  • Download Folder: whatever you want
  • Download options... : Default: "Get all files on the same server"

If you know something about the architecture of the site, you can choose "Get all files in the same path", which will be somewhat faster, friendlier towards the server, and will create a smaller Web Archive. It may not get you everything you want, though, so when in doubt stick with "Get all files on the same server".
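The difference between the two options can be pictured with a small Python sketch. This is only an illustration of the idea, not iCab's actual logic: the function names are made up, and the two candidate URLs are hypothetical pages invented for the example.

```python
import posixpath
from urllib.parse import urlsplit

def on_same_server(start_url, candidate_url):
    """True if both URLs point at the same host (assumed meaning of "same server")."""
    return urlsplit(start_url).hostname == urlsplit(candidate_url).hostname

def in_same_path(start_url, candidate_url):
    """True if the candidate lives in or below the starting page's directory
    (assumed meaning of "same path")."""
    start, candidate = urlsplit(start_url), urlsplit(candidate_url)
    if start.hostname != candidate.hostname:
        return False
    base_dir = posixpath.dirname(start.path)
    if not base_dir.endswith("/"):
        base_dir += "/"
    return candidate.path.startswith(base_dir)

start = "http://cp.c-ij.com/english/photoretouch/index.html"
deeper = "http://cp.c-ij.com/english/photoretouch/basic/index.html"  # hypothetical page
sibling = "http://cp.c-ij.com/english/index.html"                    # hypothetical page

print(on_same_server(start, deeper), in_same_path(start, deeper))    # True True
print(on_same_server(start, sibling), in_same_path(start, sibling))  # True False
```

Under these assumptions, a page one directory up is still on the same server but outside the same path, which is why the narrower option can miss things.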

- In general, a "Max. depth" of 10 should be enough to be relatively sure you'll get everything you want, but you can simply use 99. There is no prize for getting it exactly right.

- Set "Max. number of files" as high as you can if you want to be relatively sure you'll get everything you want. Something in the tens of thousands would be ideal, but the current maximum appears to be 9999 files, so use that number. Again, no prize for getting it exactly right.

- Set "Download limit" to something in the tens of thousands of kilobytes if you want to be relatively sure you'll get everything you want. Unless you have scant hard disk space, use 99999 KB, which is 99,999,000 bytes (about 100 MB).

Yes, these are high numbers. But [1] assuming you want a complete result and [2] given that there seems to be no way to edit a Web Archive and you'd thus have to try again with higher settings if an initial attempt with conservative settings doesn't give you the result you're looking for, I think it makes sense.
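To see why conservative limits can leave you with an incomplete archive, here is a rough Python sketch of a download that stops at whichever limit it hits first. It is an assumption-laden illustration, not a description of how iCab works internally: the breadth-first order, the way depth is counted, and the tiny made-up site (`links`, `sizes_kb`) are all hypothetical.

```python
from collections import deque

def plan_download(links, sizes_kb, start, max_depth, max_files, limit_kb):
    """Walk a link graph breadth-first and stop at the first limit reached.

    links maps a page name to the pages it links to; sizes_kb maps a page
    name to its size in KB. Both describe a made-up site.
    """
    fetched, total_kb = [], 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        size = sizes_kb.get(page, 0)
        # Stop once the file count or the total size would be exceeded.
        if len(fetched) >= max_files or total_kb + size > limit_kb:
            break
        fetched.append(page)
        total_kb += size
        # Only follow links while we are above the depth limit.
        if depth < max_depth:
            for target in links.get(page, []):
                if target not in seen:
                    seen.add(target)
                    queue.append((target, depth + 1))
    return fetched, total_kb

# Hypothetical five-page site, 10 KB per page.
links = {"index": ["a", "b"], "a": ["a1"], "a1": ["a2"], "b": []}
sizes_kb = {page: 10 for page in ["index", "a", "b", "a1", "a2"]}

print(plan_download(links, sizes_kb, "index", 1, 9999, 99999))   # (['index', 'a', 'b'], 30)
print(plan_download(links, sizes_kb, "index", 99, 9999, 99999))  # (['index', 'a', 'b', 'a1', 'a2'], 50)
```

With a depth of 1 the two deepest pages are silently left out, and since the finished archive apparently cannot be edited, the only remedy would be to start over with higher settings.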

- Next, go to the "File formats" tab and just switch on everything unless you know of something that you do not want.

Neat alternative to the above: The default settings for this dialog are in Preferences-/Downloads-/Constraints. If you're going to do this often, it's probably useful to configure that to your liking, i.e., select everything in sight, because there is no prize for selecting less than needed.

Here is what I set under the Constraints button:

  • Get all files on same server
  • Max number of files: 9999 (seems to be a limit at this time)
  • Max depth: 99
  • Total data size: 99999 KB (might be a limit, but it is way big, so don't worry now)
  • Include embedded: checked all four boxes
  • Exclude file types: unchecked every box

If you set your Preferences as above, then all you need to do when downloading a web page to a Web Archive is to change the Destination pop-up to Web Archive.

Now hit the "Download" button and go water your garden. Maybe more than once. :-)

iCab will fetch every single item separately, instead of through multiple simultaneous threads, so it will take time. (I have a 600Mb/s line, but on average, due to all those little files, iCab speed was only about 10KB/s.) I suspect iCab takes this slow approach because, IIRC, it actually changes some of the HTML. For instance, it changes full URLs to relative URLs, to ensure hyperlinks within the Web Archive can work at all. Possibly it also writes META HTTP-EQUIV elements into the HTML files to store a server's Content-Type headers, for instance to ensure that the Web Archive provides the proper charset information.
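The kind of URL rewriting described above can be sketched in a few lines of Python. This is a guess at the general technique, not iCab's code: the helper name `to_relative` and the linked pages are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit

def to_relative(page_url, link_url):
    """Rewrite an absolute link as a path relative to the page that contains it.

    Links to other servers are returned unchanged, since they cannot be
    resolved inside a local archive anyway.
    """
    page, link = urlsplit(page_url), urlsplit(link_url)
    if (page.scheme, page.netloc) != (link.scheme, link.netloc):
        return link_url
    page_dir = posixpath.dirname(page.path) or "/"
    relative = posixpath.relpath(link.path or "/", start=page_dir)
    if link.query:
        relative += "?" + link.query
    if link.fragment:
        relative += "#" + link.fragment
    return relative

page = "http://cp.c-ij.com/english/photoretouch/index.html"

# Hypothetical links found on that page:
print(to_relative(page, "http://cp.c-ij.com/english/photoretouch/basic/step1.html"))  # basic/step1.html
print(to_relative(page, "http://cp.c-ij.com/english/index.html"))                     # ../index.html
print(to_relative(page, "http://example.com/other.html"))                             # http://example.com/other.html
```

Once links look like `basic/step1.html` rather than full URLs, they no longer depend on the original server being reachable, which is what an offline archive needs.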

Two other numbers from another user: I had DSL at 256K up and down, and it took 80 minutes to create the archive. Today my service was upgraded to 3MB (2MB seems to be the max), and it took 20 minutes.

Once iCab completes, which could take more than an hour (see above), it is time to test the Web Archive.

From the menu bar select File-/Offline Mode. When you are in offline mode in iCab, the telephone handset in the lower left corner of the screen has a slash through it.

Double-click on the Web Archive.

It should work like a champ.
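For a rough feel of the waiting time, the figures quoted in this tutorial can be put through some back-of-the-envelope arithmetic. The Python sketch below assumes a steady average rate, which a real download will not have; the function name is made up for the example.

```python
def minutes_to_download(total_kb, rate_kb_per_s):
    """Minutes needed to move total_kb kilobytes at a steady average rate."""
    return total_kb / rate_kb_per_s / 60

# A download that fills the whole 99999 KB limit at the ~10 KB/s average quoted above:
print(round(minutes_to_download(99999, 10)))  # 167 minutes, a bit under three hours

# Speed-up the other user saw after the line upgrade (80 minutes down to 20):
print(80 / 20)  # 4.0
```

So even a very generous download limit translates into an afternoon at most, not days.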

-- Joe Walters
