Some Possible Uses of Wget

Last modified in April 2007 by Hong Ong

wget is a network utility for retrieving files from the Web using the HTTP and FTP protocols. It also supports recursive retrieval of files.

General usage

  • If you want to download a URL, you can do:
    $ wget http://x.y.z/ 
  • If the network connection is slow and the remote file is large, the connection may fail before the whole file is retrieved. By default, wget keeps trying until it either gets the whole file or exceeds the default number of tries (20). To increase the number of tries, say to 50, you can do:
    $ wget --tries=50 http://x.y.z/jpg/web.jpg

    Note: The shorthand for '--tries' is '-t'. Use '-t inf' (or '-t 0') for an unlimited number of tries.

  • If you want to run wget in the background and write its progress to the log file 'wget.log', you can do:
    $ wget -t 50 -o wget.log http://x.y.z/dir/file &
  • If you want to do an anonymous FTP transfer, you can do:
    $ wget ftp://x.y.z/file.txt
    ftp://x.y.z/file.txt
          => 'file.txt'
    Connecting to x.y.z:21... connected. 
    Logging in as anonymous ...  Logged in! 
    ==> SYST ... done.    ==> PWD ... done.
    ==> TYPE I ... done.  ==> CWD not needed.  
    ==> PASV ... done.    ==> RETR file.txt ... done. 
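
If you want to combine the retry and log options above and also resume a partially retrieved file instead of starting over, the following is one possible sketch, not the only way to do it. The function name 'fetch_large_file', the 10-second '--waitretry' value, and the default log name are assumptions made for illustration; '-c', '-t', '--waitretry' and '-o' are standard wget options.

```shell
# Sketch only: fetch_large_file is a hypothetical helper name, not part of wget.
# Usage: fetch_large_file URL [LOGFILE]
fetch_large_file() {
    url=$1
    log=${2:-wget.log}    # assumed default log name, as in the example above
    # -c              resume a partially downloaded file instead of restarting
    # -t 50           allow up to 50 tries
    # --waitretry=10  wait between retries, backing off up to 10 seconds
    # -o "$log"       write messages to the log file instead of the terminal
    wget -c -t 50 --waitretry=10 -o "$log" "$url"
}

# Example call, commented out so that nothing is downloaded until you enable it:
# fetch_large_file http://x.y.z/jpg/web.jpg
```

To run it in the background, append '&' to the call.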

Other usage

  • If you want to read a list of URLs from a file, you can do:
    $ wget -i filename

    Note: If you specify '-' as the file name, the URLs will be read from standard input.

  • If you want to create a local copy of a web site (following links up to wget's default depth of 5) and save the log of the activity to 'wget.log', you can do:
    $ wget -r http://x.y.z/ -o wget.log
  • If you want to retrieve the first layer of a site, you can do:
    $ wget -r -l1 http://x.y.z/
  • If you want to retrieve the index.html page and print the headers sent by the server, you can do:
    $ wget -S http://x.y.z/
  • If you want to download all the JPG files from a URL, you can do:
    $ wget -r -l1 --no-parent -A.jpg http://x.y.z/dir/

    The options mean:

    • '-r -l1' means to retrieve recursively, with a maximum depth of 1,
    • '--no-parent' means that references to the parent directory are ignored, and
    • '-A.jpg' means to download only the JPG files. '-A "*.jpg"' would have worked too.

    Note: Since HTTP retrieval does not support globbing, the command 'wget http://x.y.z/dir/*.gif' won't work.

  • If you do not want to clobber the files that are already present, you can do:
    $ wget -nc -r http://x.y.z/
  • If you want to supply your own username and password for HTTP or FTP as part of the URL, you can do:
    $ wget ftp://name:password@x.y.z/file
  • If you wish to automatically keep a mirror of a page (or FTP subdirectories), you can insert the following line into a crontab:
    0 0 * * 0 wget --mirror ftp://x.y.z/pub -o /var/wget.log
    • '0 0 * * 0' means midnight every Sunday.
    • '--mirror' turns on recursion and time-stamping ('-r -N') with unlimited recursion depth.
  • If you wish to do the same as above but only want the HTML pages, you can use this crontab line instead:
    0 0 * * 0 wget --mirror -A.html ftp://x.y.z/pub -o /var/wget.log
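
The accept list given to '-A' may hold several comma-separated suffixes, which is one way to get the effect of a glob over HTTP. The sketch below assumes a hypothetical helper named 'fetch_by_suffix'; it only combines '-A' with the '-r -l1 --no-parent' and '-nc' options already shown in this section.

```shell
# Sketch only: fetch_by_suffix is a hypothetical helper name, not part of wget.
# Usage: fetch_by_suffix DIR_URL SUFFIX_LIST
fetch_by_suffix() {
    dir_url=$1
    suffixes=$2    # comma-separated list, for example "jpg,gif"
    # -r -l1       recurse one level below the starting URL
    # --no-parent  never ascend to the parent directory
    # -nc          do not clobber files that already exist locally
    # -A           accept only files whose names end in one of the suffixes
    wget -r -l1 --no-parent -nc -A "$suffixes" "$dir_url"
}

# Example call, commented out so that nothing is downloaded until you enable it:
# fetch_by_suffix http://x.y.z/dir/ jpg,gif
```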

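As noted earlier in this section, 'wget -i -' reads its URLs from standard input. Here is a small sketch of feeding it a list built on the command line; the helper name 'fetch_urls' is made up for this example.

```shell
# Sketch only: fetch_urls is a hypothetical helper name, not part of wget.
# Usage: fetch_urls URL [URL ...]
fetch_urls() {
    # Print one URL per line and let wget read the list from standard input.
    printf '%s\n' "$@" | wget -i -
}

# Example call, commented out so that nothing is downloaded until you enable it:
# fetch_urls http://x.y.z/a.html http://x.y.z/b.html
```
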
Disclaimer

This document is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Use at your own risk.

 

Except where otherwise noted, this site is licensed under a Creative Commons Attribution 2.5 License