Some Possible Uses of Wget
Last modified in April 2007 by Hong Ong
wget is a network utility for retrieving files from the Web using the HTTP and FTP protocols. It also supports recursive retrieval of files.
General usage
- If you want to download a URL, you can do:
$ wget http://x.y.z/
- If the network connection is slow and the remote file is large, the connection will probably fail before the whole file is retrieved. By default, wget will keep trying until it either gets the whole file or exceeds the default number of retries (20). To increase the number of retries, say to 50, you can do:
$ wget --tries=50 http://x.y.z/jpg/web.jpg
Note: A shorthand for '--tries' is '-t'. Use '-t inf' for an unlimited number of tries.
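As a rough sketch of how the retry options might be combined, the helper below wraps the command above. The function name 'fetch_with_retries' and the 10-second wait cap are assumptions made for illustration. It adds '--waitretry' so that wget pauses between attempts, and '--continue' so that re-running the command picks up a partial file left behind by an earlier run.

```shell
# Hypothetical helper: fetch one URL with a caller-chosen retry count.
# Usage: fetch_with_retries TRIES URL
fetch_with_retries() {
    tries=$1
    url=$2
    # --tries      : give up after this many attempts
    # --waitretry  : back off between retries, waiting at most 10 seconds
    # --continue   : resume a partial file left over from an earlier run
    wget --tries="$tries" --waitretry=10 --continue "$url"
}
```

It could then be called as:
$ fetch_with_retries 50 http://x.y.z/jpg/web.jpg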
- If you want to execute wget in the background and write its progress to the log file 'wget.log', you can do:
$ wget -t 50 -o wget.log http://x.y.z/dir/file &
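Instead of the shell's '&', wget can also put itself in the background with its own '-b' option. The sketch below shows one way this might look. The function name 'fetch_in_background' is hypothetical, and the retry count of 50 is simply carried over from the example above.

```shell
# Hypothetical helper: start a download that detaches from the terminal.
# Usage: fetch_in_background LOGFILE URL
fetch_in_background() {
    logfile=$1
    url=$2
    # -b puts wget in the background right after startup;
    # -o sends its messages to the log file instead of the terminal.
    wget -b -t 50 -o "$logfile" "$url"
}
```

The progress can then be followed from the log file:
$ fetch_in_background wget.log http://x.y.z/dir/file
$ tail -f wget.log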
- If you want to do an anonymous FTP transfer, you can do:
$ wget ftp://x.y.z/file.txt
ftp://foo.download.com/file.txt
=> 'file.txt'
Connecting to x.y.z:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD not needed.
==> PASV ... done. ==> RETR file.txt ... done.
Other usage
- If you want to read a list of URLs from a file, you can do:
$ wget -i filename
Note: If you specify '-' as the file name, the URLs will be read from standard input.
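Reading from standard input makes it possible to generate the URL list on the fly. The following is a minimal sketch, assuming the remote files follow a numbered naming scheme. The function name 'numbered_urls' and the 'img1.jpg', 'img2.jpg', ... file names are made up for illustration.

```shell
# Hypothetical generator: print one URL per line for files named
# PREFIX<N>SUFFIX, with N running from FIRST to LAST.
# Usage: numbered_urls PREFIX SUFFIX FIRST LAST
numbered_urls() {
    n=$3
    while [ "$n" -le "$4" ]; do
        printf '%s%s%s\n' "$1" "$n" "$2"
        n=$((n + 1))
    done
}
```

The generated list can be piped straight into wget:
$ numbered_urls http://x.y.z/jpg/img .jpg 1 3 | wget -i -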
- If you want to create a local copy of a web site (retrieved recursively, to wget's default depth of 5 levels) and save the log of the activities to 'wget.log', you can do:
$ wget -r http://x.y.z/ -o wget.log
- If you want to retrieve only the first level of a site (the start page and whatever it links to directly), you can do:
$ wget -r -l1 http://x.y.z/
- If you want to retrieve the index.html and show the original server headers, you can do:
$ wget -S http://x.y.z/
- If you want to download all the JPG files from a URL, you can do:
$ wget -r -l1 --no-parent -A.jpg http://x.y.z/dir/
The options mean:
- '-r -l1' means to retrieve recursively, with a maximum depth of 1,
- '--no-parent' means that references to the parent directory are ignored, and
- '-A.jpg' means to download only the JPG files. '-A "*.jpg"' would have worked too.
Note: Since HTTP retrieval does not support globbing, the command 'wget http://x.y.z/dir/*.gif' won't work.
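The '-A' option takes a comma-separated list of suffixes or patterns, so several file types can be accepted in one run. The sketch below wraps the command above so that the accept list becomes a parameter. The function name 'fetch_by_suffix' is an assumption made for illustration.

```shell
# Hypothetical helper: fetch files one level below a directory URL,
# keeping only those that match a comma-separated accept list.
# Usage: fetch_by_suffix ACCEPT_LIST URL
fetch_by_suffix() {
    accept=$1
    url=$2
    # Same options as above; only the accept list varies.
    wget -r -l1 --no-parent -A "$accept" "$url"
}
```

For example, to fetch both the JPG and the GIF files:
$ fetch_by_suffix jpg,gif http://x.y.z/dir/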
- If you do not want to clobber the files that are already present, you can do:
$ wget -nc -r http://x.y.z/
- If you want to encode your own username and password in an HTTP or FTP URL, you can do:
$ wget ftp://name:password@x.y.z/file
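A password given in the URL is visible to anyone who lists the running processes. wget can also read credentials from a '.netrc' file in the home directory. The sketch below prints an entry in that file's format. The function name 'netrc_entry' is hypothetical, and the host and credentials are the placeholders used above.

```shell
# Hypothetical helper: print a one-line ~/.netrc entry for a host.
# Usage: netrc_entry HOST LOGIN PASSWORD
netrc_entry() {
    printf 'machine %s login %s password %s\n' "$1" "$2" "$3"
}
```

One possible way to use it, keeping the file readable only by its owner:
$ netrc_entry x.y.z name password >> ~/.netrc
$ chmod 600 ~/.netrc
$ wget ftp://x.y.z/file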
- If you wish to automatically keep a mirror of a page (or FTP subdirectories), you can insert the following line into a crontab:
0 0 * * 0 wget --mirror ftp://x.y.z/pub -o /var/wget.log
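- '0 0 * * 0' means at midnight (00:00) every Sunday
- '--mirror' turns on '-r -N' (recursion and time-stamping) with infinite recursion depth, and keeps FTP directory listings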
- If you wish to do the same as above but only want the HTML pages, you can replace the wget command with:
$ wget --mirror -A.html ftp://x.y.z/pub -o /var/wget.log
Disclaimer
This document is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Use at your own risk.