This software is a platform-independent site map generator. It crawls a web site starting from a given URL and outputs an XML sitemap file that you can submit to Google (via Google Webmaster Tools) or other search engines. Site maps are useful for SEO: they give search engines hints about which pages on your web site they can index. The site map generator is published under the GNU General Public License.
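
As an illustration of the output format only, here is a minimal Python 3 sketch that turns a list of crawled URLs into sitemap XML in the sitemaps.org format. It is not the generator's own code; the function name `build_sitemap` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML (as a str) with one <url><loc> record per URL.

    Hypothetical helper for illustration; `urls` is an iterable of absolute URLs.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # ElementTree escapes &, <, >
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap(["https://example.com/", "https://example.com/about.html"]))
```

Letting ElementTree serialize the records means characters such as `&` in query strings are escaped correctly, which a hand-built string would have to do explicitly.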

To run the generator, you do not need shell access to your web server. The script is a simple crawler that can run on any computer with Python installed. The crawler follows only local links and skips links to external sites. It also does not follow links marked with rel="nofollow" and does not crawl into directories that are disallowed in the robots.txt file.
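
The three link rules above (local links only, no rel="nofollow", respect robots.txt) can be sketched in Python 3 with the standard `urllib` modules. This is an assumption about how such filtering might look, not the script's actual implementation; `should_follow` and its parameters are hypothetical.

```python
from urllib.parse import urljoin, urldefrag, urlparse
from urllib.robotparser import RobotFileParser


def should_follow(base_url, href, rel, robots):
    """Return the absolute URL to crawl, or None if the link should be skipped.

    Hypothetical helper: `rel` is the link's rel attribute (or None) and
    `robots` is a RobotFileParser already loaded with the site's robots.txt.
    """
    # Skip links marked rel="nofollow" (rel may contain several tokens).
    if rel and "nofollow" in rel.lower().split():
        return None
    # Resolve relative links against the current page and drop any #fragment.
    url, _fragment = urldefrag(urljoin(base_url, href))
    # Follow only local links: same scheme and host as the current page.
    base, target = urlparse(base_url), urlparse(url)
    if (target.scheme, target.netloc) != (base.scheme, base.netloc):
        return None
    # Skip paths that robots.txt disallows for all user agents.
    if not robots.can_fetch("*", url):
        return None
    return url


if __name__ == "__main__":
    robots = RobotFileParser()
    robots.parse(["User-agent: *", "Disallow: /private/"])
    page = "https://example.com/docs/index.html"
    print(should_follow(page, "intro.html#top", None, robots))
    print(should_follow(page, "/private/x.html", None, robots))
```

In a live crawl, `robots` would instead be filled with `set_url(...)` and `read()` against the site's own robots.txt.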

If your web server returns pages with a Last-Modified header, the generator adds a "<lastmod>" date to the corresponding sitemap records. If the crawler encounters an error while downloading or parsing a page, it tries to continue with the next page.
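
One plausible way to turn a Last-Modified header into a `<lastmod>` value is sketched below in Python 3. The helper name is hypothetical, and the choice of the date-only W3C format (YYYY-MM-DD) is an assumption; the sitemap protocol also accepts full timestamps.

```python
from email.utils import parsedate_to_datetime


def lastmod_from_header(last_modified):
    """Convert an HTTP Last-Modified value to a sitemap <lastmod> date.

    Hypothetical helper. Returns None when the header is missing or cannot
    be parsed, so the caller can simply omit <lastmod> for that page.
    """
    if not last_modified:
        return None
    try:
        stamp = parsedate_to_datetime(last_modified)
    except (TypeError, ValueError):  # older Pythons raise TypeError, 3.10+ ValueError
        return None
    # Date-only W3C Datetime form, accepted by the sitemap protocol.
    return stamp.date().isoformat()


if __name__ == "__main__":
    print(lastmod_from_header("Wed, 21 Oct 2015 07:28:00 GMT"))
```

Returning None instead of raising mirrors the behaviour described above: a bad or missing header should not stop the crawl.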

To run the script, you need Python 2.5 or higher (you can download Python from Python's official site). The script needs no installation: simply copy it to a suitable directory and run it from there.

The script is mainly useful for small and medium-sized sites. It generates only a single sitemap file, so it tops out at 50,000 URLs (Google's limit for a single sitemap file). The script's default limit is 1,000 URLs, but you can change it with the -m option.
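
To show how a URL cap interacts with crawling, here is a Python 3 sketch of a breadth-first crawl that stops once a limit is reached. It is an assumption, not the script's code: `max_urls` only plays the role of the -m option, and `get_links` is a hypothetical callback standing in for downloading and parsing a page.

```python
from collections import deque

# Google's per-file limit mentioned above; the script's own default is 1,000.
SITEMAP_URL_LIMIT = 50_000


def crawl(start_url, get_links, max_urls=1000):
    """Breadth-first crawl that stops once `max_urls` pages have been collected.

    `get_links(url)` is a hypothetical callback returning the local links on
    a page; a real crawler would download and parse the page there.
    """
    max_urls = min(max_urls, SITEMAP_URL_LIMIT)  # one sitemap file cannot hold more
    seen = {start_url}
    queue = deque([start_url])
    found = []
    while queue and len(found) < max_urls:
        url = queue.popleft()
        found.append(url)
        for link in get_links(url):
            if link not in seen:  # enqueue each page only once
                seen.add(link)
                queue.append(link)
    return found


if __name__ == "__main__":
    site = {"/": ["/a", "/b"], "/a": ["/c", "/"], "/b": ["/c"], "/c": []}
    print(crawl("/", lambda u: site.get(u, []), max_urls=2))
```

Breadth-first order means that when the cap is hit, the pages closest to the starting URL are the ones that make it into the sitemap.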

The script's command line syntax is as follows:
Michael Wilcox committed
     python -options