A quick summary of some of the data collected in a crawl includes:
- Errors – Client errors such as broken links (4XX) & server errors (5XX), as well as no responses.
- Blocked URLs – View & audit URLs disallowed by the robots.txt protocol.
- Blocked Resources – View & audit blocked resources in rendering mode.
- External Links – All external links and their status codes.
- Protocol – Whether the URLs are secure (HTTPS) or insecure (HTTP).
- URI Issues – Non-ASCII characters, underscores, uppercase characters, parameters, or long URLs.
- Duplicate Pages – Hash value / MD5 checksum check for exact duplicate pages.
- Page Titles – Missing, duplicate, over 65 characters, short, pixel width truncation, same as h1, or multiple.
- Meta Description – Missing, duplicate, over 156 characters, short, pixel width truncation, or multiple.
- Meta Keywords – Mainly for reference, as they are not used by Google, Bing or Yahoo.
- File Size – Size of URLs & images.
- Redirects – Permanent & temporary redirects (3XX responses) & JavaScript redirects.
- Response Time – Time taken for URLs to respond.
Plus heaps more!
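
To illustrate the blocked-URLs check above, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt rules and URLs are hypothetical examples, not taken from any real site:

```python
import urllib.robotparser

# Hypothetical robots.txt rules, supplied inline for this sketch.
rules = """\
User-agent: *
Disallow: /private/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# A crawler can test each discovered URL against the rules
# before fetching it, and report the disallowed ones.
print(rp.can_fetch("*", "https://example.com/private/page"))  # False
print(rp.can_fetch("*", "https://example.com/public/page"))   # True
```

In a real crawl the rules would be fetched from the site's `/robots.txt` (e.g. via `rp.set_url(...)` and `rp.read()`) rather than supplied inline.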
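
The URI-issues entry can be sketched as a simple audit function. The length threshold and the exact set of checks here are illustrative assumptions, not a published standard:

```python
from urllib.parse import urlparse

def uri_issues(url: str, max_length: int = 115) -> list[str]:
    """Flag common URI hygiene issues; max_length is an
    illustrative threshold, not a fixed rule."""
    issues = []
    parsed = urlparse(url)
    path = parsed.path
    if not url.isascii():
        issues.append("non-ASCII characters")
    if "_" in path:
        issues.append("underscores")
    if any(c.isupper() for c in path):
        issues.append("uppercase characters")
    if parsed.query:
        issues.append("parameters")
    if len(url) > max_length:
        issues.append("over %d characters" % max_length)
    return issues

print(uri_issues("https://example.com/My_Page?id=1"))
# ['underscores', 'uppercase characters', 'parameters']
```

Each flagged issue corresponds to one of the URI checks listed above; a crawler would run this over every discovered URL and group the results by issue type.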
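
The exact-duplicate check described above (hashing page content and comparing checksums) can be sketched like this; the page bodies are made-up sample data:

```python
import hashlib

def page_hash(body: bytes) -> str:
    # MD5 checksum of the raw response body; identical bodies
    # produce identical digests, flagging exact duplicates.
    return hashlib.md5(body).hexdigest()

# Hypothetical crawled pages (URL -> response body).
pages = {
    "/a": b"<html>same content</html>",
    "/b": b"<html>same content</html>",
    "/c": b"<html>different</html>",
}

seen = {}        # digest -> first URL seen with that content
duplicates = []  # (duplicate URL, original URL) pairs
for url, body in pages.items():
    digest = page_hash(body)
    if digest in seen:
        duplicates.append((url, seen[digest]))
    else:
        seen[digest] = url

print(duplicates)  # [('/b', '/a')]
```

Note that a checksum only catches byte-for-byte duplicates; pages that differ by a single character hash differently.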