When somebody follows a link from another site to yours, the browser usually sends a referrer header (spelled Referer in HTTP) that tells you which external page they came from. You can see these in your server log files. Most analytics software will give you a report on them as well.
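As a rough illustration of pulling referrers out of log files, here is a minimal sketch. It assumes your server writes the common "combined" log format, where the referrer is the quoted field after the response size; the sample lines, the host name www.example.com, and the function name external_referrers are all made up for the example, so adjust the pattern to whatever your server actually logs.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumed layout (combined log format):
#   ... "request" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def external_referrers(lines, own_host):
    """Count referring hosts other than own_host (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        # A missing referrer is logged as "-", which has no hostname.
        host = urlparse(match.group("referer")).hostname
        if host and host != own_host:
            counts[host] += 1
    return counts


# Invented sample lines, for illustration only.
sample = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /page HTTP/1.1" 200 2326 "https://other.example/post" "Mozilla/5.0"',
    '203.0.113.6 - - [10/Oct/2023:13:56:01 +0000] "GET /about HTTP/1.1" 200 1200 "https://www.example.com/page" "Mozilla/5.0"',
    '203.0.113.7 - - [10/Oct/2023:13:57:12 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

print(external_referrers(sample, "www.example.com"))
```

In practice you would pass an open log file instead of the sample list. Keep in mind that browsers and privacy settings can strip or shorten the header, so these counts are a lower bound.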
Check out linkchecker: it will crawl the site (while obeying robots.txt) and generate a report. From there, you can script up a solution for creating the directory tree.
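One way the scripting step might look, as a sketch only: it assumes you have already extracted a plain list of URLs from the crawl report (the report parsing itself is not shown, since it depends on the output format you choose), and the names directories_for and create_tree are hypothetical helpers, not part of linkchecker.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def directories_for(urls):
    """Return the sorted relative directory paths implied by the URL paths."""
    dirs = set()
    for url in urls:
        path = urlparse(url).path
        # A trailing slash means the URL is itself a directory;
        # otherwise use the parent directory of the file.
        posix = PurePosixPath(path)
        directory = posix if path.endswith("/") else posix.parent
        # Drop the root marker and any dot segments so paths stay relative.
        parts = [p for p in directory.parts if p not in ("/", ".", "..")]
        if parts:
            dirs.add("/".join(parts))
    return sorted(dirs)


def create_tree(urls, root):
    """Create the directories under root (an assumed output location)."""
    for rel in directories_for(urls):
        (Path(root) / rel).mkdir(parents=True, exist_ok=True)


# Invented URLs, standing in for what the crawl report would list.
crawled = [
    "https://www.example.com/",
    "https://www.example.com/docs/guide/intro.html",
    "https://www.example.com/docs/guide/setup.html",
    "https://www.example.com/blog/",
]

print(directories_for(crawled))
```

Calling create_tree(crawled, "mirror") would then create mirror/blog and mirror/docs/guide. Separating the path calculation from the filesystem step lets you review the list before anything is written to disk.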