[natas] 02>03 & web crawling / how to use robots.txt

희디 2022. 5. 20. 05:55

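Log in with the password obtained in the previous level.

Just like before, the page says there is nothing on it.

Unlike the previous level, the only addition is an HTML comment in the page source.

I first suspected that the files referenced by the script and link tags might contain something, so I opened them, but there was far too much to read through and I gave up..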

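So I searched Google for the comment text, "No more information leaks!! Not even Google will find it this time",

and found a site called hackmethod that explains how to solve this level.

https://hackmethod.com/overthewire-natas-3/?v=06fa567b72d7 Let's quote the content from this page and work through what it means.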

“Site owners have many choices about how Google crawls and indexes their sites through Webmaster Tools and a file called “robots.txt”. With the robots.txt file, site owners can choose not to be crawled by Googlebot, or they can provide more specific instructions about how to process pages on their sites. ”

- In other words,
through Webmaster Tools and a file called robots.txt, site owners have many choices about how Google crawls and indexes their sites. With robots.txt, a site owner can keep Googlebot from crawling the site, or give more specific instructions about how the pages on the site should be processed.

=> This is presumably what the hint "Not even Google will find it this time" refers to.
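To make the idea concrete, here is a minimal sketch of how a well-behaved crawler might consult robots.txt rules before fetching a page, using `urllib.robotparser` from the Python standard library. The robots.txt content, the `/private/` path, and the `example.com` URLs are made up for illustration and are not taken from the natas server.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the path is invented for illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A crawler that honors the rules skips anything under /private/ ...
print(is_allowed(SAMPLE_ROBOTS_TXT, "Googlebot", "http://example.com/private/page.html"))
# ... but may still fetch the rest of the site.
print(is_allowed(SAMPLE_ROBOTS_TXT, "Googlebot", "http://example.com/index.html"))
```

The check is purely voluntary: nothing stops a client from requesting the disallowed path anyway.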

“Lets dig a little deeper. The /robots.txt is a de-facto standard, which means it is not published by any governing body but it is universally accepted. To learn more about this file we can go to http://www.robotstxt.org/robotstxt.html as they describe how to use this file. They suggest putting this file in the top level of the directory, so lets go look there.”

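- Roughly speaking, /robots.txt is a de facto standard: no governing body publishes it, yet it is universally accepted. The site above recommends placing the file at the top level of the site's directory tree.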

=> Let's open /robots.txt.
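Because the file sits at the top level of the site, its address can be derived from any page URL on the same host. A small sketch using `urllib.parse.urljoin`; the helper name `robots_url` and the `example.com` address are placeholders for illustration, not the real natas host.

```python
from urllib.parse import urljoin


def robots_url(page_url: str) -> str:
    """Build the top-level robots.txt URL for the site hosting page_url."""
    # Joining with an absolute path replaces the whole path of page_url
    # while keeping its scheme and host.
    return urljoin(page_url, "/robots.txt")


# Hypothetical page URL, used only to show the transformation.
print(robots_url("http://example.com/some/deep/page.html"))
```

In the browser this amounts to typing /robots.txt directly after the host name in the address bar.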

“Looks like we found our “hidden” directory. Inside the directory, we find exactly what we were looking for. Easy peasy. *WARNING* keep in mind that this file will stop honest crawlers (like google) from indexing your website. It will not stop hackers, and they make look for this to crawl specifically.”
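The warning is easy to see in code: the same file that tells honest crawlers where not to go also hands a list of interesting paths to anyone who reads it. Below is a minimal sketch that pulls the `Disallow` entries out of robots.txt text. The function name `disallowed_paths` and the `/hidden-dir/` path are hypothetical and chosen only for illustration; the actual directory on the natas server is not reproduced here.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Collect every non-empty Disallow path listed in robots.txt text."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        field, value = line.split(":", 1)
        # Field names are case-insensitive; an empty Disallow blocks nothing.
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical robots.txt content, invented for this example.
sample = """\
User-agent: *
Disallow: /hidden-dir/   # not the real natas path
Disallow:
"""

print(disallowed_paths(sample))
```

Each returned path is a candidate to visit by hand, which is why robots.txt should never be used to hide sensitive directories.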