Copy Link
Add to Bookmark
Report

2 - Abusing the internet with popular search engine technologies

This text introduces the reader to the potential use of Google when attacking a network. Internet search engine Spiders have been partly responsible for a multitude of very successful penetrations to otherwise, reasonably secure sites and networks. By using search engines it is possible to gather data from web servers that could be seen as sensitive.

eZine's profile picture
Published in 
open security
 · 22 Dec 2022

by c0ntex | c0ntexb[at]gmail.com

When you think of a search engine, you probably conjure up an image of the handy, html based site that can help you find those RFC's, wiring diagrams, soft-warez, serial keys or questionable images, depending on your taste. You've probably used generic search engines like Google, Yahoo, MSN, Lycos & Hotbot a trillion times without considering even the possibility that it could be used to support an attack attempt.

Due to the sheer volume of HTTP based attacks, one has to consider than anything utilising HTTP is a serious theoretical attack vector. Too many people feel happy in the knowledge that only HTTP holes are punched in the firewall, if the server daemon is patched and user input locations are verified as safe then there is no need to worry. This is a serious misconception that one should avoid.

HTTP fingerprinting, SQL Injection, malformed header injections, XXS, cookie fun, every conceivable form of web based application or service attack can be managed by your security infrastructure. Yet what is the use in having any of this preventative security to protect your data from prying eyes, when you have allowed search bots to come in happily and crawl all web servers.

Taken from the Berkely teaching guide, stating that Spiders are:

"Computer robot programs, referred to sometimes as "crawlers" or"knowledge-bots" or "knowbots" that are used by search engines to roam the World Wide Web via the Internet, visit sites and databases, and keep the search engine database of web pages up to date. They obtain new pages, update known pages, and delete obsolete ones. Their findings are then integrated into the "home" database".

When any powerful technology is utilised by a curious mind, possessing advanced knowledge of the internal workings, it becomes easy to transform the technology into a potentially malicious weapon. In this case, the engine soon becomes an advanced data-gathering tool for attackers. Everyone knows about RFP's web mining tool called Whisker, yet not everyone is aware that a search engine can do the same...and more.

Google is the most powerful search engine online and it probably has been for the last four years or more, crawling literally millions of websites and public archives daily. The amount of data that it contains would crush any reasonable human mind due to the sheer scale. When data is being stored on such a vast scale and the manner in which it's gathered by engines, it is of no wonder that sometimes-sensitive information will be included.

The spiders that Google use to harvest all these sites have no mind, they are merely an axon connection to a central soma, acting like a neural network. The data is filtered down the axon if it passes a simple logic test, the soma then sends it into the central brain. The only energy required to sustain crawling is a diet similar to that of the pecking pigeons which perform the ranking inside the database clusters.

http://www.google.com/technology/pigeonrank.html

When testing a company site or domain, it is always useful to use Google as it's crawling software is superb and will find some pretty interesting stuff you might otherwise overlook. It also has a very nice cache function that will allow you to see "old" pages that were once available from that site. Attackers can use this feature to compare pages, documents, structures and the likes.

Some fictional public BBS sits in your DMZ, using flat text and CGI. The BBS contains a plain text or md5 encrypted password file that sits in the web directory and Mr Security forgets about it. One week later, a search engine spider creeps along, crawls your site and just happens to find the password text. 1 month later we come along and search your domain for password.txt and guess what pops up.

Interestingly, it seems a user from the internal network has subscribed to the BBS, using the same password as his domain password. Now things become more interesting, especially when you find out after a very authentic sounding phone call to his office receptionist, that the user is a member of IT staff.

        IT: domains, routers, switches, servers, databases. 
Human: laziness, imperfection, same password everywhere, domain admin account?.

It's possible to block spiders by using the robots file, providing the spider can read. An example robots.txt file looks like:

	User-agent: *   # This shoo's all spiders off with a big 
Disallow: / # stick, no flies around this server:


User-agent: googlebot # Fend off googlebot:
Disallow: /cgi-bin/
Disallow: /images/


# Tease vulnerable spiders:
User-agent: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA x 350
Disallow: /GoogleBot PAYLOAD :-)/

If you really suffer from arachnophobia, you could block the crawler bot parasite at your router / firewall.

NOTE: This text is not trying to encourage the use of Google as an attack tool, acting on information found could be illegal. Yet it is worth considering the ramifications of what data you allow your users to place in their private web directories.

Example usage of the Google advanced search features:

	allinurl:			# Find all strings in the url 
allintitle: # Find all strings in the title
filetype: # Find only specific filetype
intitle: # Find any strings in the title
inurl: # Find any strings in the url
link: # Find strings in link
site: # Find strings on this site

Specific:

	allinurl:session_login.cgi			# Nagios/NetSaint/Nagmin 
allinurl:/cgi-bin/status.cgi? # Network monioring
allinurl:/cgi-bin/extinfo.cgi # Nagios/NetSaint

inurl:cpqlogin.htm # Compaq Insight Agent
allinurl:"Index of /WEBAGENT/" # Compaq Insight Agent
allinurl:/proxy/ssllogin # Compaq Insight Agent
allintitle:WBEM Login # Compaq Insight Agent
WBEM site:domain.com # Compaq Insight Agent at domain.com

inurl:vslogin OR vsloginpage # Visitor System or Vitalnet

allintitle:/exchange/root.asp # Exchange Webmail
"Index of /exchange/" # Exchange Webmail Directory

netopia intitle:192.168 # Netopia Router Config

General:

	site:blah.com filename:cgi			# Check cgi source code 

allinurl:.gov passwd # A blackhat favorite
allinurl:.mil passwd # Another blackhat favorite

"Index of /admin/"
"Index of /cgi-bin/"
"Index of /mail/"
"Index of /passwd/"
"Index of /private/"
"Index of /proxy/"
"Index of /pls/"
"Index of /scripts/"
"Index of /tsc/"
"Index of /www/"

"Index of" config.php OR config.cgi

intitle:"index.of /ftp/etc" passwd OR pass -freebsd -netbsd -openbsd
intitle:"index.of" passwd -freebsd -netbsd -openbsd

inurl:"Index of /backup"
inurl:"Index of /tmp"
inurl:auth filetype:mdb # MDB files
inurl:users filetype:mdb
inurl:config filetype:mdb

inurl:clients filetype:xls # General spreadsheets

inurl:network filetype:vsd # Network diagrams

EOF

← previous
next →
loading
sending ...
New to Neperos ? Sign Up for free
download Neperos App from Google Play
install Neperos as PWA

Let's discover also

Recent Articles

Recent Comments

Neperos cookies
This website uses cookies to store your preferences and improve the service. Cookies authorization will allow me and / or my partners to process personal data such as browsing behaviour.

By pressing OK you agree to the Terms of Service and acknowledge the Privacy Policy

By pressing REJECT you will be able to continue to use Neperos (like read articles or write comments) but some important cookies will not be set. This may affect certain features and functions of the platform.
OK
REJECT