OSINT Methodology

During our penetration tests, we use OSINT to understand our target company better. The better we understand our target, the easier it will be to adapt our attacks to it, especially when it comes to Social Engineering. Suppose we were to go ahead and scan our customer's network right away. It is common for well-secured networks to have integrated IDS/IPS that would immediately ban us for several hours, if not days, based on our scan. Triggering this type of alarm right at the beginning does not reflect the professionalism of our work and, in fact, contradicts it.

To perform OSINT efficiently, we need a structure that shows us the essential aspects and dependencies of the information resources and core information. The following overview shows the critical core elements required and can, of course, be extended with additional components.

Here we distinguish between the Core Elements we need and the Information Resources from which we can extract the corresponding core information.

Core Elements are pieces of information that give us a better picture of the company and its infrastructure. These can be names and versions of software applications, servers, employee names, usernames, hashes, URLs, passwords, and much more.

Information Resources are resources from which we obtain these Core Elements. These information resources can be websites, social networks, documents, scripts, and many others.

| Core Elements | Information Resources |
| --- | --- |
| Company Information | Home Page |
| Infrastructure | Files |
| Leaks | Social Networks |
| | Search Engines |
| | Development Platforms |
| | Leak Resources |

This overview helps us keep track of which resources we will use most of the time and tick them off one by one. However, we have to keep in mind that any information we find will lead to repeated searches and new resources for more detailed information. We can think of it as a growing tree where the core elements represent the branches, and as we add branches, we will have more nodes to connect, allowing us to perform much more thorough and detailed research.

Therefore, it is highly recommended to organize the whole process in cycles and repeat the search with the new information for each new cycle. This systematic approach allows us to have a structured workflow and clean documentation that will enable the client and us to understand precisely how the data and information have been obtained.

For example, we can take the company name from its website and then search for it on social networks or with search engines. We can also search for it on different development platforms or forums. Once we have gathered the core information, we start the cycle again, this time using the newly gathered information to search for related information in the corresponding category. We work from the general to the specific: first we get an overall picture of the company, and only then do we go into detail. This makes it easier for us to research, obtain structured information, and create clear documentation.

An important point to mention here, which may not be so apparent at first sight, is that there is no fixed order in which the information resources must be worked through. This means that we have a great deal of flexibility to search through the different information resources and to adapt the information we find to our approach.

Finally, a methodology is not a step-by-step guide but a structured workflow independent of the individual test cases.

When we use OSINT, we can divide our core information results into three categories:

| Company Information | Infrastructure | Leaks |
| --- | --- | --- |
| Organization | Domain Information | Archives |
| Locations | Public Domain Records | Internal Leaks |
| Staff | Domain Structure | Breaches |
| Contact Information | Cloud Storage | |
| Business Records | Email Addresses | |
| Services | Third-Parties | |
| Social Networks | Compounded Social Networks | |
| | Technologies in Use | |

In this methodology, we take a point from the information categories (Core Elements) and search for the relevant information for it through the different information resources.

Theoretically, the reverse procedure can be used too: we could work through the different information resources one by one and enter the corresponding results into the information areas of our documentation. This is the most common approach to OSINT. However, it has a significant disadvantage: we end up adapting our methodology to the many information resources rather than adapting the information resources to our methodology. The result is an unstructured approach that is guided by the information resources instead of the information results. We will explain this in more detail in a moment.

During OSINT, we will come across many different resources that contain information not only for one category but for several others. Therefore, we should have at least two separate browsers open.


Workflow


It is essential to understand that, using this methodology, we adapt our information resources to the methodology, not the methodology to our information resources.

As a simple example, we can imagine that we want to search for all employees of a target company. Most people would start looking for the company on social networks like LinkedIn, Xing, and others, look at the company's homepage, and maybe even start a Google search. As mentioned before, this is the method most people use, and it means basing the methodology on the information resources.

However, if we follow our methodology, we will work step by step through the information resources shown above in cycles, starting, for example, with the home page, moving on to the social networks, and then using various search engines, development platforms, and forums. In this way, we work according to an organized structure and get far more results.

It requires a little more effort to go through the same page several times to cover different information areas and categories (Core Elements). However, we maintain a structured and transparent methodology without overlooking specific details relevant to a particular information area and category. This also allows us to create clear and detailed documentation and work in cycles independent of the cases.

To make this structured methodology efficient, we need to work with two browser windows.

1. Research Browser

We use this one only for our research. Here we send all search requests and log the entire OSINT process. The research browser's history must be cleaned before use to prevent us from logging results from other companies. We can use the add-on called History Master to log our searches and finally export them as documentation.

2. Resource Browser

The resource browser serves as a summary of the information resources we find during the investigation. Into this one, we move all the information resources to which we can return later to search for information for other categories. For example, we will come across development platforms containing leaks as well as names of employees or developers, email addresses, and usernames.

In this part, we can also use the add-on called SingleFile. This allows us to copy the web pages with the information and save them locally as proof.

The actual process is quite simple, as we now split our results between two browsers. As soon as we have found a new information resource, examined it in the Research Browser, and discovered that it contains more useful information, we drag the newly opened tab to the Resource Browser, which we will return to later.
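
One way to keep the two browsers strictly separated is to run them as dedicated Firefox profiles. The following is only a minimal sketch; the profile names are arbitrary examples and can be chosen freely.

Separate Firefox Profiles

# Create two dedicated profiles (names are placeholders)
firefox -CreateProfile "research"
firefox -CreateProfile "resources"

# Launch each profile as its own, independent browser instance
firefox -P "research" --no-remote &
firefox -P "resources" --no-remote &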

Next, let us assume we are working on the phase in which we look for email addresses. The process follows the structure described above, but it can also expand to other information resources.

If we follow this structure, we study the home page and the social networks, and use the search engines, developer platforms, and leak resources. Once we have gathered the relevant information, the cycle starts again in this phase, and we begin again with the home page, social networks, etc., but this time with the new information we have found during the first cycle.

We document all our discoveries and structure them according to each phase and cycle.
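
To keep these notes organized, one option is a simple directory per cycle and phase. The following is just a sketch; the folder names are placeholders and can be adapted to the engagement.

Example Documentation Structure

# Create one folder per information resource for each cycle (names are placeholders)
mkdir -p osint/cycle-1/{home-page,social-networks,search-engines,development-platforms,leak-resources}
mkdir -p osint/cycle-2/{home-page,social-networks,search-engines,development-platforms,leak-resources}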

If we find valuable information on an information resource that we want to document, we should download the web page with SingleFile and save it. We will find early on that our documentation (if done correctly) will be far more than 50 pages longer than most OSINT reports that are usually delivered to the client.

We also know that the client is mainly interested in the results. However, this is not a reason to make our documentation any less detailed. At first, the short documentation will be sufficient for the client. However, if the client then wants to have the found/leaked information removed, the concise documentation will be far from adequate. Here, the information resource and the procedure by which it was found are of great importance.


Logging


To work efficiently and in a structured way, we must also document the results and information we find clearly. However, we also need to log our steps and prove how we found the information. With the two different browsers, we have found a way to separate our search from the resources. To create clear documentation, we need three components:

  1. Visited websites

  2. Timestamp

  3. Queries

All of this information is stored in our browser history, and we can use it quite efficiently for our purposes. A handy add-on for this is History Master. This add-on gives us many different options, like sorting, filtering, searching, displaying, and exporting statistics.

![[history-master1.png]]

To work with history efficiently, we need a few more packages that will allow us to display the results better and make them easier to filter. If we have exported our history from History Master and look at it, it will look something like this:

History CSV

head history.csv
visit-time,title,visit-count,typed-count,id,url
1607535086780,Contact – Inlanefreight,1,,tgIKscfGm4PL,https://www.inlanefreight.com/index.php/contact/
1607535085390,News – Inlanefreight,1,,6S_PX7_woFpV,https://www.inlanefreight.com/index.php/news/
1607535084030,Offices – Inlanefreight,1,,UNWCj5EDwP2e,https://www.inlanefreight.com/index.php/offices/
1607535081962,Career – Inlanefreight,1,,6r25bcvsVT21,https://www.inlanefreight.com/index.php/career/
1607535044507,Inlanefreight,2,,6bhPbKqKh9He,https://www.inlanefreight.com/
1607535044304,https://inlanefreight.com/,2,,Od54NGgg3cyS,https://inlanefreight.com/
1607535043995,http://inlanefreight.com/,2,,sfBT401xyE7c,http://inlanefreight.com/

We can install the appropriate packages as follows:

Install NPM, JQ, and Csvtojson

sudo apt install npm jq -y && sudo npm install -g csvtojson

This helps us format our output much better and allows it to be filtered with jq, an incredibly powerful command-line JSON processor. For now, we will pipe the gathered data into jq to pretty-print the JSON.

Sorted History View

csvtojson < history.csv | jq .
[                                                       
  { 
    "visit-time": "1607535086780",
    "title": "Contact – Inlanefreight",
    "visit-count": "1",                                 
    "typed-count": "", 
    "id": "tgIKscfGm4PL",
    "url": "https://www.inlanefreight.com/index.php/contact/"
  },                                                    
  { 
    "visit-time": "1607535085390",
    "title": "News – Inlanefreight",
    "visit-count": "1",                                 
    "typed-count": "", 
    "id": "6S_PX7_woFpV",
    "url": "https://www.inlanefreight.com/index.php/news/"
  },                                                    
  {
    "visit-time": "1607535084030",
    "title": "Offices – Inlanefreight",
    "visit-count": "1",
    "typed-count": "",
    "id": "UNWCj5EDwP2e",
    "url": "https://www.inlanefreight.com/index.php/offices/"
  }, 
  <...SNIP...>
]
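
Since the visit-time values are Unix epoch timestamps in milliseconds, we can also let jq convert them into human-readable dates for our documentation. The following is a minimal sketch based on the same exported history.csv:

Readable Timestamps

csvtojson < history.csv | jq '[.[] | {time: ((.["visit-time"] | tonumber / 1000 | floor) | todate), title, url}]'
[
  {
    "time": "2020-12-09T17:31:26Z",
    "title": "Contact – Inlanefreight",
    "url": "https://www.inlanefreight.com/index.php/contact/"
  },
  <...SNIP...>
]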

SingleFile

Using the SingleFile add-on, we can locally store all the websites from which we have found the relevant information for each category. This allows us to search the web page contents locally and also serves as proof for our documentation. This add-on offers an excellent way to download all open tabs with one click. This means that we no longer have to worry about individual screenshots of the pages and do not have to do it manually for each page.

![[singlefile.png]]

Another great advantage of this is that these stored pages can also be searched locally by our customers with simple terminal commands to see which web page the corresponding information is located on.

cat *.html | html2text | grep "Emma Will"
Emma Williams  emma.williams@inlanefreight.com
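
If we want to go one step further, we could, for example, extract all email addresses from the saved pages at once. This is only a sketch and assumes the pages were saved with SingleFile into the current directory and that the target domain is inlanefreight.com:

cat *.html | html2text | grep -oE "[A-Za-z0-9._%+-]+@inlanefreight\.com" | sort -u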

This add-on also offers the possibility to save visited websites automatically. However, as we do not know what information the websites we visit will contain, this is not recommended. Of course, this can also be used for the log if necessary.

Important Note: In this module, a real company is used to illustrate efficiency based on real situations. Therefore, many graphics in the next sections are mostly censored in order not to put this company in focus. It is also strictly forbidden for students to use this module's content to trace the selected company and do further research against it or its personnel.
