
Importance of GitHub Reconnaissance in CASM & CART

GitHub reconnaissance is an important aspect of attack surface management, particularly for organizations and individuals who rely heavily on software development and open-source code.

Here’s why it is crucial:

  1. Discovery of Sensitive Information
    Developers sometimes inadvertently push sensitive information, e.g. hardcoded credentials, API keys, access tokens, and configuration files containing secrets.
  2. Identification of Security Vulnerabilities
    GitHub repos may contain outdated libraries or dependencies with known vulnerabilities, and analysis of the source code can reveal critical vulnerabilities such as injection flaws.
  3. Understanding the Development Practices
    Analyzing repositories can provide insights into the organization’s coding practices and patch management process.
  4. Third-party Component Tracking
    Supply chain attacks have become common. GitHub repos can be used to identify third-party dependencies and their known vulnerabilities.
  5. Historical Analysis
    Commit History: Past commits can reveal changes in security practices and expose periods where the codebase was vulnerable.
  6. Information for Targeted Attacks
    GitHub repos can leak employee information, e.g. email addresses, user IDs, and password patterns.
  7. Compliance and Legal Risks
    License Violations: Ensuring compliance with open-source licenses is crucial to avoid legal issues.
    Data Exposure: Unintentional exposure of personal data can lead to compliance issues with regulations like GDPR or HIPAA.
  8. Open-source Intelligence (OSINT)
    GitHub repos can also provide competitive intelligence about an organization’s projects and focus areas.

Understanding GitHub Recon

GitHub Recon is a multifaceted process that involves discovering and analyzing information related to individuals, organizations, and projects on the GitHub platform. It typically involves:

  1. Profile Analysis: Examining user profiles for personal and professional information, such as email addresses, linked social media accounts, and the repositories they contribute to. This information can be valuable for building a target’s profile or identifying potential attack vectors.
  2. Repository Scanning: Searching for repositories containing sensitive data, configuration files, and potential vulnerabilities. This can help identify security weaknesses, misconfigurations, or hidden backdoors.
  3. Codebase Assessment: Analyzing the code within repositories to identify coding standards, software vulnerabilities, and possible exploits. This is particularly useful for security researchers and penetration testers.
  4. Insights Gathering: Extracting trends, technologies, or software libraries frequently used by a target organization, which can provide valuable insights for competitive analysis and threat intelligence.
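
Profile analysis, and the employee-information leaks mentioned earlier, often start from commit metadata. The sketch below makes two assumptions: the repository has already been cloned locally, and `git` is on the PATH. It counts the author identities that `git log` reports and drops GitHub's privacy-preserving `users.noreply.github.com` addresses. The function names are hypothetical and are only for illustration.

```python
from __future__ import annotations

import subprocess
from collections import Counter

# GitHub substitutes this domain when a user hides their real email address.
NOREPLY_SUFFIX = "@users.noreply.github.com"


def parse_authors(log_output: str) -> Counter:
    """Count 'Name <email>' identities, one per line of git log output."""
    return Counter(line.strip() for line in log_output.splitlines() if line.strip())


def collect_authors(repo_path: str) -> Counter:
    """Read author identities from every branch of a locally cloned repository."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--all", "--format=%an <%ae>"],
        capture_output=True, text=True, check=True,
    )
    return parse_authors(result.stdout)


def real_emails(authors: Counter) -> set[str]:
    """Email addresses from the identities, minus GitHub's noreply addresses."""
    emails = set()
    for identity in authors:
        # 'Alice <alice@example.com>' -> 'alice@example.com'
        email = identity.rsplit("<", 1)[-1].rstrip(">")
        if not email.endswith(NOREPLY_SUFFIX):
            emails.add(email)
    return emails
```

A call such as `real_emails(collect_authors("path/to/clone"))` lists contributor addresses. The counts also show which developers are most active.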


Why GitHub Recon Matters

GitHub Recon is relevant for various use cases and professions:

  1. Cybersecurity: Security professionals can use GitHub Recon to identify potential security weaknesses and vulnerabilities within code repositories, allowing for proactive threat mitigation.
  2. Red Teaming: Red teamers can leverage GitHub Recon to gather intelligence about their target organization’s technology stack, coding practices, and key individuals to better plan their attack strategies.
  3. Competitive Analysis: Businesses can gain insights into their competitors’ technology choices, software libraries, and open-source contributions to make informed decisions and gain a competitive edge.
  4. Research and Development: Developers and researchers can benefit from GitHub Recon by finding valuable open-source projects, libraries, or code snippets to enhance their work or leverage open-source solutions.

This article covers two approaches to GitHub recon:

  1. Manual (Code Search OR GitHub Dorking)
  2. Automated (Using Tools)

Manual – Code Search or GitHub Dorking

In this context, code search simply means applying specific keywords to find sensitive information such as passwords, API keys, and database files.

You can search code globally across GitHub, or limit the search to a specific organization or repository. To search code across all public repositories, you must have a GitHub account and be logged in. GitHub's code search scans public repositories and supports qualifiers such as filename:, language:, and path: for more precise queries.

How to do recon on GitHub?

  1. When looking for a specific company, start with basic search terms such as its domain, e.g. facebook.com or google.com.
  2. Use multi-word strings in quotes, such as “Authorization: Bearer”. Then open the matching repositories and look for sensitive data such as passwords or authorization tokens.
  3. Search for specific file names, e.g. “filename:vim_settings.xml”.
  4. Search for specific languages, e.g. “language:PHP”.

These are the fundamentals of GitHub dorking. Queries can also be combined: “facebook.com filename:vim_settings.xml” returns the vim_settings.xml files that mention facebook.com. Other query combinations work the same way.
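
Combining terms and qualifiers is plain string composition, and turning the result into a shareable search link only requires URL encoding. The sketch below assumes that github.com's web search accepts the query in a `q` parameter and the result type in a `type` parameter. The helper names are hypothetical. Keyword arguments cannot repeat a qualifier or express NOT/OR, so such queries should be written as a single raw string.

```python
from urllib.parse import urlencode


def build_query(*terms: str, **qualifiers: str) -> str:
    """Join free-text terms and qualifier:value pairs into one search query."""
    parts = list(terms)
    parts += [f"{name}:{value}" for name, value in qualifiers.items()]
    return " ".join(parts)


def search_url(query: str, result_type: str = "code") -> str:
    """Build a github.com search URL; urlencode escapes spaces, colons and quotes."""
    return "https://github.com/search?" + urlencode({"q": query, "type": result_type})
```

For example, `search_url(build_query("facebook.com", filename="vim_settings.xml"))` produces a link to the combined query above. Similarly, `build_query('"Authorization: Bearer"')` keeps the multi-word string quoted so that it is matched as a phrase.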

This is what GitHub dorking is for. Finding sensitive information by hand means spending a lot of time checking every repository that belongs to a company, and well-chosen dork queries cut that effort down considerably.

In addition to repositories, you can search users, wikis, code, commits, issues, discussions, packages, Marketplace, and topics.
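
Several of these search types are also exposed through GitHub's REST API, which makes it possible to script recon. The sketch below assumes the `/search/{kind}` REST endpoints, the `application/vnd.github+json` media type, the `2022-11-28` API version header, and a personal access token. REST code search requires an authenticated request. Search types that appear only in the web UI are deliberately not covered, and the function names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_ROOT = "https://api.github.com"
# REST search endpoints assumed by this sketch; /search/labels is omitted
# because it also needs a repository_id parameter.
SEARCH_KINDS = {"code", "commits", "issues", "repositories", "users", "topics"}


def build_search_request(kind: str, query: str, token: str,
                         per_page: int = 30) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated GitHub search request."""
    if kind not in SEARCH_KINDS:
        raise ValueError(f"unsupported search type: {kind}")
    url = f"{API_ROOT}/search/{kind}?" + urlencode({"q": query, "per_page": per_page})
    return urllib.request.Request(url, headers={
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
        "X-GitHub-Api-Version": "2022-11-28",
    })


def run_search(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON search result."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Usage, assuming a token stored in a hypothetical `GITHUB_TOKEN` environment variable: `run_search(build_search_request("code", "org:example filename:.env", os.environ["GITHUB_TOKEN"]))`. Splitting request construction from sending keeps the query logic testable without network access.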

Apart from using GitHub dorks, you can go directly to the source: find your target company’s GitHub organization page, identify its developers, and monitor their accounts.

On the organization page, open the “People” tab to see the members publicly associated with your target company.
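
The same member list can be collected through the REST API instead of clicking through profiles. The sketch below assumes two REST paths: `/orgs/{org}/public_members`, which only returns members who have made their membership public, and `/users/{username}/repos`. It also assumes page-based pagination with up to 100 items per page. A token is optional, but unauthenticated requests get a much lower rate limit. All function names are hypothetical.

```python
from __future__ import annotations

import json
import urllib.request

API_ROOT = "https://api.github.com"


def api_get(path: str, token: str | None = None):
    """GET a GitHub REST API path and decode the JSON body."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:  # optional; unauthenticated calls are heavily rate-limited
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(API_ROOT + path, headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def page_path(base: str, page: int, per_page: int = 100) -> str:
    """Append page-based pagination parameters to an API path."""
    return f"{base}?per_page={per_page}&page={page}"


def fetch_all(base: str, token: str | None = None, per_page: int = 100) -> list:
    """Request successive pages until a short (final) page comes back."""
    items, page = [], 1
    while True:
        batch = api_get(page_path(base, page, per_page), token)
        items.extend(batch)
        if len(batch) < per_page:
            return items
        page += 1


def org_member_repos(org: str, token: str | None = None) -> dict[str, list[str]]:
    """Map each public member of `org` to the full names of their public repos."""
    result = {}
    for member in fetch_all(f"/orgs/{org}/public_members", token):
        login = member["login"]
        repos = fetch_all(f"/users/{login}/repos", token)
        result[login] = [repo["full_name"] for repo in repos]
    return result
```

The resulting mapping gives a checklist of personal repositories to review, which is where company code often leaks.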


You then need to go through each member’s profile and repositories manually, which takes a long time. Look for URLs, API keys, usernames, passwords, and anything else sensitive that someone may have uploaded.
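
Much of that manual review comes down to pattern matching, which can be partly automated. The sketch below uses a small, hypothetical rule set:

- AWS access key IDs starting with `AKIA`
- PEM private key headers
- `password =`-style assignments
- URLs

Real scanners ship far larger rule sets and still produce false positives, so treat every hit as a lead to verify.

```python
from __future__ import annotations

import re

# Hypothetical rule set for illustration only.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |DSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "credential_assignment": re.compile(
        r"(?i)\b(?:password|passwd|pwd|api_key|apikey|secret|token)\b\s*[:=]\s*['\"]?([^\s'\"]{4,})"
    ),
    "url": re.compile(r"https?://[^\s'\"<>]+"),
}


def scan_text(text: str) -> list[tuple[int, str, str]]:
    """Return (line number, rule name, matched text) for every rule hit."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((lineno, name, match.group(0)))
    return findings
```

Running `scan_text` over a downloaded file (or over raw file contents fetched from a member’s repository) gives a short list of lines to inspect by hand.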

GitHub Dork List:

GitHub dorks for finding files:
  • filename:manifest.xml
  • filename:travis.yml
  • filename:vim_settings.xml
  • filename:database
  • filename:prod.exs NOT prod.secret.exs
  • filename:prod.secret.exs
  • filename:.npmrc _auth
  • filename:.dockercfg auth
  • filename:WebServers.xml
  • filename:.bash_history <Domain name>
  • filename:sftp-config.json
  • filename:sftp.json path:.vscode
  • filename:secrets.yml password
  • filename:.esmtprc password
  • filename:passwd path:etc
  • filename:dbeaver-data-sources.xml
  • path:sites databases password
  • filename:config.php dbpasswd
  • filename:configuration.php JConfig password
  • filename:.sh_history
  • shodan_api_key language:python
  • filename:shadow path:etc
  • JEKYLL_GITHUB_TOKEN
  • filename:proftpdpasswd
  • filename:.pgpass
  • filename:idea14.key
  • filename:hub oauth_token
  • HEROKU_API_KEY language:json
  • HEROKU_API_KEY language:shell
  • SF_USERNAME salesforce
  • filename:.bash_profile aws
  • filename:.env MAIL_HOST=smtp.gmail.com
  • filename:wp-config.php
  • filename:credentials aws_access_key_id
  • filename:id_rsa OR filename:id_dsa

GitHub dorks for finding languages:
  • language:python username
  • language:php username
  • language:sql username
  • language:html password
  • language:perl password
  • language:shell username
  • language:java api
  • HOMEBREW_GITHUB_API_TOKEN language:shell

GitHub dorks for finding API keys, tokens and passwords:
  • api_key
  • "api keys"
  • authorization_bearer:
  • oauth
  • auth
  • authentication
  • client_secret
  • api_token:
  • "api token"
  • client_id
  • password
  • user_password
  • user_pass
  • passcode
  • secret
  • password hash
  • OTP
  • user auth

GitHub dorks for finding usernames:
  • user:<name> (e.g. user:admin)
  • org:<name> (e.g. org:google type:users)
  • in:login (e.g. <username> in:login)
  • in:name (e.g. <username> in:name)
  • fullname:<firstname> <lastname> (e.g. fullname:<name> <surname>)
  • in:email (e.g. data in:email)

GitHub dorks for finding information using dates:
  • created:<2012-04-05
  • created:>=2011-06-12

GitHub dorks for finding information using extensions:
  • extension:pem private
  • extension:ppk private
  • extension:sql mysql dump
  • extension:sql mysql dump password
  • extension:json api.forecast.io
  • extension:json mongolab.com
  • extension:yaml mongolab.com
  • [WFClient] Password= extension:ica
  • extension:avastlic "support.avast.com"
  • extension:json googleusercontent client_secret
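
The dorks above are most useful when they are scoped to a single target. The sketch below assumes that the `org:` and `user:` qualifiers can be prefixed to a code search. It also assumes github.com's `q`/`type` search parameters. The `DORKS` subset is copied from the list above, and the function names are hypothetical.

```python
from urllib.parse import urlencode

# A few dorks taken from the list above; extend as needed.
DORKS = [
    "filename:.npmrc _auth",
    "filename:.dockercfg auth",
    "filename:.env MAIL_HOST=smtp.gmail.com",
    "filename:credentials aws_access_key_id",
    "extension:pem private",
]


def scoped_queries(target: str, dorks: list, scope: str = "org") -> list:
    """Prefix every dork with org:<target> or user:<target>."""
    if scope not in {"org", "user"}:
        raise ValueError("scope must be 'org' or 'user'")
    return [f"{scope}:{target} {dork}" for dork in dorks]


def to_search_urls(queries: list) -> list:
    """Turn each query into a github.com code search link for manual review."""
    return ["https://github.com/search?" + urlencode({"q": q, "type": "code"})
            for q in queries]
```

For example, `to_search_urls(scoped_queries("example", DORKS))` yields one link per dork for a hypothetical `example` organization. Opening the links one by one keeps the review manual while removing the retyping.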

That covers the manual techniques for finding sensitive information on GitHub. Next, let’s look at automated techniques.

Automated Technique – Using Tools

Automation makes the process easier and faster, but it has a drawback: it can produce false positives. Not every result is a false positive, but some will be, so findings need to be verified.

  • TruffleHog

TruffleHog is easy to use. It searches git repositories for secrets, digging through the commit history and branches, which makes it effective at finding secrets that were committed by accident.
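
Scanning history rather than only the latest files is what makes this approach effective: a secret deleted in a later commit is still present in the commit that added it. The sketch below illustrates the idea; it is not TruffleHog's implementation. It assumes a local clone, `git` on the PATH, and a deliberately tiny, hypothetical rule set. It reads `git log -p --all` and checks only added (`+`) lines.

```python
from __future__ import annotations

import re
import subprocess

# Hypothetical, deliberately tiny rule set: AWS access key IDs and PEM private keys.
SECRET_RE = re.compile(r"AKIA[0-9A-Z]{16}|-----BEGIN [A-Z ]*PRIVATE KEY-----")


def scan_patch(patch: str) -> list[tuple[str, str]]:
    """Return (commit, added line) pairs from `git log -p` output that match SECRET_RE."""
    findings = []
    commit = None
    for line in patch.splitlines():
        if line.startswith("commit "):
            commit = line.split()[1]  # SHA of the commit currently being read
        elif line.startswith("+") and not line.startswith("+++"):
            # '+' marks an added line; '+++' is the diff's new-file header.
            added = line[1:]
            if SECRET_RE.search(added):
                findings.append((commit, added))
    return findings


def scan_history(repo_path: str) -> list[tuple[str, str]]:
    """Scan every commit on every branch of a locally cloned repository."""
    patch = subprocess.run(
        ["git", "-C", repo_path, "log", "-p", "--all", "--no-color"],
        capture_output=True, text=True, errors="replace", check=True,
    ).stdout
    return scan_patch(patch)
```

Each finding carries the commit SHA, so `git show <sha>` leads straight to the commit that introduced the secret.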

How to use it?

  1. Go to https://github.com/dxa4481/truffleHog and clone (download) it.
  2. Run the following command to search for sensitive information:

Command: python3 trufflehog.py --regex --entropy=False https://github.com/<yourTargetRepo>
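
The `--entropy` flag in the command above controls TruffleHog’s high-entropy string detection, and setting it to False leaves only the regex checks. Entropy detection flags long random-looking strings, because keys and tokens tend to have high Shannon entropy, H = -Σ p·log2(p) over the character frequencies. The sketch below illustrates that idea and is not TruffleHog’s code. Its thresholds are illustrative assumptions: 4.5 bits per character for base64-like runs and 3.0 for hex runs, applied to runs longer than 20 characters. The function names are also assumptions.

```python
from __future__ import annotations

import math
import re
import string
from collections import Counter

BASE64_CHARS = string.ascii_letters + string.digits + "+/="
HEX_CHARS = string.hexdigits  # 0-9, a-f, A-F


def shannon_entropy(data: str) -> float:
    """H = -sum(p * log2(p)) over character frequencies of `data`, in bits/char."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def runs_of(line: str, charset: str, min_len: int = 21) -> list[str]:
    """Substrings made only of `charset` characters, at least min_len long."""
    pattern = re.compile(f"[{re.escape(charset)}]{{{min_len},}}")
    return pattern.findall(line)


def high_entropy_strings(line: str, b64_threshold: float = 4.5,
                         hex_threshold: float = 3.0) -> list[str]:
    """Long base64-like or hex runs whose entropy exceeds the (assumed) thresholds."""
    hits = [s for s in runs_of(line, BASE64_CHARS) if shannon_entropy(s) > b64_threshold]
    hits += [s for s in runs_of(line, HEX_CHARS) if shannon_entropy(s) > hex_threshold]
    return hits
```

A 32-character run of repeated hex digits scores exactly 4.0 bits per character. That is above the hex threshold but below the base64 threshold. This is why entropy checks catch hashes and keys, and also why they generate the false positives noted in the cons below.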

Pros:

  1. Trufflehog is a free and open-source tool.
  2. It is easy to use and can be run with a simple command.
  3. Trufflehog can detect a wide variety of secret types.

Cons:

  1. Trufflehog is a passive tool and cannot detect data that is not publicly accessible.
  2. Trufflehog can generate a large number of false positives, which can be time-consuming to filter through.
  3. Trufflehog is not always accurate and can sometimes identify data that is not sensitive.

  • github-dorks

github-dorks is a simple Python tool that can search a single repository or all the repositories of a user or organization.

How to use it?

  1. Go to https://github.com/techgaun/github-dorks and clone (download) it.
  2. Install all the listed requirements.
  3. Run the following command to search all the repositories of a single user:
    Command: python github-dork.py -u <username>


Pros:

  1. github-dorks is a free and open-source tool.
  2. It is easy to use and can be run with a simple command.
  3. It can be used to search all repos of a user or organization.

Cons:

  1. It can be slow because it waits for the GitHub API rate limit to reset.
  2. Its output formatting is not as good as TruffleHog’s.
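
The slowness comes from respecting GitHub’s API rate limits: once the quota is used up, a client has to wait for the window to reset. The sketch below shows that wait logic and is not github-dorks’ own code. It assumes GitHub’s `x-ratelimit-remaining`, `x-ratelimit-reset` (a UTC epoch timestamp) and `retry-after` response headers, and its function names are hypothetical.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Seconds to wait before the next GitHub API call, judged from response headers."""
    # HTTP header names are case-insensitive, so normalise them first.
    h = {name.lower(): value for name, value in headers.items()}
    if "retry-after" in h:
        # Secondary rate limits state how long to back off.
        return float(h["retry-after"])
    if h.get("x-ratelimit-remaining") == "0":
        # Primary quota exhausted: wait until the reset timestamp.
        now = time.time() if now is None else now
        return max(0.0, float(h["x-ratelimit-reset"]) - now)
    return 0.0


def wait_if_limited(headers: Mapping[str, str]) -> None:
    """Sleep until the rate-limit window resets, if the last response says so."""
    delay = seconds_until_reset(headers)
    if delay > 0:
        time.sleep(delay)
```

Using an authenticated token raises the quota considerably, which is usually the simplest way to make such tools faster.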

  • Nightfall

Nightfall is an AI-powered scanner that detects API keys, secrets, and other sensitive information. The Nightfall Radar API integrates with public or private GitHub repositories, AWS, GitLab, Twilio, and more. Scan results are available in a web interface or as CLI output; you can read more about it here: https://radar.nightfall.ai/docs#get-results. In short, it is a web application that helps you scan GitHub repositories.

How to use it?

  1. Go to https://radar.nightfall.ai/ and log in with your GitHub account.
  2. Add your target GitHub URL in the top-left section to start a scan.
  3. After the scan completes, click Results to view the findings; you will be redirected to a results page.
  4. Click GitHub to see the information leaked on GitHub.

Pros:

  1. Comprehensive: Nightfall Radar can uncover a wide range of exposed assets, including subdomains, AWS buckets, Git repositories, and more.
  2. Easy to use: Nightfall Radar has a user-friendly interface and can be operated with simple commands.
  3. Regular updates: Nightfall Radar is actively maintained with frequent updates to address new vulnerabilities and enhance its capabilities.

Cons:

  1. Commercial tool: Nightfall Radar is a paid tool, requiring a license for its use.
  2. Potential false positives: Nightfall Radar’s scans may generate false positives, which can be time-consuming to verify.

Conclusion

In conclusion, GitHub reconnaissance plays a pivotal role in attack surface management by providing a comprehensive view of potential vulnerabilities and exposure points within an organization’s public and private code repositories. It’s a critical step in identifying security weaknesses that could be exploited by attackers, as well as in safeguarding sensitive information that might be inadvertently exposed.

Through the meticulous examination of repositories, commit histories, code snippets, configuration files, and developer discussions, organizations can uncover hidden risks ranging from hard-coded secrets to outdated dependencies with known vulnerabilities. This process not only helps in preemptively addressing security loopholes but also enhances the overall security posture by informing better coding practices and tighter access controls.

Furthermore, GitHub reconnaissance extends beyond mere vulnerability identification. It offers insights into the development culture and practices of an organization, paving the way for more informed and strategic security decision-making. By understanding how and why certain security flaws are introduced, organizations can implement more effective security training and awareness programs for their developers.

The FireCompass CART platform uses AI-powered engines to actively probe GitHub and continuously discover common misconfigurations, code leaks, hardcoded credentials, and much more.

Some other automated tools for scanning GitHub repositories:
https://github.com/BishopFox/GitGot
https://github.com/Talkaboutcybersecurity/GitMonitor
https://github.com/michenriksen/gitrob
https://github.com/tillson/git-hound
https://github.com/kootenpv/gittyleaks
https://github.com/awslabs/git-secrets
https://git-secret.io/

Author: Vishal Vishwakarma
Guide: Sanket Kakde

About FireCompass:

FireCompass is a SaaS platform for Continuous Automated Pen Testing, Red Teaming and External Attack Surface Management (EASM). FireCompass continuously indexes and monitors the deep, dark and surface webs using nation-state grade reconnaissance techniques. The platform automatically discovers an organization’s digital attack surface and launches multi-stage safe attacks, mimicking a real attacker, to help identify breach and attack paths that conventional tools would otherwise miss.

Feel free to get in touch with us to get a better view of your attack surface.
