Mastering Source Code Analysis Lab

Here’s a comprehensive guide to mastering source code analysis with:

  • 🧨 Vulnerable codebases

  • ⚒️ Static analysis tools

  • 🔍 Manual code review techniques

  • 💻 Real-world scenarios

  • 🎯 Language-specific recommendations


🧨 Vulnerable Codebases for Practice

1. Vulnerable Web Applications with Source Code Access

Name                   Stack
OWASP Juice Shop       Node.js
DVWA                   PHP
bWAPP                  PHP
WebGoat                Java
NodeGoat               Node.js
Vulnerable Flask App   Python
RailsGoat              Ruby

These are perfect for:

  • Identifying insecure functions

  • Understanding common anti-patterns

  • Practicing code-level exploitation


⚒️ Static Analysis Tools (SAST)

These tools help automatically detect vulnerabilities in source code.

🔹 Multi-language and Language-Specific Tools

Tool                              Strengths
Semgrep                           Fast, rule-based scanner; highly customizable
SonarQube / SonarCloud            Enterprise-grade code quality and security analysis
CodeQL (GitHub)                   Semantic code analysis with an advanced query language
Bearer                            Privacy/security scanner; strong on API and secrets issues
Bandit                            Python-specific analyzer
Gosec                             Go security scanner
Brakeman                          Scanner for Ruby on Rails apps
FindSecBugs (SpotBugs)            Java bytecode scanner with OWASP-oriented rules
Taint-Mode Tools (e.g., CodeQL)   Track data flow from sources to sinks


🔍 Manual Review Methodology

What to Look For:

Category                       Examples
Input Validation               eval(), exec(), raw SQL, unsanitized headers
Authentication/Authorization   Hardcoded creds, weak token logic, IDOR
Session Management             Insecure cookies, session fixation
Insecure Storage               Passwords stored in plaintext, weak encryption
Crypto Failures                Homebrew crypto, ECB mode, missing IVs
Command Execution              Shell injection points (os.system, subprocess)
Deserialization                pickle.load(), Java readObject()
Race Conditions                TOCTOU bugs, threading misuse
Secrets in Code                API keys, passwords in .env, .git or config files
Logic Flaws                    Business logic errors, bypasses, incorrect conditions

Tactics:

  • Trace data flow from source (user input) to sink (DB, shell, etc.)

  • Review API endpoints, middleware, and helper utilities

  • Look for insecure defaults or commented-out protections

  • Check for inconsistent authentication or authorization checks
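As a starting point for the source-to-sink tracing described above, a reviewer can first list where the risky sinks are and then walk backwards to the inputs by hand. The following is a minimal sketch using Python's standard `ast` module; the `DANGEROUS_CALLS` set and the sample handler are hypothetical and would need tuning for a real codebase. It only locates calls by name and does not perform taint tracking.

```python
import ast

# Hypothetical starter list of risky call names; extend it per codebase.
DANGEROUS_CALLS = {"eval", "exec", "os.system", "pickle.load", "pickle.loads", "subprocess.call"}


def dotted_name(node):
    """Return 'a.b.c' for Name/Attribute chains, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def find_sinks(source):
    """Return sorted (line, call_name) pairs for every call to a listed sink."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in DANGEROUS_CALLS:
                hits.append((node.lineno, name))
    return sorted(hits)


# Hypothetical vulnerable handler used only as scanner input (it is never executed).
sample = (
    "import os\n"
    "def handler(request):\n"
    "    cmd = request.args['cmd']\n"
    "    os.system('ping ' + cmd)\n"
    "    return eval(request.args['expr'])\n"
)
print(find_sinks(sample))
```

Each reported line is a place to ask where the arguments come from; name-based matching misses aliased imports, which is one reason dedicated tools such as Semgrep or CodeQL go further.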


📦 Real-World Source Code Analysis Scenarios

Scenario                       Language                   How to Practice
Insecure Deserialization       Java, PHP, Python          WebGoat, Vulnerable Flask App
SQL Injection via ORM Misuse   Node.js, Python (Django)   NodeGoat, custom apps
Insecure File Upload           PHP, .NET, Python          DVWA, bWAPP
Broken Access Control          Java, Ruby                 WebGoat, RailsGoat
Secrets Leaking in Commits     Any                        Practice using truffleHog and git-secrets on GitHub repos
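To make the SQL injection scenario concrete, here is a minimal sketch that uses the standard-library `sqlite3` module as a stand-in for the raw-query escape hatch of an ORM. The table, rows, and payload are made up for illustration; the pattern to look for in review is user input concatenated into the query text instead of being bound as a parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", "admin"), ("bob", "user")])

payload = "nobody' OR '1'='1"  # classic injection string

# Vulnerable: attacker-controlled text becomes part of the SQL statement.
unsafe_rows = conn.execute(
    "SELECT name FROM users WHERE name = '" + payload + "'"
).fetchall()

# Safer: the driver binds the value, so it is only ever compared as data.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (payload,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

The concatenated query returns every row, while the parameterized one returns none, because no user is literally named like the payload.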


🧪 Automated Secrets & Token Discovery

Tool                       Purpose
truffleHog                 Finds secrets in git repos
Gitleaks                   Secret scanning with custom rules
detect-secrets             Pre-commit hook for secret detection
GitHub Advanced Security   Secret scanning across public/private repos (if enabled)
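The tools above broadly combine pattern rules with entropy checks. The sketch below shows that idea in a few lines of standard-library Python; the two patterns, the `min_entropy` threshold, and the sample text are illustrative assumptions and not the rule sets of any of the listed tools. The AWS-style key in the sample is the well-known documentation placeholder, not a real credential.

```python
import math
import re
from collections import Counter

# Illustrative patterns only; real scanners ship far larger rule sets.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "generic_assignment": re.compile(
        r"(?i)\b(?:api_key|secret|token|password)\b\s*[=:]\s*['\"]([^'\"]{8,})['\"]"
    ),
}


def shannon_entropy(text):
    """Bits of entropy per character; random-looking strings score higher."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def scan(text, min_entropy=3.0):
    """Return (line_number, rule_name) for each suspicious match."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                candidate = match.group(match.lastindex or 0)
                # Skip low-entropy values such as obvious placeholders.
                if rule == "generic_assignment" and shannon_entropy(candidate) < min_entropy:
                    continue
                findings.append((lineno, rule))
    return findings


sample = "\n".join([
    'AWS_KEY = "AKIAIOSFODNN7EXAMPLE"',
    'password = "aaaaaaaaaaaa"',
    'api_key = "f3Zq9LmT2xVb8Kd1"',
])
print(scan(sample))
```

The repeated-character password is filtered out by the entropy check, which is the same trade-off real scanners make between noise and missed findings.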


📚 Resources to Learn Source Code Review

Resource                                         Focus
The Art of Software Security Assessment (Book)   Deep manual code review
OWASP Code Review Guide                          Framework-agnostic review practices
Semgrep Playground                               Practice writing your own detection rules
CodeQL Learning Lab (GitHub)                     Create advanced security queries
PortSwigger Labs + Source View                   Browse source + exploit live (Pro users)
HackTheBox Academy – Secure Coding               Great for building a secure code review mindset


🎯 Language-Specific Advice

Language               Unique Risks
JavaScript (Node.js)   eval, insecure templates, prototype pollution
Python                 pickle, yaml.load, subprocess abuse
PHP                    Variable variables, LFI/RFI, magic quotes (legacy code only; removed in PHP 5.4)
Java                   Deserialization, unsafe reflection
Go                     Lack of built-in auth, poor error handling
C#/.NET                Insecure crypto APIs, config leaks
Ruby (Rails)           eval, mass assignment (params.permit!)
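To show why `pickle` appears on the Python list, the sketch below builds a harmless payload whose `__reduce__` tells the unpickler which callable to invoke, then applies the restricted-unpickler pattern from the Python documentation (overriding `find_class`). The `Payload`, `NoGlobalsUnpickler`, and `restricted_loads` names are hypothetical; the payload only calls `int("42")`, where a real attacker would choose something far more damaging.

```python
import io
import pickle


class Payload:
    """Stand-in for attacker-built data: __reduce__ names the callable to run on load."""

    def __reduce__(self):
        # Harmless here (int("42")); an attacker would pick a dangerous callable instead.
        return (int, ("42",))


blob = pickle.dumps(Payload())
# Loading runs the callable chosen by the data; no Payload object comes back.
print(pickle.loads(blob))


class NoGlobalsUnpickler(pickle.Unpickler):
    """Refuses every global lookup, so only plain built-in data types can load."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def restricted_loads(data):
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


print(restricted_loads(pickle.dumps({"ok": [1, 2, 3]})))
try:
    restricted_loads(blob)
except pickle.UnpicklingError as exc:
    print("rejected:", exc)
```

In review, any `pickle.load` or `pickle.loads` on data that crosses a trust boundary deserves the same scrutiny; a data-only format such as JSON is usually the simpler fix.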


🔬 Advanced Manual Code Review Tactics

Beyond basic input validation, look for:

1. Inconsistent Authorization

  • Checks on the UI but not enforced on the backend.

  • Example:

    if user.is_admin:
        return dashboard()
    # missing auth here
    return data_view()
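One way to close the gap in the snippet above is to attach the check to every sensitive view instead of relying on a single branch. This is a minimal, framework-free sketch; `require_admin`, `Forbidden`, and the `User` class are hypothetical names, and a real application would use its framework's own authorization hooks.

```python
from functools import wraps


class Forbidden(Exception):
    """Raised when the caller lacks the required role."""


def require_admin(view):
    """Hypothetical decorator: enforce the admin check on the server for each view."""

    @wraps(view)
    def wrapper(user, *args, **kwargs):
        if not getattr(user, "is_admin", False):
            raise Forbidden(view.__name__)
        return view(user, *args, **kwargs)

    return wrapper


class User:
    def __init__(self, is_admin):
        self.is_admin = is_admin


@require_admin
def dashboard(user):
    return "dashboard"


@require_admin
def data_view(user):
    # Protected on its own rather than relying on the check in front of dashboard().
    return "sensitive data"


print(data_view(User(is_admin=True)))
```

During review, a quick test is to list every route or handler and confirm each one carries an explicit authorization check of this kind.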

2. Flawed Conditional Logic

  • Flawed if/else logic that causes unintended access:

    if not user.is_banned or user.is_admin:
        allow_access()
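A cheap way to review a condition like the one above is to enumerate its truth table and compare it with the policy you believe was intended. The sketch below assumes, purely for illustration, that the intent was "only admins who are not banned"; the original does not state the intended policy.

```python
from itertools import product


def as_written(is_banned, is_admin):
    # The condition exactly as it appears in the reviewed code.
    return not is_banned or is_admin


def assumed_intent(is_banned, is_admin):
    # Assumption for illustration: only admins who are not banned may proceed.
    return is_admin and not is_banned


# Collect every (is_banned, is_admin) combination where the two disagree.
mismatches = [
    (is_banned, is_admin)
    for is_banned, is_admin in product([False, True], repeat=2)
    if as_written(is_banned, is_admin) != assumed_intent(is_banned, is_admin)
]
print(mismatches)
```

Under that assumption the written condition lets in two unintended groups: ordinary users who are not banned, and banned admins.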

3. Regex Bypass

  • Poor input filters like:

    re.match("[a-zA-Z0-9]+", user_input)  # anchors only at the start; trailing characters are never checked
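The difference is easy to demonstrate with the standard `re` module. In this sketch the payload strings are made up; the point is that `match` accepts any input with a valid prefix, `fullmatch` requires the whole string to conform, and a `$` anchor still tolerates a trailing newline.

```python
import re

USERNAME = re.compile(r"[a-zA-Z0-9]+")
payload = "admin'; DROP TABLE users;--"

# match() succeeds because the string merely starts with valid characters.
prefix_only = bool(USERNAME.match(payload))

# fullmatch() requires every character to fit the pattern.
whole_string = bool(USERNAME.fullmatch(payload))

# "$" also matches just before a trailing newline, so "admin\n" slips through.
dollar_anchor = bool(re.match(r"^[a-zA-Z0-9]+$", "admin\n"))
strict_newline = bool(USERNAME.fullmatch("admin\n"))

print(prefix_only, whole_string, dollar_anchor, strict_newline)
```

When reviewing input filters in Python, `fullmatch` (or `\Z` instead of `$`) is usually what the author meant.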

4. Unsafe File Access

  • Watch for string concatenation in file paths:

    open(f"files/{filename}", "r")  # LFI risk
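A common mitigation is to resolve the requested path and confirm it still sits under the intended directory before opening it. The sketch below uses `pathlib`; `BASE_DIR` and `safe_path` are hypothetical names, and the check is a minimal illustration, not a complete defence (for example, it does not address symlink races).

```python
from pathlib import Path

BASE_DIR = Path("files").resolve()  # hypothetical upload root


def safe_path(filename):
    """Resolve filename under BASE_DIR and reject anything that escapes it."""
    candidate = (BASE_DIR / filename).resolve()
    # After resolution, BASE_DIR must be one of the candidate's ancestors.
    if BASE_DIR not in candidate.parents:
        raise ValueError(f"path escapes base directory: {filename!r}")
    return candidate


print(safe_path("report.txt").name)
for attempt in ("../secret.txt", "/etc/passwd"):
    try:
        safe_path(attempt)
    except ValueError as exc:
        print("rejected:", exc)
```

Note that joining an absolute path with `/` discards the base entirely, which is why the check runs on the resolved result and not on the raw input.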

5. Cryptographic Failures

  • Look for:

    • Hardcoded keys

    • Static IVs

    • ECB mode

    • Custom encryption


🧠 Less Common but Critical Vulnerabilities to Look For

Vuln Type                        How It Appears
Insecure logging                 Writing secrets or tokens to logs (logger.debug(auth_token))
Feature flag misfires            Admin features enabled via client-side toggle
Insecure dependencies            Importing vulnerable or compromised libraries (e.g., event-stream in Node.js)
OAuth/OpenID misimplementation   Not validating aud, iss, exp in JWTs
Time-based attacks               Login timing difference revealing valid usernames
Race conditions                  check_balance() → withdraw() in fast sequence
Insecure template rendering      Using render(request, user_input) (template injection)
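For the JWT row, the checks a reviewer expects to find look roughly like the sketch below. It assumes the token's signature has already been verified by a proper JWT library and only validates the decoded claims; `validate_claims`, its parameters, and the sample values are hypothetical.

```python
import time


def validate_claims(claims, expected_iss, expected_aud, now=None, leeway=0):
    """Check iss, aud, and exp on a payload whose signature was ALREADY verified."""
    now = time.time() if now is None else now
    if claims.get("iss") != expected_iss:
        raise ValueError("unexpected issuer")
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]  # aud may be a string or a list
    if expected_aud not in audiences:
        raise ValueError("token not intended for this audience")
    exp = claims.get("exp")
    # A missing or non-numeric exp is treated as a failure, not as "never expires".
    if not isinstance(exp, (int, float)) or now >= exp + leeway:
        raise ValueError("token expired or exp missing")
    return claims


claims = {"iss": "https://issuer.example", "aud": ["billing-api"], "exp": 2000}
print(validate_claims(claims, "https://issuer.example", "billing-api", now=1000))
```

In code review, the red flags are a decode call with verification disabled, or claims being read without any of these three comparisons.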


🧪 Open-Source Projects You Can Practice On (Ethically)

These are security-focused or “bug bounty” friendly:

Project                                        Stack                  Notes
Mozilla Firefox (Bug Bounty Eligible)          C++/Rust               Complex parsing logic, legacy components
OWASP Threat Dragon                            Node.js                Simple but real-world app
SecureDrop (Freedom of the Press Foundation)   Python/Flask           Handles anonymity + encryption
Bitwarden CLI                                  TypeScript (Node.js)   Password manager code is rich in crypto logic
Mastodon                                       Ruby on Rails          Great for OAuth, ActivityPub protocol abuse
PeerTube                                       Node.js + P2P          Use case: content access control logic

Audit code and match findings to:


🚨 Real Vulnerabilities Found via Source Review

Vuln               Description                                      Source
Signal             XSS in link previews inside encrypted messages
Slack              Command injection via internal CLI               Found via static analysis
npm event-stream   Malicious dependency used in open source
JWT None Bypass    Token verification did not reject alg=none       Found via manual token review
GitLab RCE         YAML config deserialization bug


⚙️ Source Code Review in CI/CD (DevSecOps Integration)

Tool                    Best Used For                 CI Integration
Semgrep CI              Code scanning, custom rules   GitHub Actions, GitLab CI, CircleCI
Snyk Code               SAST + dependency scan        Built-in GitHub integration
Checkov + Terrascan     IaC security                  Terraform, CloudFormation audits
Gitleaks + TruffleHog   Secret detection              Git pre-commit + PR check
CodeQL                  Advanced data flow analysis   GitHub native or custom CI

💡 Tip: Configure the pipeline to fail builds only on high-severity findings, or start in a "monitor-only" mode that reports findings without blocking merges.
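The gating logic behind this tip can be sketched as a small script that reads scanner output and decides the exit code. The findings format below (a JSON list of objects with `rule`, `severity`, and `path`) is a hypothetical, simplified shape, not the native output of any tool in the table; a real pipeline would map each tool's report into it or use the tool's own threshold options.

```python
import json

SEVERITY_ORDER = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def gate(findings, fail_at="high", monitor_only=False):
    """Return a process exit code: 1 if any finding meets the threshold, else 0."""
    threshold = SEVERITY_ORDER[fail_at]
    blocking = [
        f for f in findings
        if SEVERITY_ORDER.get(str(f.get("severity", "")).lower(), 0) >= threshold
    ]
    for f in blocking:
        print(f"BLOCKING {f.get('severity')}: {f.get('rule')} at {f.get('path')}")
    # In monitor-only mode findings are still printed, but the build never fails.
    return 0 if monitor_only or not blocking else 1


# Hypothetical report; in CI this would be loaded from the scanner's output file.
report = json.loads("""[
  {"rule": "hardcoded-secret", "severity": "HIGH", "path": "app/config.py"},
  {"rule": "weak-hash", "severity": "medium", "path": "app/auth.py"}
]""")
print(gate(report))
print(gate(report, monitor_only=True))
```

In a CI job the return value would be passed to `sys.exit`, so only findings at or above the chosen severity stop the build.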


👩‍💻 Bug Bounty Source Review (If You Have Source or Recon Access)

Focus on these techniques:

Attack Vector                        What to Look For
API keys in frontend                 .env, .js, .map, Vue/React bundles
Old endpoints in codebase            Routes not in Swagger/UI
JS/TS source leaks                   .map files or open-source client repos
GitHub dorks                         company filename:.env or filename:docker-compose.yml
Accidental test code in production   Debug ports, credentials in staging branches
API backend repos                    No rate limits, missing or broken authorization, internal APIs exposed
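For the `.map` rows above, source maps (version 3) can embed the original files in a `sourcesContent` array alongside their paths in `sources`. The sketch below lists embedded sources that mention a few keywords; the keyword list, function name, and sample map are hypothetical, and it should only be run against targets you are authorized to test.

```python
import json


def list_original_sources(map_text, keywords=("api_key", "secret", "token")):
    """Return (source_path, keyword) pairs for embedded sources that mention a keyword."""
    source_map = json.loads(map_text)
    sources = source_map.get("sources", [])
    contents = source_map.get("sourcesContent") or []
    hits = []
    for path, content in zip(sources, contents):
        if not content:  # entries may be null when a source is not embedded
            continue
        lowered = content.lower()
        for keyword in keywords:
            if keyword in lowered:
                hits.append((path, keyword))
    return hits


# Hypothetical source map standing in for a downloaded bundle.js.map file.
sample_map = json.dumps({
    "version": 3,
    "sources": ["webpack:///src/config.js", "webpack:///src/app.js"],
    "sourcesContent": ['const API_KEY = "not-a-real-key";', "console.log('hi');"],
    "mappings": "",
})
print(list_original_sources(sample_map))
```

Hits are only leads: each one still needs manual confirmation that the value is a live credential and not a placeholder or a public identifier.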

