Triage is Key! Python to the Rescue! (Tue, Jul 29th, SANS Internet Storm Center, InfoCON: green)

July 29, 2025

When you need to analyze a lot of data quickly, one step is critical: triage. In forensic investigations, triage lets investigators identify, prioritize, and isolate the most relevant, high-value evidence from large volumes of data, so that limited time and resources go to the artifacts most likely to reveal key facts about an incident. Sometimes a quick script is enough to speed up this task.

Today, I’m working on a case where I have a directory containing more than 20,000 mixed files. Among them are many ZIP archives (mainly Office documents), which themselves contain many files. The idea is to scan all those files (including the contents of the ZIP archives) for some keywords. I wrote a quick Python script that scans every file against an embedded YARA rule and, if a match is found, copies the original file into a destination directory.

Here is the script:

#
# Quick Python triage script
# Copy files matching a YARA rule to another directory
#
import yara
import os
import shutil
import zipfile
import io

# YARA rule
yara_rule = """
rule case_xxxxxx_search_1
{
    strings:
        $s1 = "string1" nocase wide ascii
        $s2 = "string2" nocase wide ascii
        $s3 = "string3" nocase wide ascii
        $s4 = "string4" nocase wide ascii
        $s5 = "string5" nocase wide ascii
    condition:
        any of ($s*)
}
"""

source_dir = "Triage"
dest_dir = "MatchedFiles"
os.makedirs(dest_dir, exist_ok=True)
rules = yara.compile(source=yara_rule)

def is_zip_file(filepath):
    """
    Check ZIP archive magic bytes.
    """
    try:
        with open(filepath, "rb") as f:
            sig = f.read(4)
            return sig in (b"PK\x03\x04", b"PK\x05\x06", b"PK\x07\x08")
    except Exception:
        return False

def safe_extract_path(member_name):
    """
    Returns a safe relative path inside the destination folder (Prevent .. in paths).
    """
    return os.path.normpath(member_name).replace("..", "_")

def scan_file(filepath, file_bytes=None, inside_zip=False, zip_name=None, member_name=None):
    """
    Scan a file with YARA.
    """
    try:
        if file_bytes is not None:
            matches = rules.match(data=file_bytes)
        else:
            matches = rules.match(filepath)

        if matches:
            if inside_zip:
                print(f"[MATCH] {member_name} (inside {zip_name})")
                rel_path = os.path.relpath(zip_name, source_dir)
                filepath = os.path.join(source_dir, rel_path)
                dest_path = os.path.join(dest_dir, rel_path)
            else:
                print(f"[MATCH] {filepath}")
                rel_path = os.path.relpath(filepath, source_dir)
                dest_path = os.path.join(dest_dir, rel_path)
            
            # Save a copy
            os.makedirs(os.path.dirname(dest_path), exist_ok=True)
            shutil.copy2(filepath, dest_path)
    except Exception as e:
        # Report the failure but keep scanning the remaining files
        print(f"[ERROR] {filepath}: {e}")

# Main
for root, dirs, files in os.walk(source_dir):
    for name in files:
        filepath = os.path.join(root, name)
        if is_zip_file(filepath):
            try:
                with zipfile.ZipFile(filepath, 'r') as z:
                    for member in z.namelist():
                        if member.endswith("/"):  # Skip directories
                            continue
                        try:
                            file_data = z.read(member)
                            scan_file(member, file_bytes=file_data, inside_zip=True, zip_name=filepath, member_name=member)
                        except Exception:
                            pass
            except zipfile.BadZipFile:
                pass
        else:
            scan_file(filepath)
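
A note on the string modifiers used in the rule above: `ascii` matches the keyword as plain single-byte text, `wide` matches it with a zero byte after each character (the two-bytes-per-character layout that Windows software commonly uses), and `nocase` ignores letter case. The stdlib-only sketch below mimics that behaviour for a single ASCII keyword. It is only an illustration, not how YARA works internally, and the helper name `matches_keyword` is hypothetical.

```python
import re

def matches_keyword(data: bytes, keyword: str) -> bool:
    """Rough equivalent of a YARA string with `nocase wide ascii` (ASCII keywords only)."""
    ascii_form = keyword.encode("ascii")     # "ascii": one byte per character
    wide_form = keyword.encode("utf-16-le")  # "wide": each character followed by a zero byte
    for needle in (ascii_form, wide_form):
        # re.IGNORECASE on a bytes pattern folds ASCII letters, similar to "nocase"
        if re.search(re.escape(needle), data, re.IGNORECASE):
            return True
    return False

print(matches_keyword(b"...STRING1...", "string1"))               # True
print(matches_keyword("String1".encode("utf-16-le"), "string1"))  # True
print(matches_keyword(b"nothing here", "string1"))                # False
```

Dropping `wide` from the rule would miss the second case, which is why the rule lists both encodings for every keyword.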

Now, you can enjoy some coffee while the script does the job:

[MATCH] docProps/app.xml (inside Triage\xxxxxxx.xlsx)
[MATCH] xl/sharedStrings.xml (inside Triage\xxxxx.xlsx)
[MATCH] xl/sharedStrings.xml (inside Triage\xxxxxxxxxxxxxxxxxxxx.xlsx)
[MATCH] ppt/slides/slide3.xml (inside Triage\xxxxxxxxxxxxxxxxxxxxxx.pptx)
[MATCH] ppt/slides/slide12.xml (inside Triage\xxxxxxxxxxxxxxxxxxxxxx.pptx)
[MATCH] ppt/slides/slide14.xml (inside Triage\xxxxxxxxxxxxxxxxxxxxxx.pptx)
[MATCH] ppt/slides/slide15.xml (inside Triage\xxxxxxxxxxxxxxxxxxxxxx.pptx)
[MATCH] xl/sharedStrings.xml (inside Triage\xxxxxxxx.xlsx)
[MATCH] Triage\xxxxxxxxxxxxxxxxxxxxxxx.pdf
[MATCH] Triage\xxxxxxxxxxxxxxxxxxx.xls
[MATCH] xl/sharedStrings.xml (inside Triage\xxxxxxxxxxxxxxxx.xlsx)
[MATCH] Triage\xxxxxxxxxxxxxxxxxxxxxxxxxx.xls
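
In the output above, the same .pptx archive is reported once per matching slide, and the script copies the archive again for each of those hits. If only the copy matters, scanning can stop at the first matching member. Here is a minimal sketch of that idea. The `matcher` callable is a hypothetical stand-in for the YARA call so the example runs without YARA installed, and `first_matching_member` is not part of the original script.

```python
import io
import zipfile
from typing import Callable, Optional

def first_matching_member(zip_source, matcher: Callable[[bytes], bool]) -> Optional[str]:
    """Return the name of the first member for which matcher(data) is true, else None."""
    with zipfile.ZipFile(zip_source, "r") as z:
        for info in z.infolist():
            if info.is_dir():  # Skip directories
                continue
            if matcher(z.read(info)):
                return info.filename  # Stop at the first hit
    return None

# Demo with an in-memory archive; the keyword check stands in for YARA
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("ppt/slides/slide1.xml", "nothing relevant")
    z.writestr("ppt/slides/slide2.xml", "contains string1")
    z.writestr("ppt/slides/slide3.xml", "also string1")

hit = first_matching_member(buf, lambda data: b"string1" in data)
print(hit)  # ppt/slides/slide2.xml
```

With the compiled rule from the script, `matcher` could be `lambda data: bool(rules.match(data=data))`. The trade-off is that the log no longer lists every matching member, which may or may not be acceptable for a given case.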

You can see that, with a few lines of Python, you can speed up the triage phase of your investigations. Note that the script was written to handle my current file set and is not ready for broader use (for example, it does not handle password-protected archives or other archive types).
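
Password-protected archives, one of the gaps mentioned above, can at least be made visible during triage. In the script, a failed read of a ZIP member is caught and skipped silently. The sketch below relies only on the standard `zipfile` module: bit 0 of each member's `flag_bits` is set when the member is encrypted, so such archives can be listed for manual follow-up. The helper name `encrypted_members` is hypothetical and not part of the original script.

```python
import io
import zipfile

def encrypted_members(zip_source) -> list[str]:
    """List member names whose general-purpose flag (bit 0) marks them as encrypted."""
    with zipfile.ZipFile(zip_source, "r") as z:
        return [info.filename for info in z.infolist() if info.flag_bits & 0x1]

# Demo with a plain in-memory archive: nothing is flagged as encrypted
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("docProps/app.xml", "plain content")

print(encrypted_members(buf))  # []
```

In the main loop, a non-empty result could be printed with a distinct tag, or the archive could be copied to a separate review directory, before the script moves on to the next file.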

Xavier Mertens (@xme)
Xameco
Senior ISC Handler – Freelance Cyber Security Consultant
PGP Key

(c) SANS Internet Storm Center. https://isc.sans.edu Creative Commons Attribution-Noncommercial 3.0 United States License. 
