Search for multiple unknown 3 (2) Byte combinations in a file.

**deanhystad** · (This post was last modified: Aug-14-2023, 02:10 PM by deanhystad.)

This is quick and robust (I think). It uses numpy reshape() and concatenate() to pad the 24-bit integers to 32 bits. It is probably not as quick as using as_strided(), but it takes only 0.006 seconds to process a 1 MB file.

import numpy as np
import sys

def int24(bytes_):
    """Convert bytes to 24bit ints.  Return numpy array of ints."""
    # How many 3 byte ints are in bytes_?
    count = bytes_.shape[0] // 3

    # Reshape bytes_ into 3 byte arrays.
    bytes_ = bytes_[:count * 3].reshape((count, 3))

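    # Pad with zeros to make 4 byte arrays.  The zero goes in the most
    # significant position for this machine's byte order, so the three
    # file bytes are read in the host's byte order.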
    if sys.byteorder == "little":
        padded = np.concatenate((bytes_, np.zeros((count, 1), dtype=np.uint8)), axis=1)
    else:
        padded = np.concatenate((np.zeros((count, 1), dtype=np.uint8), bytes_), axis=1)

    # Convert 4 byte arrays to 4 byte ints
    return np.frombuffer(padded.tobytes(), dtype=np.uint32)


# Load file and convert to 24bit ints.
bytes_ = np.fromfile('test.txt', dtype=np.uint8)
asints = int24(bytes_)

# Throw away values that are not in the range 0x8A0000 <= value < 0x8F0000.
inrange = asints[(asints >= 0x8A0000) & (asints < 0x8F0000)]

# Get counts for each value.  Save as tuple (count, hex value)
counts = [(count, hex(value)) for value, count in zip(*np.unique(inrange, return_counts=True))]

# Print the ten most common values.
print(sorted(counts, reverse=True)[:10])
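
One thing to watch: int24() follows the byte order of the machine it runs on, so the same file would give different values on a big-endian host. Here is a minimal sketch of a variant that takes the byte order as an argument instead. The name int24_explicit and its byteorder parameter are my own hypothetical choices, not part of the code above, and I have not timed it against the concatenate() version.

```python
import numpy as np


def int24_explicit(data, byteorder="little"):
    """Convert a uint8 array to 24-bit ints using an explicit byte order.

    Hypothetical helper: 'byteorder' describes the data in the file,
    not the machine running the code.
    """
    # How many complete 3-byte groups are there?  Leftover bytes are dropped.
    count = data.shape[0] // 3

    # One row per group, widened to uint32 so the shifts cannot overflow.
    triples = data[:count * 3].reshape((count, 3)).astype(np.uint32)

    if byteorder == "little":
        # First byte in the file is the least significant.
        return triples[:, 0] | (triples[:, 1] << 8) | (triples[:, 2] << 16)
    # First byte in the file is the most significant.
    return (triples[:, 0] << 16) | (triples[:, 1] << 8) | triples[:, 2]


# Seven bytes: two complete groups plus one leftover byte that is ignored.
sample = np.array([0x01, 0x02, 0x8C, 0x8D, 0x00, 0x00, 0xFF], dtype=np.uint8)
little = int24_explicit(sample, "little")
big = int24_explicit(sample, "big")
print([hex(v) for v in little])  # ['0x8c0201', '0x8d']
print([hex(v) for v in big])     # ['0x1028c', '0x8d0000']
```

On a little-endian machine the "little" result should match what int24() returns for the same bytes.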

And if the 8C/8D/8E byte can be anywhere in the file, at any offset, just shift the bytes_ array by one and by two bytes and convert again.

# Load file and convert to 24bit ints.  Shift the
# starting point to get all 24 bit ints.
bytes_ = np.fromfile('test.txt', dtype=np.uint8)
asints = np.concatenate(
    (int24(bytes_), int24(bytes_[1:]), int24(bytes_[2:]))
)
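
The three shifted passes can probably also be done in one go with a sliding window. This is only a sketch, assuming NumPy 1.20 or newer (for sliding_window_view) and a little-endian reading of each 3-byte group, which is what int24() gives on a little-endian machine. The function name int24_all_offsets is hypothetical. The values come out ordered by offset rather than grouped by shift, which makes no difference for counting.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def int24_all_offsets(data):
    """Return the little-endian 24-bit value starting at every byte offset.

    Hypothetical helper; 'data' is a 1-D uint8 array.
    """
    # Fewer than 3 bytes means there is no complete group at any offset.
    if data.shape[0] < 3:
        return np.empty(0, dtype=np.uint32)

    # Row i holds bytes i, i+1, i+2.  Widen so the shifts cannot overflow.
    windows = sliding_window_view(data, 3).astype(np.uint32)

    # Assemble each row with the first byte as the least significant.
    return windows[:, 0] | (windows[:, 1] << 8) | (windows[:, 2] << 16)


# Four bytes give two overlapping 3-byte groups.
demo = np.array([0x01, 0x02, 0x03, 0x04], dtype=np.uint8)
all_values = int24_all_offsets(demo)
print([hex(v) for v in all_values])  # ['0x30201', '0x40302']
```

The result can be filtered and counted the same way as asints above.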
