Python Forum
Search for multiple unknown 3 (2) Byte combinations in a file.
#8
This is quick and, I think, robust. It uses numpy reshape() and concatenate() to pad each 24-bit integer to 32 bits. It is probably not as quick as using as_strided(), but it only takes 0.006 seconds to process a 1 MB file.
import numpy as np
import sys

def int24(bytes_):
    """Convert bytes to 24bit ints.  Return numpy array of ints."""
    # How many 3 byte ints are in bytes_?
    count = bytes_.shape[0] // 3

    # Reshape bytes_ into 3 byte arrays.
    bytes_ = bytes_[:count * 3].reshape((count, 3))

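    # Pad with a zero byte to make 4 byte arrays.  The zero goes in the most
    # significant position for this machine, so the 3 file bytes are read in
    # native byte order (first byte is least significant on little-endian).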
    if sys.byteorder == "little":
        padded = np.concatenate((bytes_, np.zeros((count, 1), dtype=np.uint8)), axis=1)
    else:
        padded = np.concatenate((np.zeros((count, 1), dtype=np.uint8), bytes_), axis=1)

    # Convert 4 byte arrays to 4 byte ints
    return np.frombuffer(padded.tobytes(), dtype=np.uint32)


# Load file and convert to 24bit ints.
bytes_ = np.fromfile('test.txt', dtype=np.uint8)
asints = int24(bytes_)

# Throw away values that are not in the range 0x8A0000 <= value < 0x8F0000.
inrange = asints[(asints >= 0x8A0000) & (asints < 0x8F0000)]

# Get counts for each value.  Save as tuple (count, hex value)
counts = [(count, hex(value)) for value, count in zip(*np.unique(inrange, return_counts=True))]
 
print(sorted(counts, reverse=True)[:10])
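One thing to watch: int24() reads the three bytes in the machine's native byte order, so on a little-endian machine the first byte in the file ends up as the least significant byte. If the byte order of the data is known, a variant that spells it out may be safer. This is only a sketch; int24_ordered and its byteorder argument are names made up for illustration, and which order the file actually uses is an assumption that needs checking against the data.

```python
import numpy as np

def int24_ordered(data, byteorder="big"):
    """Combine each group of 3 bytes into one int using an explicit byte order.

    Hypothetical helper, not part of the original post.
    """
    count = data.shape[0] // 3
    # Widen to uint32 before shifting so the shifts cannot overflow uint8.
    triples = data[:count * 3].reshape((count, 3)).astype(np.uint32)
    if byteorder == "big":
        # First byte in the file is the most significant.
        return (triples[:, 0] << 16) | (triples[:, 1] << 8) | triples[:, 2]
    # Otherwise the first byte in the file is the least significant.
    return triples[:, 0] | (triples[:, 1] << 8) | (triples[:, 2] << 16)

sample = np.array([0x8C, 0x01, 0x02, 0x8D, 0x03, 0x04], dtype=np.uint8)
big = int24_ordered(sample, "big")
little = int24_ordered(sample, "little")
print([hex(v) for v in big])     # ['0x8c0102', '0x8d0304']
print([hex(v) for v in little])  # ['0x2018c', '0x4038d']
```

With "big", a leading 8C/8D/8E byte lands in the top byte of the value, which is what the 0x8A0000 to 0x8F0000 range test expects if the marker byte comes first in the file.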
If the 8C/8D/8E bytes can appear at any offset in the file, shift the bytes_ array by one and by two bytes and convert again.
# Load file and convert to 24bit ints.  Shift the
# starting point to get all 24 bit ints.
bytes_ = np.fromfile('test.txt', dtype=np.uint8)
asints = np.concatenate(
    (int24(bytes_), int24(bytes_[1:]), int24(bytes_[2:]))
)
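The three shifted passes can probably also be done in a single pass with a sliding window. This is a rough sketch, assuming NumPy 1.20 or later (for sliding_window_view) and assuming the first byte of each triple is the most significant; int24_all_offsets is a hypothetical name, not something from the code above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def int24_all_offsets(data):
    """Return the 24 bit value starting at every byte offset.

    Hypothetical helper.  Assumes the first byte is the most significant.
    """
    if data.shape[0] < 3:
        # sliding_window_view raises if the window is larger than the array.
        return np.empty(0, dtype=np.uint32)
    # One row per offset: shape is (len(data) - 2, 3).
    windows = sliding_window_view(data, 3).astype(np.uint32)
    return (windows[:, 0] << 16) | (windows[:, 1] << 8) | windows[:, 2]

sample = np.array([0x00, 0x8C, 0x01, 0x02, 0x8D], dtype=np.uint8)
values = int24_all_offsets(sample)
print([hex(v) for v in values])  # ['0x8c01', '0x8c0102', '0x1028d']
```

Unlike the concatenate() version, the values come out in file order, so values[i] is the 24 bit value that starts at byte offset i. I have not timed it against the three-pass version.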


Messages In This Thread
RE: Search for multiple unknown 3 (2) Byte combinations in a file. - by deanhystad - Aug-14-2023, 02:28 AM
