Python Forum
extract zip from .msg files and saving according to date stamp of email

I've been running into some issues trying to run some code in Visual Studio. I'm new to both Python and Visual Studio. I found an extraction script written by Matthew Walker (see below), but it needs OLE installed first, and I'm not sure where to run the install. I tried typing it on the command line and searching for it as an extension, but found nothing. I can't import ole without installing it, and I have no clue how to do that.

From the looks of the code, I won't need most of the content of the email, just the date stamp, because ideally I'd like to save the zip file under the date stamp of the email. When running this code, besides the undefined OLE, I also got undefined json (line 417), unicode (line 415), and decode_utf7 (line 215). I don't know whether all of these can be traced back to the ole install.

Would greatly appreciate anyone who could help in debugging. I have all the .msg files in one folder and need to save the attached zip files with a date stamp in another location.


Thank you!

#!/usr/bin/env python
# -*- coding: latin-1 -*-
"""
    Extracts emails and attachments saved in Microsoft Outlook's .msg files
"""

__author__ = "Matthew Walker"
__date__ = "2016-10-09"
__version__ = '0.3'

# --- LICENSE -----------------------------------------------------------------
#    Copyright 2013 Matthew Walker
#    This program is free software: you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation, either version 3 of the License, or
#    (at your option) any later version.
#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.
#    You should have received a copy of the GNU General Public License
#    along with this program.  If not, see <>.

import os
import sys
import glob
import traceback
from email.parser import Parser as EmailParser
import email.utils
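# The classes below call OleFile.OleFileIO, so the olefile package is needed
# (third-party; install with: pip install olefile)
import olefile as OleFile
import json  # used by Message.save() when toJson is True
# decode_utf7 is used in Message.save() but not defined in this listing.
# Assumption: it comes from the third-party imapclient package.
from imapclient.imapclient import decode_utf7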

# This property information was sourced from
# on 2013-07-22.
properties = {
    '001A': 'Message class',
    '0037': 'Subject',
    '003D': 'Subject prefix',
    '0040': 'Received by name',
    '0042': 'Sent repr name',
    '0044': 'Rcvd repr name',
    '004D': 'Org author name',
    '0050': 'Reply rcipnt names',
    '005A': 'Org sender name',
    '0064': 'Sent repr adrtype',
    '0065': 'Sent repr email',
    '0070': 'Topic',
    '0075': 'Rcvd by adrtype',
    '0076': 'Rcvd by email',
    '0077': 'Repr adrtype',
    '0078': 'Repr email',
    '007D': 'Message header',
    '0C1A': 'Sender name',
    '0C1E': 'Sender adr type',
    '0C1F': 'Sender email',
    '0E02': 'Display BCC',
    '0E03': 'Display CC',
    '0E04': 'Display To',
    '0E1D': 'Subject (normalized)',
    '0E28': 'Recvd account1 (uncertain)',
    '0E29': 'Recvd account2 (uncertain)',
    '1000': 'Message body',
    '1008': 'RTF sync body tag',
    '1035': 'Message ID (uncertain)',
    '1046': 'Sender email (uncertain)',
    '3001': 'Display name',
    '3002': 'Address type',
    '3003': 'Email address',
    '39FE': '7-bit email (uncertain)',
    '39FF': '7-bit display name',

    # Attachments (37xx)
    '3701': 'Attachment data',
    '3703': 'Attachment extension',
    '3704': 'Attachment short filename',
    '3707': 'Attachment long filename',
    '370E': 'Attachment mime tag',
    '3712': 'Attachment ID (uncertain)',

    # Address book (3Axx):
    '3A00': 'Account',
    '3A02': 'Callback phone no',
    '3A05': 'Generation',
    '3A06': 'Given name',
    '3A08': 'Business phone',
    '3A09': 'Home phone',
    '3A0A': 'Initials',
    '3A0B': 'Keyword',
    '3A0C': 'Language',
    '3A0D': 'Location',
    '3A11': 'Surname',
    '3A15': 'Postal address',
    '3A16': 'Company name',
    '3A17': 'Title',
    '3A18': 'Department',
    '3A19': 'Office location',
    '3A1A': 'Primary phone',
    '3A1B': 'Business phone 2',
    '3A1C': 'Mobile phone',
    '3A1D': 'Radio phone no',
    '3A1E': 'Car phone no',
    '3A1F': 'Other phone',
    '3A20': 'Transmit dispname',
    '3A21': 'Pager',
    '3A22': 'User certificate',
    '3A23': 'Primary Fax',
    '3A24': 'Business Fax',
    '3A25': 'Home Fax',
    '3A26': 'Country',
    '3A27': 'Locality',
    '3A28': 'State/Province',
    '3A29': 'Street address',
    '3A2A': 'Postal Code',
    '3A2B': 'Post Office Box',
    '3A2C': 'Telex',
    '3A2D': 'ISDN',
    '3A2E': 'Assistant phone',
    '3A2F': 'Home phone 2',
    '3A30': 'Assistant',
    '3A44': 'Middle name',
    '3A45': 'Dispname prefix',
    '3A46': 'Profession',
    '3A48': 'Spouse name',
    '3A4B': 'TTYTTD radio phone',
    '3A4C': 'FTP site',
    '3A4E': 'Manager name',
    '3A4F': 'Nickname',
    '3A51': 'Business homepage',
    '3A57': 'Company main phone',
    '3A58': 'Childrens names',
    '3A59': 'Home City',
    '3A5A': 'Home Country',
    '3A5B': 'Home Postal Code',
    '3A5C': 'Home State/Provnce',
    '3A5D': 'Home Street',
    '3A5F': 'Other adr City',
    '3A60': 'Other adr Country',
    '3A61': 'Other adr PostCode',
    '3A62': 'Other adr Province',
    '3A63': 'Other adr Street',
    '3A64': 'Other adr PO box',

    '3FF7': 'Server (uncertain)',
    '3FF8': 'Creator1 (uncertain)',
    '3FFA': 'Creator2 (uncertain)',
    '3FFC': 'To email (uncertain)',
    '403D': 'To adrtype (uncertain)',
    '403E': 'To email (uncertain)',
    '5FF6': 'To (uncertain)'}

def windowsUnicode(string):
    # Decode the UTF-16-LE bytes used by .msg Unicode streams; None passes through
    if string is None:
        return None
    return string.decode('utf_16_le')

class Attachment:
    def __init__(self, msg, dir_):
        # Get long filename
        self.longFilename = msg._getStringStream([dir_, '__substg1.0_3707'])

        # Get short filename
        self.shortFilename = msg._getStringStream([dir_, '__substg1.0_3704'])

        # Get attachment data
        self.data = msg._getStream([dir_, '__substg1.0_37010102'])

    def save(self):
        # Use long filename as first preference
        filename = self.longFilename
        # Otherwise use the short filename
        if filename is None:
            filename = self.shortFilename
        # Otherwise just make something up!
        if filename is None:
            import random
            import string
            filename = 'UnknownFilename ' + \
                ''.join(random.choice(string.ascii_uppercase + string.digits)
                        for _ in range(5)) + ".bin"
        # Write the attachment bytes; the with-block closes the file
        with open(filename, 'wb') as f:
            f.write(self.data)
        return filename

class Message(OleFile.OleFileIO):
    def __init__(self, filename):
        OleFile.OleFileIO.__init__(self, filename)

    def _getStream(self, filename):
        # Return the raw bytes of the named stream, or None if it is absent
        if self.exists(filename):
            stream = self.openstream(filename)
            return stream.read()
        else:
            return None

    def _getStringStream(self, filename, prefer='unicode'):
        """Gets a string representation of the requested filename.
        Checks for both ASCII and Unicode representations and returns
        a value if possible.  If there are both ASCII and Unicode
        versions, then the parameter /prefer/ specifies which will be
        returned.
        """

        if isinstance(filename, list):
            # Join with slashes to make it easier to append the type
            filename = "/".join(filename)

        asciiVersion = self._getStream(filename + '001E')
        unicodeVersion = windowsUnicode(self._getStream(filename + '001F'))
        if asciiVersion is None:
            return unicodeVersion
        elif unicodeVersion is None:
            return asciiVersion
        else:
            if prefer == 'unicode':
                return unicodeVersion
            else:
                return asciiVersion

    # The accessors below are read as attributes elsewhere in the class
    # (e.g. self.header['date'], self.subject), so they are properties.
    @property
    def subject(self):
        return self._getStringStream('__substg1.0_0037')

    @property
    def header(self):
        try:
            return self._header
        except AttributeError:
            # Not cached yet: parse the transport headers once
            headerText = self._getStringStream('__substg1.0_007D')
            if headerText is not None:
                self._header = EmailParser().parsestr(headerText)
            else:
                self._header = None
            return self._header

    @property
    def date(self):
        # Get the message's header and extract the date
        if self.header is None:
            return None
        else:
            return self.header['date']

    @property
    def parsedDate(self):
        # Parse the Date header into a time tuple
        return email.utils.parsedate(self.date)

    @property
    def sender(self):
        try:
            return self._sender
        except AttributeError:
            # Check header first
            if self.header is not None:
                headerResult = self.header["from"]
                if headerResult is not None:
                    self._sender = headerResult
                    return headerResult

            # Extract from other fields
            text = self._getStringStream('__substg1.0_0C1A')
            email = self._getStringStream('__substg1.0_0C1F')
            result = None
            if text is None:
                result = email
            else:
                result = text
                if email is not None:
                    result = result + " <" + email + ">"

            self._sender = result
            return result

    @property
    def to(self):
        try:
            return self._to
        except AttributeError:
            # Check header first
            if self.header is not None:
                headerResult = self.header["to"]
                if headerResult is not None:
                    self._to = headerResult
                    return headerResult

            # Extract from other fields
            # TODO: This should really extract data from the recip folders,
            # but how do you know which is to/cc/bcc?
            display = self._getStringStream('__substg1.0_0E04')
            self._to = display
            return display

    @property
    def cc(self):
        try:
            return self._cc
        except AttributeError:
            # Check header first
            if self.header is not None:
                headerResult = self.header["cc"]
                if headerResult is not None:
                    self._cc = headerResult
                    return headerResult

            # Extract from other fields
            # TODO: This should really extract data from the recip folders,
            # but how do you know which is to/cc/bcc?
            display = self._getStringStream('__substg1.0_0E03')
            self._cc = display
            return display

    @property
    def body(self):
        # Get the message body
        return self._getStringStream('__substg1.0_1000')

    @property
    def attachments(self):
        try:
            return self._attachments
        except AttributeError:
            # Get the attachments
            attachmentDirs = []

            for dir_ in self.listdir():
                if dir_[0].startswith('__attach') and dir_[0] not in attachmentDirs:
                    attachmentDirs.append(dir_[0])

            self._attachments = []

            for attachmentDir in attachmentDirs:
                self._attachments.append(Attachment(self, attachmentDir))

            return self._attachments

    def save(self, toJson=False, useFileName=False, raw=False):
        '''Saves the message body and attachments found in the message.  Setting toJson
        to true will output the message body as JSON-formatted text.  The body and
        attachments are stored in a folder.  Setting useFileName to true will mean that
        the filename is used as the name of the folder; otherwise, the message's date
        and subject are used as the folder name.'''

        if useFileName:
            # strip out the extension
            # NOTE: relies on the module-level `filename` set in the __main__ loop
            dirName = filename.split('/').pop().split('.')[0]
        else:
            # Create a directory based on the date and subject of the message
            d = self.parsedDate
            if d is not None:
                dirName = '{0:02d}-{1:02d}-{2:02d}_{3:02d}{4:02d}'.format(*d)
            else:
                dirName = "UnknownDate"

            if self.subject is None:
                subject = "[No subject]"
            else:
                subject = "".join(i for i in self.subject if i not in r'\/:*?"<>|')

            dirName = dirName + " " + subject

        def addNumToDir(dirName):
            # Attempt to create the directory with a '(n)' appended
            for i in range(2, 100):
                try:
                    newDirName = dirName + " (" + str(i) + ")"
                    os.makedirs(newDirName)
                    return newDirName
                except Exception:
                    pass
            return None

        try:
            os.makedirs(dirName)
        except Exception:
            newDirName = addNumToDir(dirName)
            if newDirName is not None:
                dirName = newDirName
            else:
                raise Exception(
                    "Failed to create directory '%s'. Does it already exist?" %
                    dirName)

        oldDir = os.getcwd()
        try:
            os.chdir(dirName)

            # Save the message body (Python 3: write text as UTF-8)
            fext = 'json' if toJson else 'text'
            f = open("message." + fext, "w", encoding="utf-8")
            # From, to , cc, subject, date

            def xstr(s):
                # Python 3: keep text as str so it can be concatenated and written
                return '' if s is None else s

            attachmentNames = []
            # Save the attachments
            for attachment in self.attachments:
                attachmentNames.append(attachment.save())

            if toJson:
                emailObj = {'from': xstr(self.sender),
                            'to': xstr(self.to),
                            'cc': xstr(self.cc),
                            'subject': xstr(self.subject),
                            'date': xstr(self.date),
                            'attachments': attachmentNames,
                            'body': decode_utf7(self.body)}

                f.write(json.dumps(emailObj, ensure_ascii=True))
            else:
                f.write("From: " + xstr(self.sender) + "\n")
                f.write("To: " + xstr(self.to) + "\n")
                f.write("CC: " + xstr(self.cc) + "\n")
                f.write("Subject: " + xstr(self.subject) + "\n")
                f.write("Date: " + xstr(self.date) + "\n")
                f.write("-----------------\n\n")
                f.write(xstr(self.body))

            f.close()

        except Exception:
            # On error, dump the raw streams before re-raising
            self.saveRaw()
            raise

        finally:
            # Return to previous directory
            os.chdir(oldDir)

    def saveRaw(self):
        # Create a 'raw' folder
        oldDir = os.getcwd()
        try:
            rawDir = "raw"
            os.makedirs(rawDir)
            os.chdir(rawDir)
            sysRawDir = os.getcwd()

            # Loop through all the directories
            for dir_ in self.listdir():
                sysdir = "/".join(dir_)
                code = dir_[-1][-8:-4]
                global properties
                if code in properties:
                    sysdir = sysdir + " - " + properties[code]
                os.makedirs(sysdir)
                os.chdir(sysdir)

                # Generate appropriate filename
                if dir_[-1].endswith("001E"):
                    filename = "contents.txt"
                else:
                    filename = "contents"

                # Save contents of directory
                f = open(filename, 'wb')
                f.write(self._getStream(dir_))
                f.close()

                # Return to base directory
                os.chdir(sysRawDir)

        finally:
            # Return to the directory we started in
            os.chdir(oldDir)

    def dump(self):
        # Prints out a summary of the message
        print('Subject:', self.subject)

    def debug(self):
        for dir_ in self.listdir():
            if dir_[-1].endswith('001E'):  # FIXME: Check for unicode 001F too
                # Use the loop variable dir_, not the builtin dir
                print("Directory: " + str(dir_))
                print("Contents: " + str(self._getStream(dir_)))

    def save_attachments(self, raw=False):
        """Saves only attachments in the same folder.
        """
        for attachment in self.attachments:
            attachment.save()

if __name__ == "__main__":
    if len(sys.argv) <= 1:
        print("""
Launched from command line, this script parses Microsoft Outlook Message files
and saves their contents to the current directory.  On error the script will
write out a 'raw' directory with all the details from the file, but in a
less-than-desirable format. To force this mode, the flag '--raw'
can be specified.

Usage:  <file> [file2 ...]
   or:  --raw <file>
   or:  --json

   to name the directory as the .msg file, --use-file-name
""")
        sys.exit()

    writeRaw = False
    toJson = False
    useFileName = False

    for rawFilename in sys.argv[1:]:
        if rawFilename == '--raw':
            writeRaw = True

        if rawFilename == '--json':
            toJson = True

        if rawFilename == '--use-file-name':
            useFileName = True

        for filename in glob.glob(rawFilename):
            msg = Message(filename)
            try:
                if writeRaw:
                    msg.saveRaw()
                else:
                    msg.save(toJson, useFileName)
            except Exception:
                # msg.debug()
                print("Error with file '" + filename + "': " +
                      traceback.format_exc())

ole-py is a PyPI module for OLE. It would normally be installed with pip install ole-py. Have you tried that?
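
A minimal sketch for checking, from Python itself, which modules are missing. It assumes the script is meant to use the olefile package (the listing calls OleFile.OleFileIO, exists, openstream and listdir, which match that package's API) and that decode_utf7 comes from imapclient. Both package names are assumptions rather than something confirmed in this thread, and missing_modules is a hypothetical helper name.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: these are the third-party modules the script relies on.
    for name in missing_modules(["olefile", "imapclient"]):
        # sys.executable is the interpreter running this check, so the
        # printed command installs into that same environment.
        print('"{}" -m pip install {}'.format(sys.executable, name))
```

Running this with the same interpreter that runs the script prints one install command per missing module. Pasting that command into a terminal installs the package into the environment that interpreter belongs to.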

How did you get it to run and see the other errors if the ole import failed?

I don't know how to go about installing that via Visual Studio. I opened the file in Visual Studio and then clicked run, and those were all the problems that popped up.

I don't know much about Visual Studio, but this seems to have instructions for adding packages to the Python environment. Instead of matplotlib, try to install "ole-py".
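
For the original goal of saving each zip under the email's date stamp, here is a rough sketch under several assumptions: Python 3, the olefile package installed, and the stream names used in the listing above (007D for the message header, 3707 for the long attachment filename, 37010102 for the attachment data). It only reads the Unicode (001F) variants of the text streams. The function names date_stamped_name and save_zip_attachments are made up for illustration.

```python
from email.parser import Parser
from email.utils import parsedate_to_datetime
from pathlib import Path


def date_stamped_name(date_header, original_name):
    """Build a name like '2016-10-09_143005.zip' from an email Date header."""
    suffix = Path(original_name).suffix or ".bin"
    if not date_header:
        return "UnknownDate" + suffix
    try:
        stamp = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        # Unparseable Date header
        return "UnknownDate" + suffix
    return stamp.strftime("%Y-%m-%d_%H%M%S") + suffix


def _read_text(ole, stream_name):
    """Return a UTF-16-LE text stream as str, or None if it is absent."""
    if not ole.exists(stream_name):
        return None
    return ole.openstream(stream_name).read().decode("utf_16_le").rstrip("\x00")


def save_zip_attachments(msg_path, out_dir):
    """Save the .zip attachments of one .msg file under date-stamped names."""
    # Third-party package (assumed installed); imported here so that
    # date_stamped_name stays usable without it.
    import olefile

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    ole = olefile.OleFileIO(str(msg_path))
    try:
        header_text = _read_text(ole, "__substg1.0_007D001F")
        date_header = Parser().parsestr(header_text)["date"] if header_text else None
        # Attachment storages are the top-level entries starting with '__attach'
        attach_dirs = sorted({entry[0] for entry in ole.listdir()
                              if entry[0].startswith("__attach")})
        for attach_dir in attach_dirs:
            name = _read_text(ole, attach_dir + "/__substg1.0_3707001F")
            data_stream = attach_dir + "/__substg1.0_37010102"
            if not name or not name.lower().endswith(".zip"):
                continue
            if not ole.exists(data_stream):
                continue
            target = out_dir / date_stamped_name(date_header, name)
            target.write_bytes(ole.openstream(data_stream).read())
            saved.append(target)
    finally:
        ole.close()
    return saved
```

To process a whole folder, call save_zip_attachments once for each path returned by Path(source_folder).glob("*.msg"), where source_folder and the output folder are whatever locations you choose. The stamp uses the time exactly as written in the Date header, without converting time zones. Two zips that resolve to the same stamp would overwrite each other, so a counter may need to be added to the name.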
