Searching for nested items within the dictionary data structure

rob101

I've been studying the dictionary data structure because I wanted to find a way of searching for nested items, and this code is a demonstration of that goal. I understand that it's possibly over-engineered for the task that I have chosen.

The user input is not fully sanitized, but (as you will see from the code comments) I have that covered by a custom function that I've already written (for the sake of brevity, that function is not included here).

Also, (for the sake of brevity) I've only included a few books, but you can add as many as you like, for testing.

I know not of any bugs, so if you find any or if you have any general comments about my coding style, I'm open to constructive criticism.

Thank you for reading and testing; I'll reply to any comments you may have, as and when.

Enjoy and who knows, you may even find this to be the basis of a useful app.

#!/usr/bin/python3

from sys import stdout

library = { # list indexed as 0 for the book title and 1 for the book author
        'computer science & programming':{
            '0-13-110163-3':[
                'THE C PROGRAMMING LANGUAGE',
                'BRIAN W.KERNIGHAN & DENNIS M.RITCHIE'
                ],
            '0-85934-229-8':[
                'PROGRAMMING IN QuickBASIC',
                'N.KANTARIS'
                ],
            '0-948517-48-4':[
                'HiSoft BASIC VERSION 2: USER MANUAL',
                'DAVID NUTKINS, ALEX KIERNAN and TONY KENDLE'
                ]
            },
        'reference':{
            '0-333-34806-0':[
                'DICTIONARY OF INFORMATION TECHNOLOGY',
                'DENNIS LONGLEY and MICHAEL SHAIN'
                ]
            },
        'novels':{
            '0-681-40322-5':[
                'THE MORE THAN COMPLETE HITCHHIKER\'S GUIDE',
                'DOUGLAS ADAMS'
                ]
            }
    }
#===========<End of dictionary>===========#
def search(publication, term):
    result = []
    results = []
    found = 0
    maximum = 6
    title = 0
    author = 1
    for category in library:
        for isbn in library[category]:
            book = library.get(category).get(isbn)
            if term[:4] == 'ISBN' and term[4:] == isbn:
                term = book[title]
            if term in book[publication]:
                found += 1
                result.append(book[title])
                result.append(book[author])
                result.append(category)
                result.append(isbn)
                results.append(result)
                result = []
                if found > maximum:
                    break
        if found > maximum:
            break
    if found:
        if found > maximum:
            return maximum
        else:
            return results
    else:
        return
#=========<End of search function>=========#
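def output(results, file=stdout):
    # Send every line to the given file object (stdout by default)
    print("-"*50, file=file)
    if isinstance(results, list):
        for books in results:
            for book in books:
                print(book, file=file)
            print("-"*50, file=file)
    else:
        print("Search results exceeds the maximum of {}".format(results), file=file)
        print("-"*50, file=file)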
#=========<End of output function>=========#
find, found = None, None
 
# attempt = the index reference passed to <if term in book[publication]>
attempt = 0 # 0 = book title 1 = book author

quit = False
while not find and not quit:
    print('''
Search term must be alphanumeric characters only
and greater than three characters in length.

For an ISBN search, enter ISBN and press return.
    ''')
    find = input("Search term: ").strip().upper() # to-do: check the input with the user input checker function
    if find == 'QUIT':
        quit = True
    elif find == 'ISBN':
        print("ISBN search")
        isbn = input("ISBN: ").strip()
        find = find+isbn
    if len(find) > 3:
        found = search(attempt, find)
    else:
        find, found = None, None
    if find and not quit:
        while not found and attempt < 1: # change this if more fields are added to the publication
            attempt += 1
            found = search(attempt, find)
    if found:
        output(found)
        find, found = None, None
        attempt = 0
    elif not quit:
        print("Nothing found")
        find = None
        attempt = 0

print("Search exit.")

**Gribouillis** · Aug-30-2022, 09:12 AM

(Aug-30-2022, 08:55 AM)rob101 Wrote: Thank you for reading and testing; I'll reply to any comments you may have, as and when.

I'll try to look into this. A few remarks while reading:

Write unit tests to automate testing while developing the code (the most useful thing).
Add a "file" parameter to output(), defaulting to sys.stdout (better for functions that do input/output)
Use triple quotes to define multiline strings.

rob101 · (This post was last modified: Aug-30-2022, 11:12 AM by rob101.)

(Aug-30-2022, 09:12 AM)Gribouillis Wrote: I'll try to look into this. A few remarks while reading:
Write unit tests to automate testing while developing the code (the most useful thing).

Add a "file" parameter to output(), defaulting to sys.stdout (better for functions that do input/output)

Use triple quotes to define multiline strings.

Thank you. I will update the code (above) in one go, once all the feedback that requires a code change seems to be in.

I have to admit (and as you've likely guessed) I'm not up to speed with your point 1 and point 2. As for point 3, yes; that's something that I should have taken care of and it will be.

Point 1: By this, do you mean that I should have a 'driver' to simulate user input or am I barking up the wrong tree?
Point 2: Could you (if you've time) give me a quick explainer as to why this is a good option to have and how that could be used?
Function will be amended to: def output(results, file=sys.stdout) which is (as I understand it to be) the default.
Point 3: Done. The code here will be updated as and when.

With thanks and regards.

**Gribouillis** · (This post was last modified: Aug-30-2022, 12:54 PM by Gribouillis.)

(Aug-30-2022, 11:12 AM)rob101 Wrote: Point 1: By this, do you mean that I should have a 'driver' to simulate user input or am I barking up the wrong tree?

In the end, it could be an option, but unit tests are made to test small «units» in a program, not the program as a whole. For example, they test a function's behavior. Here is how you could start unit testing the output() function. I inserted the following code just before the find, found = None, None line in your code:

import io
import sys
import unittest

class TestOutput(unittest.TestCase):

    def test_print_error_message_if_results_is_integer(self):
        results = 25
        ofh = io.StringIO()
        output(results, file=ofh)
        s = ofh.getvalue()
        self.assertIn(f'exceeds the maximum of {results}', s)

    def test_output_contains_titles(self):
        results = [['ti0', 'au0', 'ca0', 'is0'], ['ti1', 'au1', 'ca1', 'is1']]
        ofh = io.StringIO()
        output(results, file=ofh)
        s = ofh.getvalue()
        self.assertIn('ti0', s)
        self.assertIn('ti1', s)

if sys.argv[-1] == 'test':
    unittest.main(argv=sys.argv[:-1])
    sys.exit(0)

Now if instead of python program.py, you call python program.py test, it will run the tests instead of an interactive session.

To make the output() function testable, I had to inject the file in its parameters, and this answers your second question: to make an output function testable, you need to be able to inject the file object. I did it in a simple way here:

import builtins
import functools
import sys

def output(results, file=sys.stdout):
    # Shadow print locally so every call below writes to the injected file
    print = functools.partial(builtins.print, file=file)
    print("-"*50)
    if type(results) is not int:
        for books in results:
            for book in books:
                print(book)
            print("-"*50)
    else:
        print("Search results exceeds the maximum of {}".format(results))
        print("-"*50)
#=========<End of output function>=========#

Output:
λ python paillasse/pf/rob101.py test
..
----------------------------------------------------------------------
Ran 2 tests in 0.000s

OK
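For comparison, a function that has no file parameter can still be checked by temporarily redirecting standard output with contextlib.redirect_stdout from the standard library. This is only a sketch: show_results is a hypothetical stand-in for output() that always prints to stdout, and it is not part of the code posted in this thread.

```python
import contextlib
import io


def show_results(results):
    """Hypothetical stand-in for output(): it can only print to stdout."""
    print("-" * 50)
    if isinstance(results, list):
        for fields in results:
            for field in fields:
                print(field)
            print("-" * 50)
    else:
        print("Search results exceeds the maximum of {}".format(results))
        print("-" * 50)


buffer = io.StringIO()
# Everything printed inside the with-block is written to buffer, not the terminal.
with contextlib.redirect_stdout(buffer):
    show_results([['ti0', 'au0', 'ca0', 'is0']])

captured = buffer.getvalue()
assert 'ti0' in captured
```

Injecting the file object stays the more explicit design, since redirect_stdout changes sys.stdout for the whole process while the block runs; the redirect is mostly useful for code whose signature cannot be changed.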

rob101 Wrote:Point 3: Done. The code here will be updated as and when.

You could perhaps upload the code to a site such as GitHub Gist, which allows you to push updates of the code through git, as I did for this module for example, and leave a link in this thread so we could have the latest version at any time.

rob101

(Aug-30-2022, 12:54 PM)Gribouillis Wrote: In the end, it could be an option, but unit tests are made to test small «units» in a program, not the program as a whole.

This is all very helpful and I need to take a little time so that I can get my head around these new (to me) concepts and evaluate the code that you have posted, so that I fully understand what you've done, as well as why.

(Aug-30-2022, 12:54 PM)Gribouillis Wrote: You could perhaps upload the code to a site such as github gist...

This is an option that I will look into. In the meantime, I will update the code that's in my first post: I feel that it's maybe better to do that than to have multiple versions sprinkled around this thread.

Given that I have the output() function and that it can be 'unit tested' in the way that you demonstrate, I feel it could be better to have all the print() calls moved to the output() function, right? That is to say, the ones that are concerned with the search results, such as "Nothing found".

A thought that's come to mind, as I type this: once testing has been done, is it 'best practice' to remove the code that facilitates said testing, or does one leave it as is? I feel it should be removed, as it plays no part in the functionality of the app, right? It's details such as this, that are of as much interest to me, as is writing the code.

With that last thought in mind, I will refrain from including any of the code that is purely for testing, until I'm clear about what should and should not be included in the, shall we call it, release candidate.

Thank you very much for your time, as well as the information, and I look forward to your next reply, as and when you have more time to do so.

**perfringo** · Aug-30-2022, 02:17 PM

This is too verbose:

categories = library.keys()

for category in categories:
    books = library.get(category)
    for isbn in books:
        book = books.get(isbn)
        # do something with book

If you iterate over a dictionary, you iterate over its keys. So you can reduce this to:

for category in library:
    for record in library[category]:
        # do something with library[category][record]

Which raises the question about the way the data is structured. When I get data from upstream, my first step is to check whether I should convert it to something simpler (and faster) to work with. In this particular case a list of dictionaries could be one possibility: a very simple, generic filtering function could deliver all the required functionality. The current code already iterates over all the data, so there should be no performance penalty either. Another possibility is to use a dataframe and take advantage of vectorization.
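As a rough sketch of the list-of-dictionaries idea, the nested structure could be flattened once and then searched with a small generic filter. The names flatten and search_records, the record keys, and the cut-down library below are assumptions made for illustration; they are not part of the original code.

```python
# Hypothetical, cut-down copy of the nested structure from the first post.
library = {
    'computer science & programming': {
        '0-13-110163-3': ['THE C PROGRAMMING LANGUAGE',
                          'BRIAN W.KERNIGHAN & DENNIS M.RITCHIE'],
        '0-85934-229-8': ['PROGRAMMING IN QuickBASIC', 'N.KANTARIS'],
    },
    'novels': {
        '0-681-40322-5': ["THE MORE THAN COMPLETE HITCHHIKER'S GUIDE",
                          'DOUGLAS ADAMS'],
    },
}


def flatten(nested):
    """Turn {category: {isbn: [title, author]}} into a list of flat dicts."""
    return [
        {'category': category, 'isbn': isbn, 'title': title, 'author': author}
        for category, books in nested.items()
        for isbn, (title, author) in books.items()
    ]


def search_records(records, term, fields=('title', 'author')):
    """Return every record whose chosen fields contain term (case-insensitive)."""
    term = term.upper()
    return [record for record in records
            if any(term in record[field].upper() for field in fields)]


records = flatten(library)
for match in search_records(records, 'programming'):
    print(match['isbn'], match['title'])
```

With flat records, an ISBN lookup becomes the same call with fields=('isbn',), so the special 'ISBN' prefix handling in the search term would no longer be needed.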

rob101 · (This post was last modified: Aug-30-2022, 04:04 PM by rob101.)

(Aug-30-2022, 02:17 PM)perfringo Wrote: This is too verbose:...

... Which raises the question about the way the data is structured.

Yes, it does. I'm not one for nested data structures, if they can be avoided, but keep in mind that this is an academic exercise for me, just because I wanted to learn how one would go about searching such a data structure, if one needed to. If I were implementing a way to store and search a book collection, I would not use this code, as there are much simpler ways in which that can be done.

I will have a look at the improvement that you've posted, for which I am grateful, as I'm sure it will be better and I will be able to apply what you've shown me.

With thanks and regards.

To add...

(Aug-30-2022, 02:17 PM)perfringo Wrote: If you iterate over dictionary then you iterate over keys. So you can reduce this to:
for category in library:
    for record in library[category]:
        # do something with library[category][record]

I've run a test and from what I can see, your improvement will work for me:

for category in library:
    for isbn in library[category]:
        book = library.get(category).get(isbn)
        if term[:4] == 'ISBN' and term[4:] == isbn:
            # do the rest of the search from here

... which I will implement and update (unforeseen issues aside).

Thank you.

**Larz60+** · Aug-30-2022, 08:13 PM

For what it's worth, here's how I generically display nested dictionaries.

def display_dict(dictname, indentwidth=0):
    # indentwidth is the nesting depth; each level indents by four spaces
    indent = " " * (4 * indentwidth)
    for key, value in dictname.items():
        if isinstance(value, dict):
            print(f'\n{indent}{key}')
            # Recurse one level deeper without mutating this level's depth
            display_dict(value, indentwidth + 1)
        else:
            print(f'{indent}{key}: {value}')

def testit():

    urllist = {
        "LocalGovernment": {
            "Argentina": {
                "MisionesOpenData_AR": "http://www.datos.misiones.gov.ar/"
            },
            "Austria": {
                "ViennaOpenData_AT": "https://www.data.gv.at/"
            },
            "UnitedStates": {
                "alabama": {
                    "Alabaster": {
                        "Rank": 16,
                        "URL": "https://www.cityofalabaster.com/",
                        "Population": "33,373"
                    },
                    "Albertville": {
                        "Rank": 27,
                        "URL": "https://www.cityofalbertville.com/",
                        "Population": "21,620"
                    }
                }
            }
        }
    }

    display_dict(urllist)

if __name__ == '__main__':
    testit()

Output:

LocalGovernment

    Argentina
        MisionesOpenData_AR: http://www.datos.misiones.gov.ar/

    Austria
        ViennaOpenData_AT: https://www.data.gv.at/

    UnitedStates

        alabama

            Alabaster
                Rank: 16
                URL: https://www.cityofalabaster.com/
                Population: 33,373

            Albertville
                Rank: 27
                URL: https://www.cityofalbertville.com/
                Population: 21,620

**Gribouillis** · Aug-30-2022, 08:45 PM

(Aug-30-2022, 08:13 PM)Larz60+ Wrote: Here's how I generically display nested dictionaries.

Or using the asciitree module from PyPI:

import asciitree


class OurTraversal(asciitree.traversal.Traversal):
    def get_children(self, node):
        k, v = node
        return list(v.items()) if isinstance(v, dict) else []

    def get_root(self, tree):
        return tree

    def get_text(self, node):
        k, v = node
        return k if isinstance(v, dict) else f'{k}: {v}'


def testit():

    urllist = {
        "LocalGovernment": {
            "Argentina": {
                "MisionesOpenData_AR": "http://www.datos.misiones.gov.ar/"
            },
            "Austria": {
                "ViennaOpenData_AT": "https://www.data.gv.at/"
            },
            "UnitedStates": {
                "alabama": {
                    "Alabaster": {
                        "Rank": 16,
                        "URL": "https://www.cityofalabaster.com/",
                        "Population": "33,373"
                    },
                    "Albertville": {
                        "Rank": 27,
                        "URL": "https://www.cityofalbertville.com/",
                        "Population": "21,620"
                    }
                }
            }
        }
    }

    s = str(asciitree.LeftAligned(traverse=OurTraversal())(('', urllist)))
    print(s)

if __name__ == '__main__':
    testit()

Output:
 +-- LocalGovernment
     +-- Argentina
     |   +-- MisionesOpenData_AR: http://www.datos.misiones.gov.ar/
     +-- Austria
     |   +-- ViennaOpenData_AT: https://www.data.gv.at/
     +-- UnitedStates
         +-- alabama
             +-- Alabaster
             |   +-- Rank: 16
             |   +-- URL: https://www.cityofalabaster.com/
             |   +-- Population: 33,373
             +-- Albertville
                 +-- Rank: 27
                 +-- URL: https://www.cityofalbertville.com/
                 +-- Population: 21,620

rob101 · Aug-30-2022, 08:57 PM

(Aug-30-2022, 08:13 PM)Larz60+ Wrote: For what it's worth, Here's how I generically display nested dictionaries.

Thank you for that.

This...

for key, value in dictname.items():
        if isinstance(value, dict):

... looks very interesting. I'd not considered getting the keys and values together in a for loop with the .items() method, combined with the isinstance() function. I'll certainly look at that usage, for my own understanding at least.
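The same .items() and isinstance() pairing can be turned from displaying into searching at any depth. What follows is a minimal sketch under stated assumptions: find_in_nested and the sample data are hypothetical names invented for this illustration, and matching is a plain substring test on leaf keys and values.

```python
def find_in_nested(data, term, path=()):
    """Yield (path, value) for each leaf whose key or value contains term."""
    for key, value in data.items():
        here = path + (key,)
        if isinstance(value, dict):
            # Descend into the nested dictionary, remembering how we got here.
            yield from find_in_nested(value, term, here)
        elif term in str(key) or term in str(value):
            yield here, value


# Hypothetical sample data, loosely modelled on the examples in this thread.
sample = {
    'LocalGovernment': {
        'Austria': {'ViennaOpenData_AT': 'https://www.data.gv.at/'},
        'UnitedStates': {
            'alabama': {
                'Alabaster': {'Rank': 16, 'Population': '33,373'},
            },
        },
    },
}

for path, value in find_in_nested(sample, 'Vienna'):
    print(' > '.join(path), '=', value)
```

Because the function is a generator, a caller can stop after the first few matches, which would be one way to enforce a maximum number of results without a separate counter.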
