Python Forum
dictionaries and list as values
#1
So I have an assignment where I have sequence IDs as keys in a dictionary, and the values are lists of sequences of a certain length associated with a particular ID. What I need to do is go through the sequences in each list (which is stored as the value) and sort out all those that do not have a certain character as the nth character.

Then I need to get 500 random pairs of the sequences. It is important that a sequence is not paired with another sequence with the same sequence ID (key).

I have never worked with this amount of data before, and I am very unsure how to go about this.

I probably should look at object-oriented programming, but I have no idea where to start. Can anyone give me any direction?

Thanks in advance!
#2
"What I need to do is to go through the sequences on the list (which is stored as the value), sort all those away that do not have a certain character as the nth character."

The easiest way is to start with a dummy dictionary and test different approaches. It could be a subset of the real data or just:

>>> dummy = {1: ['a', 'b', 'c'], 2: ['a', 'c', 'b'], 3: ['c', 'b', 'a']}
Now you can try to filter out the keys which don't have 'b' at index 1 of their values. The most used dictionary methods are:

>>> dummy.keys()
dict_keys([1, 2, 3])
>>> dummy.values()
dict_values([['a', 'b', 'c'], ['a', 'c', 'b'], ['c', 'b', 'a']])
>>> dummy.items()
dict_items([(1, ['a', 'b', 'c']), (2, ['a', 'c', 'b']), (3, ['c', 'b', 'a'])])
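One possible way to do that filtering, as a minimal sketch on the dummy data above (the name kept is only illustrative):

```python
dummy = {1: ['a', 'b', 'c'], 2: ['a', 'c', 'b'], 3: ['c', 'b', 'a']}

# Keep only the key-value pairs whose list has 'b' at index 1.
kept = {key: value for key, value in dummy.items() if value[1] == 'b'}
print(kept)  # {1: ['a', 'b', 'c'], 3: ['c', 'b', 'a']}
```

A dict comprehension over .items() gives access to the key and the value at the same time, which is why it is usually the handiest of the three methods for this kind of job.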
"it is important that a sequences is not paired with another sequence with the same sequence id (key)"

Maybe you could elaborate on that. Dictionary keys are unique, so a situation with identical keys cannot happen.
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
#3
Thank you, I will try that approach!

What I mean by not having the same sequence ID is that I need to make sure that, in each random sequence pair, the two sequences are not from the same list, if that makes sense. The purpose is to compare sequences that belong to different sequence IDs.

Edit: However, I don't want to remove the key, just those sequences in the list that don't fulfill the criteria, and keep the other sequences associated with that key. I am sorry if that wasn't clear.
#4
It is easier to comprehend if you provide some real examples of key-value pairs and what the expected outcome should look like.

Is it something like this:

>>> dummy = {1: ['aba', 'baa', 'cba'], 2: ['aaa', 'ccc', 'cbc']}
And the task is to find the strings in the lists that have 'b' at index 1, and to make pairs of these strings without pairing two strings from the same list?
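If that reading is right, the filtering half could look roughly like this. It is a sketch on the dummy data only, and the name filtered is just for illustration:

```python
dummy = {1: ['aba', 'baa', 'cba'], 2: ['aaa', 'ccc', 'cbc']}

# Keep every key, but only the strings that have 'b' at index 1.
filtered = {key: [s for s in strings if s[1] == 'b']
            for key, strings in dummy.items()}
print(filtered)  # {1: ['aba', 'cba'], 2: ['cbc']}
```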
#5
So I have a dictionary that looks something like this:

Output:
{'1arcH': ['DKATIPSESPF', 'KATIPSESPFA', 'ATIPSESPFAA', 'TIPSESPFAAA'], '1aabH': ['GVSGSCNIDVV', 'VSGSCNIDVVC', 'SGSCNIDVVCP', 'GSCNIDVVCPE'], '1cevH': ['DISSTEIAVYW', 'ISSTEIAVYWG', 'SSTEIAVYWGQ', 'STEIAVYWGQR', 'TEIAVYWGQRE', 'EIAVYWGQRED']}
For each of the values, I only want the sequences whose central character is e.g. 'A', 'P' or 'V', so I get an updated dictionary that looks like this:

Output:
{'1arcH': ['DKATIPSESPF'], '1aabH': ['GSCNIDVVCPE'], '1cevH': ['SSTEIAVYWGQ', 'STEIAVYWGQR']}
When that is done, I want to be able to randomly pair 'DKATIPSESPF' with a sequence from another list. At that point the sequence ID is not important anymore, so I imagine that the pairs could be stored in a list or an array.

Thank you again!
#6
For 1arcH, why wouldn't KATIPSESPFA match? It contains both an A and a P.
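The difference between the two readings can be checked quickly. This sketch assumes that "central character" means the character at the middle index of the string, which the original post does not spell out:

```python
sequence = 'KATIPSESPFA'
wanted = {'A', 'P', 'V'}

middle = len(sequence) // 2                 # 5 for an 11-character sequence
contains_any = any(ch in wanted for ch in sequence)
central_matches = sequence[middle] in wanted

print(sequence[middle], contains_any, central_matches)  # S True False
```

Under that assumption the sequence contains wanted letters, but its central character is 'S', so it would be dropped.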
#7
Decomposed into spoken language, the task can be represented with quite simple steps (it's more or less brute force, but according to Donald Knuth premature optimisation is the root of almost all evil in programming):

- filter values
- create all pair combinations
- remove pair combinations created from values of one key

For trying out ideas, the Python interactive interpreter (or Jupyter) is always your best friend. So let's try to express the 'easy' spoken-language steps in Python. The following is one uninterrupted session (thus the use of _).

Filter values

We have a dictionary, and we have its values (d.values()). We need filtered values (I assume that "the central character to be eg 'A', 'P' or 'V'" means that index 5 must hold one of those letters). One obvious way is to use the built-in function filter, but it's out of fashion nowadays, so we use a list comprehension: we loop through the value (a list) of every key and, for every key, create a list of the elements which meet the criteria.

In [1]: d = {'1arcH': ['DKATIPSESPF', 'KATIPSESPFA', 'ATIPSESPFAA', 'TIPSESPFAAA'], 
   ...:      '1aabH': ['GVSGSCNIDVV', 'VSGSCNIDVVC', 'SGSCNIDVVCP', 'GSCNIDVVCPE'], 
   ...:      '1cevH': ['DISSTEIAVYW', 'ISSTEIAVYWG', 'SSTEIAVYWGQ', 'STEIAVYWGQR', 'TEIAVYWGQRE', 'EIAVYWGQRED']}               

In [2]: [[sequence for sequence in value if sequence[5] in ['A', 'P', 'V']] for value in d.values()]                            
Out[2]: [['DKATIPSESPF'], [], ['SSTEIAVYWGQ', 'STEIAVYWGQR']]
We now have the filtered values (note that we get an empty list for every value where no matches were found).
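The list comprehension above drops the keys. If the updated dictionary itself is wanted (as asked earlier in the thread), a dict comprehension is one option. This is only a sketch; the names wanted, filtered and non_empty are illustrative, and index 5 is the same assumption as above:

```python
d = {'1arcH': ['DKATIPSESPF', 'KATIPSESPFA', 'ATIPSESPFAA', 'TIPSESPFAAA'],
     '1aabH': ['GVSGSCNIDVV', 'VSGSCNIDVVC', 'SGSCNIDVVCP', 'GSCNIDVVCPE'],
     '1cevH': ['DISSTEIAVYW', 'ISSTEIAVYWG', 'SSTEIAVYWGQ', 'STEIAVYWGQR',
               'TEIAVYWGQRE', 'EIAVYWGQRED']}
wanted = {'A', 'P', 'V'}

# Same filter as before, but each key stays attached to its filtered list.
filtered = {key: [seq for seq in seqs if seq[5] in wanted]
            for key, seqs in d.items()}

# Optionally drop the keys that ended up with no sequences at all.
non_empty = {key: seqs for key, seqs in filtered.items() if seqs}
print(non_empty)
# {'1arcH': ['DKATIPSESPF'], '1cevH': ['SSTEIAVYWGQ', 'STEIAVYWGQR']}
```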

Create all pair combinations

One way is to try to construct all the pairs ourselves. However, there is the built-in module itertools for efficient iteration, which we can take advantage of. Specifically, there are the chain and combinations functions.

With chain we can chain all values into one iterable, and with combinations we can get all pairs from that iterable.

In [4]: from itertools import chain, combinations                                                                               

In [3]: list(chain(*_))                                                                                                         
Out[3]: ['DKATIPSESPF', 'SSTEIAVYWGQ', 'STEIAVYWGQR']    # all filtered values

In [4]: list(combinations(_, 2))                                                                                                
Out[4]: 
[('DKATIPSESPF', 'SSTEIAVYWGQ'),                         # all pairs of filtered values
 ('DKATIPSESPF', 'STEIAVYWGQR'),
 ('SSTEIAVYWGQ', 'STEIAVYWGQR')]
Now we have all pair combinations.

Remove pair combinations created from values of one key

We need to know which pairs are generated from elements in the value list of a single key. We can use the same technique as earlier: filter and create combinations, but this time only from the values inside each list. We will chain these lists right away:

In [5]: [[sequence for sequence in value if sequence[5] in ['A', 'P', 'V']] for value in d.values()]                            
Out[5]: [['DKATIPSESPF'], [], ['SSTEIAVYWGQ', 'STEIAVYWGQR']]

In [6]: list(chain(*(combinations(el, 2) for el in _)))                                                                         
Out[6]: [('SSTEIAVYWGQ', 'STEIAVYWGQR')]
Now we have the list of pairs we want to filter out from all pairs. There are several techniques for doing so, and the choice depends on the parameters of the task at hand. For example, can there be repeated pairs, or are all pairs unique? Do we want to retain repetitions or not?

If we have or want only unique values, we can use the built-in set data structure (which supports the .difference method). If there are duplicates and we want to keep them, we can use a list comprehension (or filter). There is, however, one important thing: chain and combinations return iterators, which can be consumed only once. We must be extra careful not to consume them several times (or we must convert the iterators to lists/tuples for repeated use).
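The "consumed only once" behaviour is easy to see on a toy input (the letters here are placeholders, not real data):

```python
from itertools import combinations

pairs = combinations(['x', 'y', 'z'], 2)

first_pass = list(pairs)    # consumes the iterator
second_pass = list(pairs)   # nothing left to yield

print(first_pass)   # [('x', 'y'), ('x', 'z'), ('y', 'z')]
print(second_pass)  # []
```

Storing the result of list(...) once and reusing that list avoids the surprise.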


In [7]: all_combinations = [('DKATIPSESPF', 'SSTEIAVYWGQ'), 
   ...:                     ('DKATIPSESPF', 'STEIAVYWGQR'), 
   ...:                     ('SSTEIAVYWGQ', 'STEIAVYWGQR')]                                                                     

In [8]: inner_combinations = [('SSTEIAVYWGQ', 'STEIAVYWGQR')]                                                                  

In [9]: set(all_combinations).difference(inner_combinations)                    # only unique pairs not in inner pairs                                                
Out[9]: {('DKATIPSESPF', 'SSTEIAVYWGQ'), ('DKATIPSESPF', 'STEIAVYWGQR')}

In [10]: [pair for pair in all_combinations if pair not in inner_combinations]  # all pairs which are not in inner pairs                                                 
Out[10]: [('DKATIPSESPF', 'SSTEIAVYWGQ'), ('DKATIPSESPF', 'STEIAVYWGQR')]
Now we have a set or list of pairs, and we can use a function from the random module to get the required sample.
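A minimal sketch of that last step with random.sample. The target of 500 comes from the original question; the seed and the names cross_pairs, candidates and sample are assumptions made for the example. Recent Python versions no longer accept a set in random.sample, so the set is turned into a list first:

```python
import random

cross_pairs = {('DKATIPSESPF', 'SSTEIAVYWGQ'), ('DKATIPSESPF', 'STEIAVYWGQR')}
wanted_pairs = 500

# random.sample needs a sequence, so convert the set to a (sorted) list.
candidates = sorted(cross_pairs)

# Never ask for more pairs than exist, otherwise sample raises ValueError.
rng = random.Random(0)  # fixed seed only to make the example repeatable
sample = rng.sample(candidates, k=min(wanted_pairs, len(candidates)))
print(len(sample))  # 2
```

With the real data there should be far more than 500 candidate pairs, in which case exactly 500 distinct pairs come back.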

The full code is quite short (much like the task in spoken language; this is the beauty of Python):

from itertools import chain, combinations

d = {'1arcH': ['DKATIPSESPF', 'KATIPSESPFA', 'ATIPSESPFAA', 'TIPSESPFAAA'],
     '1aabH': ['GVSGSCNIDVV', 'VSGSCNIDVVC', 'SGSCNIDVVCP', 'GSCNIDVVCPE'],
     '1cevH': ['DISSTEIAVYW', 'ISSTEIAVYWG', 'SSTEIAVYWGQ', 'STEIAVYWGQR', 'TEIAVYWGQRE', 'EIAVYWGQRED']}

values = [[sequence for sequence in value if sequence[5] in ['A', 'P', 'V']] for value in d.values()]

flat = chain(*values)
all_combinations = combinations(flat, 2)
inner_combinations = chain(*(combinations(el, 2) for el in values))
# the set difference consumes each iterator exactly once
result = set(all_combinations).difference(inner_combinations)
print(result)

# output
{('DKATIPSESPF', 'SSTEIAVYWGQ'), ('DKATIPSESPF', 'STEIAVYWGQR')}
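With a large amount of data, building every combination first may get expensive. A different technique from the one above (not something the original task prescribes) is rejection sampling: tag each sequence with its key, draw two at random, and keep the pair only if the keys differ. The function name sample_cross_pairs and its parameters are hypothetical:

```python
import random


def sample_cross_pairs(filtered, n_pairs, seed=None, max_tries=10_000):
    """Draw up to n_pairs unique pairs whose members come from different keys."""
    rng = random.Random(seed)
    # Remember which key every sequence belongs to.
    tagged = [(key, seq) for key, seqs in filtered.items() for seq in seqs]
    pairs = set()
    if len({key for key, _ in tagged}) < 2:
        return pairs  # no cross-key pair is possible
    for _ in range(max_tries):  # guard against looping forever
        if len(pairs) >= n_pairs:
            break
        (key_a, seq_a), (key_b, seq_b) = rng.sample(tagged, 2)
        if key_a != key_b:  # reject pairs drawn from the same list
            pairs.add(tuple(sorted((seq_a, seq_b))))
    return pairs


filtered = {'1arcH': ['DKATIPSESPF'],
            '1aabH': [],
            '1cevH': ['SSTEIAVYWGQ', 'STEIAVYWGQR']}
result = sample_cross_pairs(filtered, n_pairs=500, seed=0)
print(result)
```

On this tiny example only two cross-key pairs exist, so the function returns both and stops when max_tries runs out. Comparing keys instead of comparing pairs also stays correct if the same sequence string happens to appear under two different IDs.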