Thread: proteins interactions (Python Forum, General Coding Help, /thread-18571.html)
proteins interactions - Amniote - May-22-2019

Hello everyone,

I need to create protein interaction chains (each composed of 10 proteins). To do this, I created a dictionary in this form:

Protein 1 = protein2, protein5, protein6 (protein 1 can interact with protein 2, 5, or 6)
Protein 2 = protein1, protein7, protein8 (protein 2 can interact with protein 1, 7, or 8)
...
Protein 7 = protein34, protein43
...
Protein 43 = protein74, protein76
etc. (I have over 20000 proteins)

In the end, I need every possible chain of interactions composed of 10 proteins, for example:

P1 P2 P7 P43 P74 ...
P1 P5 ...
P1 P6 ...
P2 P1 ...
P2 P7 P34 ...
etc.

I have no idea how to code an algorithm for this. Can you help me? Thank you.

RE: proteins interactions - micseydel - May-22-2019

Can you give simplified code examples of what your input and output should be like? And what part you're struggling with in particular?

RE: proteins interactions - Amniote - May-23-2019

Here is an excerpt from my file, where the interactions between two proteins are indicated:

NDUFAF7 NDUFS7 126 0 0 112 119 900 708 976 ---> EXP
NDUFAF7 NDUFS2 295 0 0 93 50 900 913 993 ---> EXP
NDUFAF7 NDUFS3 222 0 216 86 50 900 616 974 ---> EXP
FUCA2 FUCA1 0 0 60 64 187 900 72 921 ---> EXP
HS3ST1 GPC6 0 0 0 0 96 900 439 944
HS3ST1 GPC3 0 0 0 0 96 900 458 946
ARF5 COPE 0 0 0 191 160 900 83 929 ---> EXP
ARF5 DCTN1 0 0 0 95 59 900 69 910 ---> EXP
M6PR LRP2 0 0 0 0 320 900 257 945

Here, for example, FUCA2 interacts with FUCA1.

Here is the script I have for the moment:

    from collections import defaultdict

    dico_prot1_prot2 = defaultdict(set)

    with open("C:/Users/lveillat/Desktop/Données stage/Données/resultats_matrice_avec_scores_sans_localisation.txt","r") as f1:
        for ligne in f1:
            lp = ligne.rstrip('\n').split(" ")
            if lp[-1] == "EXP":
                prot1 = lp[0]
                prot2 = lp[1]
                dico_prot1_prot2[prot1].add(prot2)

    with open("chainespostfiltrage.tsv","w") as f2:
        for prot1 in dico_prot1_prot2:  # I walk through each protein
            tmpchaine = set()  # I initialize my protein chain with the first protein
            tmpchaine.add(prot1)
            for prot2 in dico_prot1_prot2:  # I go through my dictionary again
                if prot1 != prot2 and prot1 in dico_prot1_prot2[prot2]:  # If the two proteins are different and they interact
                    if len(tmpchaine) < 10:  # If the chain holds fewer than 10 proteins
                        tmpchaine.add(prot2)  # Then I add the interacting protein to the chain
                    elif len(tmpchaine) == 10:  # If the chain contains 10 proteins
                        chaine = " ".join(tmpchaine)
                        f2.write(chaine+"\n")
                        print(chaine)
                        tmpchaine = set()  # I empty the chain because it has reached its desired size (10 proteins)

Finally, here is the output of my script. Unfortunately, although the results are in the form I want, many interaction chains are missing, and I do not know how to fix that.

Thanks for your help
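
One possible way to list every chain is a depth-first search with backtracking: start from each protein, extend the chain with each partner of its last protein, and yield the chain once it reaches the target length. The block below is only a minimal sketch of that idea, not a reviewed solution. The function name iter_chains, the toy dictionary, and the rule that a protein may appear at most once per chain are assumptions made for illustration; the dictionary is expected to have the same shape as dico_prot1_prot2 in the script above.

```python
def iter_chains(interactions, length):
    """Yield every chain of `length` distinct proteins as a tuple.

    `interactions` maps a protein to the set of proteins it can be
    followed by. Assumption: a protein appears at most once per chain.
    """
    def extend(chain, used):
        if len(chain) == length:
            yield tuple(chain)
            return
        # Proteins without an entry simply have no partners to try.
        for partner in sorted(interactions.get(chain[-1], ())):
            if partner not in used:
                chain.append(partner)
                used.add(partner)
                yield from extend(chain, used)
                # Backtrack so the next partner starts from the same prefix.
                used.remove(partner)
                chain.pop()

    for start in sorted(interactions):
        yield from extend([start], {start})


# Toy dictionary with made-up names (not real data).
toy = {
    "P1": {"P2", "P5"},
    "P2": {"P1", "P7"},
    "P7": {"P34"},
}

chains = list(iter_chains(toy, 3))
for chain in chains:
    print(" ".join(chain))
```

Because iter_chains is a generator, each chain could be written to the output file as it is produced instead of being collected in a list first. With more than 20000 proteins and a length of 10, the number of chains may be very large, so some extra filtering would probably be needed in practice.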