bionetgen.atomizer.atomizer package

Submodules

bionetgen.atomizer.atomizer.analyzeRDF module

bionetgen.atomizer.atomizer.analyzeSBML module

Created on Thu Mar 22 13:11:38 2012

@author: proto

class bionetgen.atomizer.atomizer.analyzeSBML.SBMLAnalyzer(modelParser, configurationFile, namingConventions, speciesEquivalences=None, conservationOfMass=True)[source]

Bases: object

analyzeSpeciesModification(baseElement, modifiedElement, partialAnalysis)[source]

A method for reading modifications within complexes. This is only possible once their internal structure is known (this method is called after the dependency graph has been created and resolved).

analyzeSpeciesModification2(baseElement, modifiedElement, partialAnalysis)[source]

A method to read modifications within complexes.

analyzeUserDefinedEquivalences(molecules, conventions)[source]
approximateMatching(ruleList, differenceParameter=[])[source]
approximateMatching2(reactantString, productString, strippedMolecules, differenceParameter)[source]

The core of the naming-convention matching between reactant and product is done here; tl;dr: naming conventions are hard.

breakByActionableUnit(reaction, strippedMolecules)[source]
checkCompliance(ruleCompliance, tupleCompliance, ruleBook)[source]

This method is mainly useful when a single reaction could plausibly be classified in several ways, but in the context of its tuple partners only one classification applies.

classifyReactions(reactions, molecules, externalDependencyGraph={})[source]

Classifies a group of reactions according to the information in the JSON config file.

FIXME: the classifyReactions function is currently the biggest bottleneck in the atomizer, taking up to 80% of the time, not counting Pathway Commons querying.

classifyReactionsWithAnnotations(reactions, molecules, annotations, labelDictionary)[source]

This method goes through the list of reactions and, aided by annotation information, assigns a 'modification' tag to those reactions where some kind of modification takes place.

compareStrings(reactant, product, strippedMolecules)[source]
distanceToModification(particle, modifiedElement, translationKeys)[source]
findBiggestActionable(chemicalList, chemicalCandidatesList)[source]
findClosestModification(particles, species, annotationDict, originalDependencyGraph)[source]

Maps a set of particles to the complete set of species using lexical analysis. This step is done independently of the reaction network.

findMatchingModification(particle, species)[source]
fuzzyArtificialReaction(baseElements, modifiedElement, molecules)[source]

When we do not know how a species is composed but we know its base elements, try to reconstruct it by concatenating its basic reactants.

getReactionClassification(reactionDefinition, rules, equivalenceTranslator, indirectEquivalenceTranslator, translationKeys=[])[source]

reactionDefinition is a list of conditions that must be met for a reaction to be classified a certain way. rules is the list of reactions. equivalenceTranslator is a dictionary containing all complexes that have been determined to be the same through naming conventions. This method goes through the list of rules and the list of rule definitions and reports which rules it can classify according to the provided rule definitions.

getReactionProperties()[source]

If a naming-convention definition from the JSON file is in use, this method returns the component and state names that the reaction uses.

getUserDefinedComplexes()[source]
greedyModificationMatching(speciesString, referenceSpecies)[source]

Recursive function that tries to map a given species string to a permutation of the strings in referenceSpecies.

>>> sa = SBMLAnalyzer(None,'./config/reactionDefinitions.json','./config/namingConventions.json')
>>> sorted(sa.greedyModificationMatching('EGF_EGFR',['EGF','EGFR']))
['EGF', 'EGFR']
>>> sorted(sa.greedyModificationMatching('EGF_EGFR_2_P_Grb2',['EGF','EGFR','EGF_EGFR_2_P','Grb2']))
['EGF_EGFR_2_P', 'Grb2']
>>> sorted(sa.greedyModificationMatching('A_B_C_D',['A','B','C','C_D','A_B_C','A_B']))
['A_B', 'C_D']

growString(reactant, product, rp, pp, idx, strippedMolecules, continuityFlag)[source]

currently this is the slowest method in the system because of all those calls to difflib

identifyReactions2(rule, reactionDefinition)[source]

This method goes through the list of common reactions listed in ruleDictionary and tries to find how they are related according to the information in reactionDefinition.

levenshtein(s1, s2)[source]
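The levenshtein helper presumably computes the standard edit distance between two species names. A minimal self-contained sketch of that metric (not the module's actual code):

```python
def levenshtein(s1, s2):
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    if len(s1) < len(s2):
        s1, s2 = s2, s1
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1):
        current = [i + 1]
        for j, c2 in enumerate(s2):
            current.append(min(previous[j + 1] + 1,          # deletion
                               current[j] + 1,               # insertion
                               previous[j] + (c1 != c2)))    # substitution
        previous = current
    return previous[-1]
```

For example, 'Ras_GDP' and 'Ras_GTP' differ by a single substitution, so their distance is 1, which is why the naming-convention heuristics above treat them as likely modification pairs.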
loadConfigFiles(fileName)[source]

The reactionDefinition file must contain the definitions of the basic reaction types we want to parse, along with the requirements a given reaction must meet to be classified as each type.

processAdHocNamingConventions(reactant, product, localSpeciesDict, compartmentChangeFlag, moleculeSet)[source]

One-to-one string comparison. This method attempts to detect whether there is a modification relationship between the strings <reactant> and <product>.

>>> sa = SBMLAnalyzer(None,'./config/reactionDefinitions.json','./config/namingConventions.json')
>>> sa.processAdHocNamingConventions('EGF_EGFR_2','EGF_EGFR_2_P', {}, False, ['EGF','EGFR', 'EGF_EGFR_2'])
[[[['EGF_EGFR_2'], ['EGF_EGFR_2_P']], '_p', ('+ _', '+ p')]]
>>> sa.processAdHocNamingConventions('A', 'A_P', {}, False,['A','A_P']) #changes need to be at least 3 characters long
[[[['A'], ['A_P']], None, None]]
>>> sa.processAdHocNamingConventions('Ras_GDP', 'Ras_GTP', {}, False,['Ras_GDP','Ras_GTP', 'Ras'])
[[[['Ras'], ['Ras_GDP']], '_gdp', ('+ _', '+ g', '+ d', '+ p')], [[['Ras'], ['Ras_GTP']], '_gtp', ('+ _', '+ g', '+ t', '+ p')]]
>>> sa.processAdHocNamingConventions('cRas_GDP', 'cRas_GTP', {}, False,['cRas_GDP','cRas_GTP'])
[[[['cRas'], ['cRas_GDP']], '_gdp', ('+ _', '+ g', '+ d', '+ p')], [[['cRas'], ['cRas_GTP']], '_gtp', ('+ _', '+ g', '+ t', '+ p')]]
processAnnotations(molecules, annotations)[source]
processFuzzyReaction(reaction, translationKeys, conventionDict, indirectEquivalenceTranslator)[source]
processNamingConventions2(molecules, threshold=4, onlyUser=False)[source]
removeExactMatches(reactantList, productList)[source]

goes through the list of lists reactantList and productList and removes the intersection
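The behavior described above can be sketched as a multiset-intersection removal (a hypothetical reimplementation; element order and edge-case handling in the real method may differ):

```python
def removeExactMatches(reactantList, productList):
    """Remove species that appear on both sides of a reaction, leaving
    only the elements that actually change. Elements are lists (the
    atomizer's list-of-lists representation), so removal is by value."""
    reactants = list(reactantList)
    products = list(productList)
    for element in reactantList:
        if element in products:
            reactants.remove(element)
            products.remove(element)
    return reactants, products
```

For instance, for a reaction where ['B'] appears unchanged on both sides, only the changing species survive on each side.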

setConfigurationFile(configurationFile)[source]
species2Rules(rules)[source]

This method goes through the rule list and classifies species tuples in a dictionary according to the reactions they appear in.

testAgainstExistingConventions(fuzzyKey, modificationList, threshold=4)[source]
userJsonToDataStructure(patternName, userEquivalence, dictionary, labelDictionary, equivalencesList)[source]

converts a user defined species to an internal representation

bionetgen.atomizer.atomizer.analyzeSBML.addToDependencyGraph(dependencyGraph, label, value)[source]
bionetgen.atomizer.atomizer.analyzeSBML.get_close_matches(match, dataset, cutoff=0.6)[source]
bionetgen.atomizer.atomizer.analyzeSBML.parseReactions(reaction, specialSymbols='')[source]
bionetgen.atomizer.atomizer.analyzeSBML.sequenceMatcher(a, b)[source]

compares two strings ignoring underscores
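One plausible reading of this behavior, sketched with Python's difflib; only the underscore stripping is stated above, so the ratio-based comparison is an assumption:

```python
import difflib

def sequenceMatcherRatio(a, b):
    """Similarity ratio of two species names with underscores stripped,
    so 'EGF_EGFR' and 'EGFEGFR' compare as identical."""
    stripped_a = a.replace('_', '')
    stripped_b = b.replace('_', '')
    return difflib.SequenceMatcher(None, stripped_a, stripped_b).ratio()
```

This matters because SBML models frequently use underscores inconsistently as separators in compound species names.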

bionetgen.atomizer.atomizer.atomizationAux module

exception bionetgen.atomizer.atomizer.atomizationAux.CycleError(memory)[source]

Bases: Exception

Exception raised for errors in the input.

Attributes:

expr – input expression in which the error occurred
msg – explanation of the error

bionetgen.atomizer.atomizer.atomizationAux.addAssumptions(assumptionType, assumption, assumptions)[source]
bionetgen.atomizer.atomizer.atomizationAux.addToDependencyGraph(dependencyGraph, label, value)[source]
bionetgen.atomizer.atomizer.atomizationAux.getAnnotations(annotation)[source]

parses a libsbml.XMLAttributes annotation object into a list of annotations

bionetgen.atomizer.atomizer.atomizationAux.getURIFromSBML(moleculeName, parser, filterString=None)[source]

Filters a list of URIs so that we get only filterString IDs.

bionetgen.atomizer.atomizer.atomizationAux.parseReactions(reaction)[source]

Given a reaction string definition, it separates the elements into reactants and products.

>>> parseReactions('A() + B() -> C() k1()')
[['A', 'B'], ['C']]
>>> parseReactions('A()@EC + B()@PM -> C()@PM k1()')
[['A', 'B'], ['C']]
>>> parseReactions('0 -> A() k1()')
['0', ['A']]

bionetgen.atomizer.atomizer.atomizerUtils module

exception bionetgen.atomizer.atomizer.atomizerUtils.BindingException(value, combinations)[source]

Bases: Exception

bionetgen.atomizer.atomizer.detectOntology module

Created on Sat Oct 19 15:19:35 2013

@author: proto

bionetgen.atomizer.atomizer.detectOntology.analyzeNamingConventions(speciesName, ontologyFile, ontologyDictionary={}, similarityThreshold=4)[source]
bionetgen.atomizer.atomizer.detectOntology.databaseAnalysis(directory, outputFile)[source]
bionetgen.atomizer.atomizer.detectOntology.defineEditDistanceMatrix(speciesName, similarityThreshold=4, parallel=False)[source]

Obtains a distance matrix and the pairs of elements that are close in distance, along with the proposed differences.

bionetgen.atomizer.atomizer.detectOntology.defineEditDistanceMatrix3(speciesName, similarityThreshold=4, parallel=False)[source]
bionetgen.atomizer.atomizer.detectOntology.findLongestSubstring(speciesA, speciesB)[source]
bionetgen.atomizer.atomizer.detectOntology.getDifferences(scoreMatrix, speciesName, threshold)[source]

Given a list of strings and a scoreMatrix, return the list of differences between those strings whose Levenshtein distance is less than threshold.

Returns:

namePairs: list of tuples containing strings with distance <2
differenceList: list of differences between the tuples in namePairs
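A hedged sketch of the pairing step described above, expressing differences as difflib.ndiff tokens in the same '+ x' style that appears in the processAdHocNamingConventions doctests (the function body and threshold semantics here are illustrative, not the actual implementation):

```python
import difflib
import itertools

def getDifferences(speciesNames, threshold=4):
    """For every pair of names separated by only a few character edits,
    record the pair and the ndiff tokens that distinguish them."""
    namePairs, differenceList = [], []
    for a, b in itertools.combinations(speciesNames, 2):
        # keep only insertion/deletion tokens, e.g. '+ _', '+ P'
        diff = [tok for tok in difflib.ndiff(a, b) if tok[0] in '+-']
        if 0 < len(diff) <= threshold:
            namePairs.append((a, b))
            differenceList.append(tuple(diff))
    return namePairs, differenceList
```

Recurring difference tuples such as ('+ _', '+ P') across many pairs are what suggest a naming convention like "_P marks phosphorylation".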

bionetgen.atomizer.atomizer.detectOntology.levenshtein(s1, s2)[source]
bionetgen.atomizer.atomizer.detectOntology.loadOntology(ontologyFile)[source]
bionetgen.atomizer.atomizer.detectOntology.main(fileName)[source]
bionetgen.atomizer.atomizer.detectOntology.stringToSet(species, idx, scoreRow, speciesName)[source]

bionetgen.atomizer.atomizer.moleculeCreation module

Created on Tue Apr 2 21:06:43 2013

@author: proto

bionetgen.atomizer.atomizer.moleculeCreation.addBondToComponent(species, moleculeName, componentName, bond, priority=1)[source]
bionetgen.atomizer.atomizer.moleculeCreation.addComponentToMolecule(species, moleculeName, componentName)[source]
bionetgen.atomizer.atomizer.moleculeCreation.addStateToComponent(species, moleculeName, componentName, state)[source]
bionetgen.atomizer.atomizer.moleculeCreation.atomize(dependencyGraph, weights, translator, reactionProperties, equivalenceDictionary, bioGridFlag, sbmlAnalyzer, database, parser)[source]

The atomizer's main method. Receives a dependency graph.

bionetgen.atomizer.atomizer.moleculeCreation.createBindingRBM(element, translator, dependencyGraph, bioGridFlag, pathwaycommonsFlag, parser, database)[source]
bionetgen.atomizer.atomizer.moleculeCreation.createCatalysisRBM(dependencyGraph, element, translator, reactionProperties, equivalenceDictionary, sbmlAnalyzer, database)[source]

If it is a catalysis reaction, create a new component/state.

bionetgen.atomizer.atomizer.moleculeCreation.createEmptySpecies(name)[source]
bionetgen.atomizer.atomizer.moleculeCreation.getBondNumber(molecule1, molecule2)[source]

keeps a model-level registry of all the molecule pairs and returns a unique index
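The registry contract described here can be sketched as follows (hypothetical; the real function is module-level and its numbering scheme is not documented):

```python
class BondRegistry:
    """Assigns a stable, unique bond index to each unordered molecule
    pair, so repeated queries for the same pair reuse the same number."""

    def __init__(self):
        self._registry = {}

    def getBondNumber(self, molecule1, molecule2):
        # sort so that (A, B) and (B, A) map to the same key
        key = tuple(sorted((molecule1, molecule2)))
        if key not in self._registry:
            self._registry[key] = len(self._registry) + 1
        return self._registry[key]
```

A stable pair-to-index mapping is what lets independently generated rules refer to the same bond consistently in the emitted BNGL.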

bionetgen.atomizer.atomizer.moleculeCreation.getComplexationComponents2(moleculeName, species, bioGridFlag, pathwaycommonsFlag=False, parser=None, bondSeeding=[], bondExclusion=[], database=None)[source]

method used during the atomization process. It determines how molecules in a species bind together

bionetgen.atomizer.atomizer.moleculeCreation.getTrueTag(dependencyGraph, molecule)[source]

given any modified or basic element it returns its basic name
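A sketch of the lookup this describes, assuming the dependency-graph convention visible in the doctests elsewhere in this module: modified elements map to a single one-element candidate, basic elements map to an empty list (illustrative only):

```python
def getTrueTag(dependencyGraph, molecule):
    """Follow single-element modification links until reaching a basic
    element (no dependencies) or a complex (multi-element candidate)."""
    while True:
        candidates = dependencyGraph.get(molecule, [])
        # stop at basic elements, complexes, or self-references
        if (not candidates or len(candidates[0]) != 1
                or candidates[0][0] == molecule):
            return molecule
        molecule = candidates[0][0]
```

Using the example graph from the resolveDependencyGraph doctests, 'EGFR_P' resolves to its basic name 'EGFR'.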

bionetgen.atomizer.atomizer.moleculeCreation.identifyReaction(equivalenceDictionary, originalElement, modifiedElement)[source]
bionetgen.atomizer.atomizer.moleculeCreation.isInComplexWith(moleculeSet, parser=None)[source]

given a list of binding candidates, it gets the uniprot ID from annotation and queries the pathway commons class to see if there’s known binding information for those two

bionetgen.atomizer.atomizer.moleculeCreation.propagateChanges(translator, dependencyGraph)[source]
bionetgen.atomizer.atomizer.moleculeCreation.sanityCheck(database)[source]

checks for critical atomization errors like isomorphism

bionetgen.atomizer.atomizer.moleculeCreation.solveComplexBinding(totalComplex, pathwaycommonsFlag, parser, compositionEntry)[source]

given two binding complexes it will attempt to find the ways in which they bind using different criteria

bionetgen.atomizer.atomizer.moleculeCreation.transformMolecules(parser, database, configurationFile, namingConventions, speciesEquivalences=None, bioGridFlag=False, memoizedResolver=True)[source]

Main method. Receives a parser configuration, a configurationFile and a list of user-defined species equivalences, and returns a dictionary containing an atomized version of the model.

Args:

parser: data structure containing the reactions and species we will use
database: data structure containing the result of the outgoing translation
configurationFile:
speciesEquivalences: predefined species

bionetgen.atomizer.atomizer.moleculeCreation.updateSpecies(species, referenceMolecule)[source]

bionetgen.atomizer.atomizer.resolveSCT module

class bionetgen.atomizer.atomizer.resolveSCT.SCTSolver(database, memoizedResolver=False)[source]

Bases: object

bindingReactionsAnalysis(dependencyGraph, reaction, classification)[source]

Adds dependencies derived from addBond-based reactions to the dependency graph.

>>> dg = {}
>>> dg2 = {}
>>> dummy = SCTSolver(None)
>>> dummy.bindingReactionsAnalysis(dg, [['A', 'B'], ['C']], 'Binding')
>>> dg == {'A': [], 'C': [['A', 'B']], 'B': []}
True
>>> dummy.bindingReactionsAnalysis(dg2, [['C'], ['A', 'B']], 'Binding')
>>> dg2 == {'A': [], 'C': [['A', 'B']], 'B': []}
True
consolidateDependencyGraph(dependencyGraph, equivalenceTranslator, equivalenceDictionary, sbmlAnalyzer, loginformation=True)[source]

The second part of the Atomizer algorithm. Once the lexical and stoichiometry information has been extracted, it is time to state all elements of the system unequivocally in terms of their molecule types.

createSpeciesCompositionGraph(parser, configurationFile, namingConventions, speciesEquivalences=None, bioGridFlag=False)[source]

Main method for the SCT creation.

It first does stoichiometry analysis, then lexical…

fillSCTwithAnnotationInformation(orphanedSpecies, annotationDict, logResults=True, tentativeFlag=False)[source]
make_key_from_graph(graph)[source]
measureGraph(element, path)[source]

Calculates the weight of individual paths as the sum of the weights of the individual candidates plus the number of candidates. The weight of an individual candidate is equal to the sum of strings contained in that candidate different from the original reactant.

>>> dummy = SCTSolver(None)
>>> dummy.measureGraph('Trash', ['0'])
1
>>> dummy.measureGraph('EGF', [['EGF']])
2
>>> dummy.measureGraph('EGFR_P', [['EGFR']])
3
>>> dummy.measureGraph('EGF_EGFR', [['EGF', 'EGFR']])
4
>>> dummy.measureGraph('A_B_C', [['A', 'B_C'], ['A_B', 'C']])
7

measureGraph2(element, path)[source]

Identical to previous function but iterative instead of recursive

resolveDependencyGraph(dependencyGraph, reactant, withModifications=False)[source]

Given a full species composition table and a reactant, it will return an unrolled list of the molecule types (elements with no dependencies that define this reactant). The classification of the original candidates is lost since elements are fully unrolled. To get dependencies while keeping candidate consistency, use consolidateDependencyGraph instead.

Args:

withModifications (bool): returns a list of the 1:1 transformation relationships found in the path to this graph

>>> dummy = SCTSolver(None)
>>> dependencyGraph = {'EGF_EGFR_2':[['EGF_EGFR','EGF_EGFR']],'EGF_EGFR':[['EGF','EGFR']],'EGFR':[],'EGF':[],        'EGFR_P':[['EGFR']],'EGF_EGFR_2_P':[['EGF_EGFR_2']]}
>>> dependencyGraph2 = {'A':[],'B':[],'C':[],'A_B':[['A','B']],'B_C':[['B','C']],'A_B_C':[['A_B','C'],['B_C','A']]}
>>> dummy.resolveDependencyGraph(dependencyGraph, 'EGFR')
[['EGFR']]
>>> dummy.resolveDependencyGraph(dependencyGraph, 'EGF_EGFR')
[['EGF'], ['EGFR']]
>>> sorted(dummy.resolveDependencyGraph(dependencyGraph, 'EGF_EGFR_2_P'))
[['EGF'], ['EGF'], ['EGFR'], ['EGFR']]
>>> sorted(dummy.resolveDependencyGraph(dependencyGraph, 'EGF_EGFR_2_P', withModifications=True))
[('EGF_EGFR_2', 'EGF_EGFR_2_P')]
>>> sorted(dummy.resolveDependencyGraph(dependencyGraph2,'A_B_C'))
[['A'], ['A'], ['B'], ['B'], ['C'], ['C']]
resolveDependencyGraphHelper(gkey, reactant, memory, withModifications=False)[source]

Helper function for resolveDependencyGraph that adds a memory field to avoid problems with cyclical definitions.

>>> dummy = SCTSolver(None)
>>> dependencyGraph = {'EGF_EGFR_2':[['EGF_EGFR','EGF_EGFR']],'EGF_EGFR':[['EGF','EGFR']],'EGFR':[],'EGF':[],'EGFR_P':[['EGFR']],'EGF_EGFR_2_P':[['EGF_EGFR_2']]}
>>> dependencyGraph2 = {'A':[],'B':[],'C':[],'A_B':[['A','B']],'B_C':[['B','C']],'A_B_C':[['A_B','C'],['B_C','A']]}
>>> sorted(dummy.resolveDependencyGraphHelper(dependencyGraph, 'EGF_EGFR_2_P', []))
[['EGF'], ['EGF'], ['EGFR'], ['EGFR']]

>>> sorted(dummy.resolveDependencyGraphHelper(dependencyGraph, 'EGF_EGFR_2_P', [], withModifications=True))
[('EGF_EGFR_2', 'EGF_EGFR_2_P')]
>>> sorted(dummy.resolveDependencyGraphHelper(dependencyGraph2, 'A_B_C', []))
[['A'], ['A'], ['B'], ['B'], ['C'], ['C']]
>>> dependencyGraph3 = {'C1': [['C2']],'C2':[['C3']],'C3':[['C1']]}
>>> dummy.resolveDependencyGraphHelper(dependencyGraph3, 'C3', [], withModifications=True)
Traceback (innermost last):
  File "<stdin>", line 1, in ?
CycleError
unMemoizedResolveDependencyGraphHelper(dependencyGraph, reactant, memory, withModifications=False)[source]

Helper function for resolveDependencyGraph that adds a memory field to avoid problems with cyclical definitions.

>>> dummy = SCTSolver(None)
>>> dependencyGraph = {'EGF_EGFR_2':[['EGF_EGFR','EGF_EGFR']],'EGF_EGFR':[['EGF','EGFR']],'EGFR':[],'EGF':[],'EGFR_P':[['EGFR']],'EGF_EGFR_2_P':[['EGF_EGFR_2']]}
>>> dependencyGraph2 = {'A':[],'B':[],'C':[],'A_B':[['A','B']],'B_C':[['B','C']],'A_B_C':[['A_B','C'],['B_C','A']]}
>>> sorted(dummy.resolveDependencyGraphHelper(dependencyGraph, 'EGF_EGFR_2_P', []))
[['EGF'], ['EGF'], ['EGFR'], ['EGFR']]

>>> sorted(dummy.resolveDependencyGraphHelper(dependencyGraph, 'EGF_EGFR_2_P', [], withModifications=True))
[('EGF_EGFR_2', 'EGF_EGFR_2_P')]
>>> sorted(dummy.resolveDependencyGraphHelper(dependencyGraph2, 'A_B_C', []))
[['A'], ['A'], ['B'], ['B'], ['C'], ['C']]
>>> dependencyGraph3 = {'C1': [['C2']],'C2':[['C3']],'C3':[['C1']]}
>>> dummy.resolveDependencyGraphHelper(dependencyGraph3, 'C3', [], withModifications=True)
Traceback (innermost last):
  File "<stdin>", line 1, in ?
CycleError
weightDependencyGraph(dependencyGraph)[source]

Given a dependency graph, it will return a list indicating the weights of its elements as each path is calculated.

>>> dummy = SCTSolver(None)
>>> dummy.weightDependencyGraph({'EGF_EGFR_2':[['EGF_EGFR','EGF_EGFR']],'EGF_EGFR':[['EGF','EGFR']],'EGFR':[],'EGF':[],'EGFR_P':[['EGFR']],'EGF_EGFR_2_P':[['EGF_EGFR_2']]})
[['EGF', 2], ['EGFR', 2], ['EGFR_P', 4], ['EGF_EGFR', 5], ['EGF_EGFR_2', 9], ['EGF_EGFR_2_P', 10]]
>>> dependencyGraph2 = {'A':[],'B':[],'C':[],'A_B':[['A','B']],'B_C':[['B','C']],'A_B_C':[['A_B','C'],['B_C','A']]}
>>> dummy.weightDependencyGraph(dependencyGraph2)
[['A', 2], ['C', 2], ['B', 2], ['B_C', 5], ['A_B', 5], ['A_B_C', 13]]

Module contents