Wednesday, October 29, 2014

Continued: CODE CHALLENGE: Solve the Clump Finding Problem

Today I finally computed what was required. I rewrote the algorithm to cut down the number of iterations: first it finds the k-mers of the given length that occur at least the given number of times anywhere in the whole genome, and then it searches the L-length windows only for those. BUT! EVEN SO :))) it took 1360 seconds to run, about 22 minutes :))) Lol. The inefficient efficiency of my algorithm... On the forum people suggest reading about The Frequency Array, but I haven't figured it out yet. Because it's in English... and it's all math, arrays, indices, frequencies...
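A minimal sketch of the Frequency Array idea as it is usually described: read each k-mer as a base-4 number, keep one counter per possible k-mer in an array of size 4^k, and fill all counters in a single pass over the genome. The letter encoding (A=0, C=1, G=2, T=3) and the function names pattern_to_number and frequency_array are assumptions chosen for illustration, not taken from the course.

```python
# Sketch of the "frequency array" idea: one counter per possible k-mer.
# Assumption: the text contains only the letters A, C, G, T.
SYMBOL_TO_DIGIT = {"A": 0, "C": 1, "G": 2, "T": 3}


def pattern_to_number(pattern):
    """Read a k-mer as a base-4 number, e.g. "ACG" -> 0*16 + 1*4 + 2 = 6."""
    number = 0
    for symbol in pattern:
        number = 4 * number + SYMBOL_TO_DIGIT[symbol]
    return number


def frequency_array(text, k):
    """Count every k-mer of text in one pass; index i holds the count of k-mer number i."""
    counts = [0] * (4 ** k)
    for i in range(len(text) - k + 1):
        counts[pattern_to_number(text[i:i + k])] += 1
    return counts


print(frequency_array("ACGCGGCTCTGAAA", 2))
# [2, 1, 0, 0, 0, 0, 2, 2, 1, 2, 1, 0, 0, 1, 1, 0]
```

For k = 12 the array has 4^12 (about 16.8 million) slots, which is heavy for a plain Python list; a dictionary keyed by the k-mer string gives the same one-pass counting without allocating every slot.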

def often(Genome, k, t):
    # Pre-filter: the distinct k-mers that occur at least t times in the whole genome.
    # Only these can possibly form an (L, t)-clump.
    res_often = []
    seen = set()
    # len(Genome) - k + 1 start positions, so the last k-mer is counted too
    for n in range(len(Genome) - k + 1):
        k_mer = Genome[n:n+k]
        if k_mer in seen:
            continue  # this k-mer was already counted
        seen.add(k_mer)
        count_t = 0
        for j in range(len(Genome) - k + 1):
            if k_mer == Genome[j:j+k]:
                count_t += 1
        if count_t >= t:
            res_often.append(k_mer)

    return res_often

def Clump(Genome, k, L, t):
    k_often = often(Genome, k, t)
    res_gen = []
    seen = set()
    # Check every window of length L: stepping by k skips most windows and can
    # miss a clump; len(Genome) - L + 1 also includes the last window
    for i in range(len(Genome) - L + 1):
        L_mer = Genome[i:i+L]
        for k_mer in k_often:
            if k_mer in seen:
                continue  # already known to form a clump
            count_t = 0
            # L - k + 1 start positions inside the window
            for j in range(L - k + 1):
                if k_mer == L_mer[j:j+k]:
                    count_t += 1
            if count_t >= t:
                seen.add(k_mer)
                res_gen.append(k_mer)

    return " ".join(res_gen)

#print (Clump("CGGACTCGACAGATGTGAAGAACGACAATGTGAAGACTCGACACGACAGAGTGAAGAGAAGAGGAAACATTGTAA", 5, 50, 4))
print (Clump("GGGAACTAGGCAAGTGCCGAACCAACATTTCTAGGTTTGATGGCGCTTCCAGACCGTTACACGCTGCCGACTCTGTACAAAGATCAGGCCCTCCTGTGATGCCTAATTGCGCCTAATTTCGCCCTTACCACAGCGTCACGCCTAATTTCGCCTTGATTCATGAGAGGGCCTGCCTAATTTCGCTCGCGGTAGGTGTTAGTGCCTAATTTCGCAAAGTGTCGATCGCACGACCGACTTAAAACCACATAGCTCCACGGTTAAGCCTAATTTCGCTATTAAGCGGAAGAGCCTAATTTCGCAATGCCTAATTTCGCAATCTGAGGCACACAAAGTACTTAGGCCTAATTTCGCCCCTCGACCACATCACCCTACTACACATCACCCTATTCGCCTTCGTGGGGTACACATCACCCTAGCACGGTTCTGGCCTAATTTCGGCCCACATCACCCTAACAAACAGAATACCACATCACCCTACTACCTAATTTCGCTTTCGCCAACGGCATCGCCACATCACCCTACGATCAGAGCCACATCACCCTACACCCTAGCCGTGGTCGCAACGTGGTCGCCACACATCACCCTAAACGTGGTCGCACATCACCCTAACACATCACCCTATAACGTGGTCGCCCGCCCCTTATGTTCCCGGGGCCTAATTCACATCACCCTAAGGAGCCTAAGCACACATCACCCTACTCGAACCACATCACCCTATCACACACATCACCCTACAGCACATCACCCTAGTGAACGTCACATCACCACATCACCCTACACATCACCCTAACAACGCCGACCACACATCACCCTATCACCCTAACCCTATAACGTGGTCCACATCACCCTAGAACGCACATCACCCTACACCCTATACGCCCACATCACCCTAGGCACATCACCCTAGCCTCAGCAACAGAACGTGGTCGCCGGCACATCACCCTACGTGGTCGCCCGATTAATAAAAGCAACGTGGTCGCCAACGTGGTCGCAACGTGGTCGCCCGGCCGATCACGTGTCACACAAAACGTGGTCGCCCAGAGAGCGATGCTGTATATTGAAACGTGGTCGCCCGTGGTCGCCTAACGTGGTCGCCGTGGTCGCCCACGTGGTCGCCGTAGGTTAACCATGCCGACTTATAGATTAGACATCCCAGAAAAGGAGGTAGACTTGTATCGTACCCATAACGTCTGCACCGAACATTGGCCCCTGTGTTGGGGTGCCTCTGTGGGCTACCGAGTGTCTCACCTATTATCATTGCTGGTGACACTCTCTGATCGATAAGTAAGAAGTCACTTGTCGTCACTTGTCTTAAGTCACTTGTCTAAGATAAGTCACTTGTCGGTTGCTCCATAGCCCTAAGTCACTTGTCTTAGTTGTAAGTCACTTGTCAAGTCACTTGTCTCAAAAAAGTCACTTGTCAGCGAGACGAGCATTGCGGGTAAGTAAGTCACTTGTCTGCCTCTTAAGTCACTTGTCCACTTGTCGCAGCTTGGCACTGAAAGTCACTTGTCACAAGGGGTCGTAAGTCACTTGTCCATAAGTCACTTGTCTCACTTGTCCTTGTCCTGTCTTAAGTAACCCCTCAGGTTCGCGCTGGCACAGGTACCCACTAAGAAGTCACTTGTCGTCGTAGCTCTTATAAGTCACTTGTCCACAGAGGTCACTAGCGCTCCGATAGGGGCGTTAAGGAGAAGTCACTTGTCCACAAGTCACTTGTCGTCGCATTATAAGTCACTTGTCCCTCACGGTGTCCTCCTTAAGTCACTTGTCCACTAAGTCACTTGTCGTCACTTGTCCAATAAGTCACTTGTCAGGATAATCAGGAAAGTCACAGGATGGTGCGAGAAGTCACTTGTCCAGGAAAGTCACTTGTCTTCCTAGATACGACGTATAACTTATAAGGTGCTGCCCAGGGCCGCGAACGCACTTAAGTCATCGTACACTACCCCAGTGTACCGATACTGCCGTAACGAGAGCAACCAA
TCTCTGCCATTGGACTCTTCGTTCACAACCCCGCAGGTTCCACCCCTATCCGGAGTCACTTCCTGCGTTAGATGGGGGACCTACTGTGTGCCATCATTCCAATGAGTCTACATGTCGCTCAACAGAGTCTCCTGGGGTTACCGCATCGCAATCCGTTAGGCCGTATACGTTCGTTAGGCCGTACGTTAGGCCGTAGGAGGACTTTAAAGGGACTTTAAAGTAGGCCGTAAGGACTTTAAAGCGTATAGTGCGGGCAGGACTTTAAAGCGTACGTTAAGGACTTAGGACTTTAAAGAGCAGGACTTTAAAGTAAAGCTACTAGGTGAAGGCCTAGAGGACTTTAAAGGAGGACTTTAAAGCGAGGACTTTAAAGACTTTAAAGAGAGGACTTTAAAGGTGTAGAGGACTTTAAAGGAAGGACTTTAAAGGGACTTTAAAGTGGAGAAGTGAAGGCCTGGAAGAGGACTTTAAAGGAAAGGCCTGGATACCGCTAGGACTTTAAAGCCAGGACTTTAAAGGGTGAAGGCCTGGATGATCAATCTTTTACGTCAAAGCATGAAGGAGGACTTTAAAGTAAAGTTTAGGACTTTAAAGATAGGACTTTAAAGAAAGTTCAGGACTTTAAAGTGGAGCATTTCAAAGTTCAGTGAAGGCCTGGAAAGTTCAGTGCGTTCAAAGTTCAAGGACTTTAAAGGAGAGGACTTTAAAGGACTTTAAAGGACTTTAAAAGGACTTTAAAGAAGGCCTGGACCTGGAAGGACAGGACTTTAAAGGACATGAAGGCCTGGAAACTGAAGGCCTGGTGAAGGCCTGGAGGCCTGGAATCGATTTCGGAAATTGAAGGCCTGGACTTTTAATGAAGGCCTGGATGATGAAGGCCTGGATGATCAATCTTATCAATCTTTTAATCCAAAGTTCACAAAGTTCAGTGAGTGGTCAAAGTTCAGTGAGTGACGCGAGGTCTCGGTAAAAACCATACTGGCACCCAAAGTTCAGTGGATCGTCGTTGTCATTATTTCGTTGTCATTATCATGAAGGTAAGAGTGCGCCAACAGGCGCTGTGGCCAAAGTTCAGTGATGCAAGAAACGTTGTCATTATTATGCGTTGCGTTGTCATTATGTCGTTGTCATTATCGCGTTGTACGCTCTCCCGCCTCCCGCCTCGTTGCGTTGTCATTATGGGACCGTTGTCATTATGTTGTCATACGCTCTCCCGCCACGTTGTCATTACGCTCTCCCGCTTGTCATTATCACGCTCTCCCGCCCGCCTACGCTCTCCCGCGCCGACGCTCTCCACGCTCTCCCGACGCTCTCCCGCTATACGCTCACGCTCTCCCGCCGCTCTTATAAGAACACGCACACGCTCTCCCGCTCCCGACGCTCTCCCGCCCAGGAGGAACGCTCTCCCGCACCGCGTCGCAACGCTCTCCACACGCTCTCCCGCTACGTTGTCATTATCGTTGTCATTATGTACCTGCGTTGTCATTATTATGCGTTGTCATTATATTATATGCCCGTGCAATACGATGCGAAGTCGCACATCGGAACACGTTCGTAGTGTACAGTCTACGCTCTCCCGCCCCGCACTGGACGCTCTCCCGCTTACGCTCTCCCGCCAGCCTAATAACGCTCTACGCTCTCCCGACGCTCTCCCGCGTTTGGCAATATCGTGAAGACACGCTCTCCCGCCGGAAACGCTCTCCCGCACGCTCTCCCGCAGGAAACCTTATCGGAAGAACTAGCTTGATGACAAGGCAACGACGCAGGACCCGACATTTGATGGAAACCGGTGTAACATCCCGGCCACCCGGGTTTCTAAGGGTCCGAGCGGTACGTCGCTCGAACGCGTGCTCTGCAAAGTGCCTGTAACGGACAACCAAGTATGCACACGCTACAAGCCAAACTCTCGCGTGCTAAGACCACGCGGCGTGCGTGCTAAGACGCGTGCTAAGCGTGCTAAGACTGGCGGACGGCGTAAAGGGCGCGTGCTAAGACACCCC
GGCGTAAAGGCTGTCTATCGGCGTAAAGGCTCGACAGCGTGCTAAGACCGTGCTAAGCGTGCTAAGACTCGTTACGCGTGCTAAGACTGGCGTGCTAAGACATGTGCGTGCTAAGACACGGCGTAAAGGCAAAGCGGCGTAAAGGCCGGGGCAGTCGCGAAAAGCGTGCTAAGGCGTGCTAAGACAAGACCGTGCTAAGACCGGCGTAAAGGCTGGCGTGCTAAGACGCTAAAGGCCTTACGGCGTAAACGGCGTAAAGGCGCGTGCTAAGACAGACGCTATCCGGCGGCGTGCTAAGACGCGTGCTAAGACGCGTAAAGGCGGCGTAAAGGCGCAAAGGCGTGGCGTGCTAAGACATCAAAGTATCCAGCGTGCTAAGACTAAAGGCTCGTTGGATCCGCGTGGCGTGCTAAGGCGTGCTAAGACAGACGGCGTGCTAAGACGACGGCGTCGGCGAAGATAGTCAGTTCGGAAGATAGTCAGTGACGGATTCATGCCAAGATAGTCAGTCAGACTATGAGGACTTCTGTTAACAAGCAAAGTTTCACTTAAGATAGTCAGTACAACAAGATAGTCAGAAGATAGTAAGATAGTCAGTGTTACAAGATAGTCAGTATGGTTAAAAAAGATAGTCAGTAAAAAGATAGTCAGTAGGAAATTGAAGATAGTCAGTTATGTAGGAAGATAAAGATAGTCAGTCAGTGCTTTATTCAAGATAGTCAGAAGATAGTCAGTGTAGGTCAAGATAGTCAGTCATTGGTGGGAGTCAAAAAGATAGTCAGTAGAAGATAGTCAGTAGCGGTTCCAAAGATAGTCAGTATAGTCAGTTTCTGACAAAAGTCCATGACAAAGATAGTCAGTGCCGTTCTTAAGATAGTCAGTCCCCCGGAATCAAGATAGTCAGTGATAGTCAGTCGTTTTCCACCTGGTAAGATAGTCAGTGTGTGTAGAAAACTAAAAAGATAGTCAGTAGGCATAAAGATAGTCAGTTCCTGTGGGAAGATAGTCAGTGAGATCTCACCGAAAGATAGTCAGTAAAGATAGTCAGTTACAACACAGGGTCATACTCGATCTCCCGGTGAGAACTTCACATTTTTAGACTCACCCTTAGGCAAGACCCTTGTATTGAGGTTGGCTGGACTCATAGTGCAAGCGAGTCCGGGTACGCGGAGTTAGTCGTTTTAACGTCCCGGTCGCATTCCCAAGCGCGATATTTTAGACTGGACGTTCTTACTTGGGGGGGGACATGGGCGGGTGATTTGGCTTACCTCTGAGTGTAAATCTACTAGCGATACTCCTCAGCATAGGTAATTCCGCGCATAAACGATATGAATTAGCGATGTGGACAGATGTTAAATGATGGTCCATACTAGGGGCGAATGCAGCCCTGCCAGACGTCTCCCACCGCGTAGCACCTGTCCTGATCGCCCGGATGAGCAGTGATTATCCATGCAGTACAGAACCCAGTTGTTTTAAAATATGATTCAGCATCCAGTCCGAATTCTATAGACGACTTAAACTCGTGCATCGGTTAGGAGAACGACAGAACACGCCAGAAGCTCTCCTCCTACCGTTGGAGGCGGAAGGTAGGCAAAAAGGGCGAATTTGATTTCCAACCGCTTAATAACCAAGTAAGCACGCAAGCCGACAACCGCAGACGTAAGCATCGGTATGTGAGGTTGAGGGCGTTGGGGCGTTGGACAACATATTGAGGGCGTGAGGGCGTTGGTTGATGAGGGCGTTGGGGCGTTGGGTTGGGTTGAGGGCGTTGGGTAGCACCTTCGTGAGTCTTGGTTCACCTTCTACTTGAGATATGAGGGCGTTGGGCCCTTCCTAGAAGGTGAGGGCGTTAACTATTTAGTTATTTAGTTTAACTTGAACTATTTAGTTGTTGAATCTCGAGGGCAACAACTATTTAGTTGACAAGAGAATCTCCCAAACTATTTAGTTAAACTATTTAGTTGTTGAGGGCCAACAAGAGAATCAAAACTATTTAGTTG
AATCTAAACTATTTAGTTGAGAATCTCCGTACTGCAAACTATTTAGTTAGTTAAACTATTTAGTTATCTCCCCTCGGCCTTCCCCAAGAGAATCTCGAGGAAACTATTTAGTTAGAATCTCGTGAGCAAGAACTATTTAGTTGGAACTATTTAGTTTTTAGTTTTAGTTTTAGTTTTAACTATTTAGTTGAGAAACTATTTAGTTTCAACTATTTAGTTAAACTATTTAGTTATTTAGTTAAAACTATTTAGTTCAAGAAACTATTTAGTTTCGAACTATTTAGTTTTCTCAGCCAACTAACTATTTAGTTAGTCACAAGAAACTATTTAGTTGTTTTGCTAGAGAACTATTTAGTTGTTAAACTATTTAGTTACAAGAGAACTATTTAGTTACTATTTAGTTACTATTTAGTTTCTGTCACCACTGTCGGCCTTAACAACGGTCCCCACAAGAGAATCTCGGCCATAAGCGGTATTAAGAGCCCACTGCGGTAACTGGCATGCATCACCAAACTTTTGGTTGATGATAGAAAAGGTCCTACTGCGATGCCCACCCCGGAAGCGTCACCAGCCCGCGTGCGCTCTATATTGCGAACTATGGGCAGGCCTCGGTAGGATCGATAGACAGGATGTCACCAGTAAGCACGGTGGCTATCTGCAACTTCCTCTTTAGCCCGATCCGAAAATAGACGGGCCCGTAGTCGCGACTTATAAATAACGAGTAAAACTCGCTAAACTACAAGACACGATCGCCCTCTTAAAGTGCAGTTGTTGTCCCCCTCATCTTCATTATCTACCGTTTGAACCCTCTACCTGGGGGCCTAGAGGTAGAAGCGAGGAACTTACCTCATCCATTACGCGCACTTCTATCTTTCGCGCCGCGTAGCGGCCGTTAATCCCTGTCCACCAATGCGGATAGGGTACGCACTTGAGAGCATACGGGAGTTAGTCGTAGGCGGTCCTACTATCGAAGTTAGCTGTACGTCACACCCGGAAAGCTCCAAACGCTCCTGGGCGTATAGTCCAGGTGTAAAGGAAAGTGTTGGGACGTTGCCGGAAGCCGGGGCCTCCACCTATGAACTACAAGTCATCCACCTTGGTCATTCTCCTTCCGAGCCTATAACTGTGTGCCGCGCCGGCAAGTCTCCAATGTTTATAGGAATCGTCCGGTTGCAAGCCACTAATGTGTTGTTGCCAAACGTCCTTTACCCGGTTGTCCTTGCTCGCAGCCCACGAAGCGCATCAGCGTCTTCAGAATGCAAACGTTACGGGCTGTGATTATGCAAAGTTTATGGCCTGAGCGGATACTTCACGCGCGCATCTCCAAAGCAGCACGGTGCGCTCGCAAACGCGGACACAAATGGTGGCCTCAACACGTAGGATGGCAGGAGAGGGTTTGTCCCTGAAGATTTTCTTCCACGGTTGGTACTTCTCCTTCTTACGCGGCTTAAGCGATTGGTTAGAAGCTCTGGAGCGAGGTGTTGCGACGATGCGCATGCGCCGGTCCTTTAGTCGATGCTCACGGTAAGAGGCGTGTTCAGTTGTCCCTCCGCGAACCCCGAAACATTGATGACCTCGTCGCTGTTATTTATATACCCCTGTGTGCGCTATCCACCCGCCACTGTCCGATGATCGCCTGAAGGGTCCGATGATCGCCGCGTGGCCCGGTTTTCAACGTGCCTCTCCGTCCGATCCGTCCGATGATCGCAATCCGATGATCGCGGCTTCTCGTCCGATGATCGCCTCCGATGATCGCTATCCTCCGATGATCGCCGCCAGCTCCTTCCGATGATCGCAGAAAATCCCGTTCCGATGATCGCCTTTTAACAATTTATCATTTGCTTGCAAAGGTTCTCCGATGATCGTCCGATGATCGCAGCTGGGGTCAAAATTCCAATGATCTCCGATGATCGCCCCGATGATCGCTCAACCAACTCTGAGCTTGGTAGTCCGATGATCGCCCAGATGCGTCCGATGATCGCTGGGACTTTTAAATGCAA
TTCCGATGATCGCACAGCCTCACGGCGATTGGTCGGTGCTTCTCGACTTCCGATGATCGCTCCGATGATCGCGGCTCTCCGATGATCGCTAATCCGATGATCGCCAGTCCGATGATCGCTTCCGATTCCGATGATCGCGATCGTAATAGAGTTACGATCCGATGATCGCACTCCGATGATCGCCGGGAAGGTTGCGATTTCCGATGATCGCCGCACTTGCATCCACTCAAGCAGGGTGTATGGATACTCTTTGTCTCCAGACATTGAGGACGTCTCTAACTACGCCTGGTCATTGGATGACTAATCCCCTATAAATGGGCTTCGGTCCTGTACTAGCGTAATCTTCTTGCTACTTTAATTTACCGCGGCCGGATTGGTTTCAAAGGATTGTATGCAACTGCGCCACATAAACCCAGGGTAGGGCCCTGTTAGTAAGGTTTTGATTACATCCCTTGAACTATGAAATAACCCGAGCGTAACTATTTGACTCCGAAGAGACGAGCCCTTATTTCGAACGAAAATATTGTAACGTTCCACACCTAATCGTGCCGGGTGGGTACAGACATGTCGCACCAACCCCTCTGGCTATGTTTTCTCGGTGGTTCTAACCCCACTCGCAGAAGTACGGATAATGTACAATCCTAGTTACCAAAGGGTATTTTGTCAGCATATGTCAGCATATCTAGCATATCTCCTCACAAAAGGTGTCAGTCAGCATATCTAAGTCAGCATATGTCAGCATATCTATATCTTCTGTAGTATGCGTCAGCATATCTCGAGCCCCAGATTTCCTCAGTCAGCATATCTAGTGTCAGCATGTCAGCATATCTAACAATAGGTCAGCATATCTTACAATCGTCAGCATATCTCAGTCAGCATATCTGGCAAGTGCGGATATATTTACAGGTCAGCAGTCAGCATATCTTCTAAATCAGCTAGCTCCGATGTTATTAAGTCAGCATATCTCGCGTCAGCATATCTTAGTCAGCATATCTTTCACCCGAGCAGAGTCCGACGAGAGGTGGTCAGCATATCTGCAGCCAGTCAGCATATCTGTCAGCATATCTTCTTCTCCAAAGTCAGCATATCTAGCATATCTTGTTTGTCAGCATATCTTGTTTGGATCCTCATGGCCAACGACTGTTCAGTCAGCATATCTGCTTACCTGGGAGGTCAGCATATCTGGATGGGTCAGCATGTCAGCATATCGTCAGCATATCTGAGTTGTCAGCATATCTGGTCAGCATATCTTGATATCGCAAACCCCTTTTCTGACATGACACTACACTCACCGTGATCTCGCGAGCCTAACATCAACCAGCCGCTCGCCTTAAGCAAGGTAAACATAGACTACGCAGATGCACACGACGGCAGCGTAAGTTGTACGCCCGGGAAGTGGAGTGCGTTCACACTTATTATGATAAGAGAGTATCATGTAGTACCGGTACTTACAAAGCGGGGATAAAGTAAGCTACAGACTCTTCGATAACTAGACTGCATGAACTATCAAACAGCCAACCTGTTACTATACTGCTCTAGGTTCAAATTGATCAGAGGTCGTGGAGGAGGGCGTTGGCGCCTCGGCGTAAGACGTCGTCGGTCGTTTGACGTGACGGGTCCGCAAAAAGTGATTGACGCTGTGTAGGATCATGGTCGAGCTATGGGGCGAAGAACTATCACCACTATCAAAAGCCCAGTCGGACCTGAGGGGGGAATAGATGCACTACTCGTGAAAGCCAACCTCAACACATTACCCATTACGCAGAACTACGATGGCAAACGAGGCGTCTGCTGGCAACCCACATGTGTTGGTTCCCATTTGTGCGACGATAGTCCATGTACAGCAAAATGAATCAAATAAGTGTGTTGCACCGACTCCAAGATGGTACTCTGCACCCAACGTTCTGTTCGGGATAGAGGAAGTACACTATGCGATCTAGAATGTTTGTCGCCTAGATTCTTGCGACTCGCCGTGAATGA", 12, 510, 20))
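The slow part of often() is that for each of the roughly n start positions it rescans the whole genome, about n² slice comparisons. A sketch of the same pre-filter with collections.Counter, which counts every k-mer in a single pass (a hash table standing in for the frequency array). The name often_fast is hypothetical; it is meant to return the same list as often(), in first-occurrence order.

```python
from collections import Counter


def often_fast(Genome, k, t):
    """Distinct k-mers occurring at least t times in Genome, in first-occurrence order."""
    # One pass: each k-mer slice is counted once
    counts = Counter(Genome[i:i + k] for i in range(len(Genome) - k + 1))
    # Counter is a dict, so (Python 3.7+) it keeps the order of first appearance
    return [k_mer for k_mer, c in counts.items() if c >= t]


print(often_fast("ACGTACGT", 4, 2))  # ['ACGT']
```

Swapping it in for often() removes the quadratic pre-filter, but the window loop in Clump() still rescans each window from scratch.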

My answer (no idea if it's right, I can't check it: the grader kills it on timeout):
AGGACTTTAAAG AAGATAGTCAGT AACTATTTAGTT TCCGATGATCGC GTCAGCATATCT
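Since the grader times out, one way to check an answer locally is a sliding-window counter: count the k-mers of the first window once, then, each time the window moves one position to the right, decrement the k-mer that falls off the left end and increment the one that enters on the right. Every window is still checked (step 1), but each step costs O(k) instead of a full rescan. A minimal sketch using a plain dict rather than a 4^k array; clump_finding is a hypothetical name, and it assumes len(genome) >= L.

```python
from collections import defaultdict


def clump_finding(genome, k, L, t):
    """Return the set of k-mers that occur at least t times in some window of length L."""
    counts = defaultdict(int)
    found = set()
    # Count the k-mers of the first window genome[0:L]
    for i in range(L - k + 1):
        counts[genome[i:i + k]] += 1
    found.update(k_mer for k_mer, c in counts.items() if c >= t)
    # Slide the window genome[start:start + L] one position at a time
    for start in range(1, len(genome) - L + 1):
        # k-mer that leaves the window on the left
        counts[genome[start - 1:start - 1 + k]] -= 1
        # k-mer that enters the window on the right (it ends at start + L)
        new_k_mer = genome[start + L - k:start + L]
        counts[new_k_mer] += 1
        # Only the entering k-mer's count grew, so only it can newly reach t
        if counts[new_k_mer] >= t:
            found.add(new_k_mer)
    return found


sample = "CGGACTCGACAGATGTGAAGAACGACAATGTGAAGACTCGACACGACAGAGTGAAGAGAAGAGGAAACATTGTAA"
print(" ".join(sorted(clump_finding(sample, 5, 50, 4))))  # CGACA GAAGA
```

Run on the big dataset with k = 12, L = 510, t = 20, its output could then be compared with the list above.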
