CODE CHALLENGE: Solve the Frequent Words Problem.
Input: A string Text and an integer k.
Output: All most frequent k-mers in Text.
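In this problem a k-mer is any window of length k read with a sliding window, so a text of length n has n - k + 1 of them, and occurrences of the same pattern may overlap. The minimal sketch below uses hypothetical helper names (`kmers`, `count_overlapping`) that are not part of the challenge. It shows why a manual loop is needed for counting: Python's built-in `str.count` counts only non-overlapping matches.

```python
def kmers(text, k):
    """Return every length-k window of text, left to right (n - k + 1 of them)."""
    return [text[i:i + k] for i in range(len(text) - k + 1)]


def count_overlapping(text, pattern):
    """Count occurrences of pattern in text, allowing matches to overlap."""
    return sum(1 for window in kmers(text, len(pattern)) if window == pattern)


if __name__ == "__main__":
    print(kmers("ACGTT", 2))                  # ['AC', 'CG', 'GT', 'TT']
    print(count_overlapping("GCGCG", "GCG"))  # 2: matches start at positions 0 and 2
    print("GCGCG".count("GCG"))               # 1: str.count skips overlapping matches
```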
##comments: in this challenge the task was to find the most frequently repeated substrings of a given length in a given text
I did it!!!))
##I took the PatternCount function from the previous challenge, and I googled the remove_duplicates function, which removes repeated elements from a list. I didn't test it separately, but it seems to work. The two debug lines that print "patt " + Pattern and Count[i] were only there to help me debug the code; you can (and really should) comment them out for real runs, otherwise the output turns into a long wall of values. After that comes the code itself. Here it is:
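def PatternCount(Text, Pattern):
    # count occurrences of Pattern in Text, including overlapping ones
    counter = 0
    for i in range(len(Text) - len(Pattern) + 1):
        if Text[i:i + len(Pattern)] == Pattern:
            counter += 1
    return counter

def remove_duplicates(values):
    # drop repeated items while keeping the order of first appearance
    output = []
    seen = set()
    for value in values:
        if value not in seen:
            output.append(value)
            seen.add(value)
    return output

def FrequentWords(Text, k):
    FrequentPatterns = []
    Count = []
    # count every k-mer (window of length k) of Text
    for i in range(len(Text) - k + 1):
        Pattern = Text[i:i + k]
        print("patt " + Pattern)  # debug output, comment out for real runs
        Count.append(PatternCount(Text, Pattern))
        print(Count[i])  # debug output, comment out for real runs
    maxCount = max(Count)
    # keep every window whose count equals the maximum
    for i in range(len(Count)):
        if Count[i] == maxCount:
            FrequentPatterns.append(Text[i:i + k])
    # a frequent k-mer is appended once per occurrence, so deduplicate
    FrequentPatterns = remove_duplicates(FrequentPatterns)
    return FrequentPatterns

print(FrequentWords("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4))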
patt ACGT
1
patt CGTT
1
patt GTTG
1
patt TTGC
1
patt TGCA
2
patt GCAT
3
patt CATG
3
patt ATGT
1
patt TGTC
1
patt GTCG
1
patt TCGC
1
patt CGCA
1
patt GCAT
3
patt CATG
3
patt ATGA
2
patt TGAT
1
patt GATG
1
patt ATGC
1
patt TGCA
2
patt GCAT
3
patt CATG
3
patt ATGA
2
patt TGAG
1
patt GAGA
1
patt AGAG
1
patt GAGC
1
patt AGCT
1
['GCAT', 'CATG']
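The version above calls PatternCount once per window, so it rescans the whole text for every k-mer. It also needs remove_duplicates because a frequent k-mer is appended once per occurrence. A common alternative, which is not the approach used in this post, builds a frequency table in a single pass. Dictionary keys are already unique, so no deduplication step is needed. The minimal sketch below uses the standard `collections.Counter`. The name `frequent_words_map` is hypothetical, and it assumes Python 3.7+ so that ties come out in order of first appearance.

```python
from collections import Counter


def frequent_words_map(text, k):
    """Return all most frequent k-mers of text, in order of first appearance."""
    # One pass over the text: count every overlapping window of length k.
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    if not counts:  # text shorter than k: there are no k-mers at all
        return []
    max_count = max(counts.values())
    # Counter keeps insertion order (Python 3.7+), so ties are listed in the
    # order in which each k-mer first occurs in the text.
    return [pattern for pattern, c in counts.items() if c == max_count]


if __name__ == "__main__":
    print(frequent_words_map("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4))  # ['GCAT', 'CATG']
```

On the sample input this matches the result printed above, because GCAT and CATG each occur 3 times and every other 4-mer occurs at most twice.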
for input:
GCAACATCGCTCTACCTGGATAGCGCATCACCAGAAATCACCAGAAGATAGCGCGAAAGAAAATGAAAGAAAATGCTCTACCTGGATAGCGCGCTCTACCTGGATAGCGCATCACCAGAAGAAAGAAAATGAAAGAAAATGCTCTACCTGGCTCTACCTGGAAAGAAAATGATAGCGCATCACCAGAAGATAGCGCGAAAGAAAATATCACCAGAAGCAACATCGCTCTACCTGGCTCTACCTGGATAGCGCGATAGCGCATCACCAGAAGAAAGAAAATGATAGCGCGCAACATCGAAAGAAAATGCAACATCGAAAGAAAATGAAAGAAAATATCACCAGAAGAAAGAAAATGCTCTACCTGGCAACATCGCAACATCGAAAGAAAATGATAGCGCGCAACATCATCACCAGAAGAAAGAAAATGCAACATCATCACCAGAAGATAGCGCGCTCTACCTGATCACCAGAAGCAACATCGCAACATCGAAAGAAAATATCACCAGAAGATAGCGCGCTCTACCTGGCTCTACCTGGCAACATCGCTCTACCTGGATAGCGCATCACCAGAAGATAGCGCGCTCTACCTGGATAGCGCGAAAGAAAATATCACCAGAAGAAAGAAAATATCACCAGAAGCTCTACCTGGAAAGAAAATGCAACATCGCAACATCATCACCAGAAATCACCAGAAGCTCTACCTGGATAGCGCGCAACATCATCACCAGAAGAAAGAAAATGCAACATCGATAGCGCGAAAGAAAATGCTCTACCTGGAAAGAAAATGCTCTACCTGGATAGCGCGAAAGAAAATGATAGCGCGCAACATCGCAACATCGAAAGAAAATGATAGCGCGAAAGAAAATGAAAGAAAATGCTCTACCTGGATAGCGCGCTCTACCTGGCAACATCATCACCAGAAGCAACATCGAAAGAAAATGCAACATCGATAGCGCGCTCTACCTG
14
my output:
['GCTCTACCTGGATA', 'CTCTACCTGGATAG', 'TCTACCTGGATAGC', 'CTACCTGGATAGCG', 'TACCTGGATAGCGC']
AND IT'S CORRECT, YEAH
ah, IT'S CORRECT, YEAH, it only gives GCTCTACCTGGATA as an output