|
Converts the data from upstream and downstream files to a format that
can be used by learnrdr.
File retrieve-seq.2009_04_20.154301.res contains upstream sequences
and retrieve-seq.2009_04_20.154355.res downstream sequences.
A command-line argument (N) specifies the number of sequences to put
in each data file. Files upstreamN.txt, downstreamN.txt and
randomN.txt are created.
For each output file, N/2 sequences are chosen at random from
retrieve-seq.2009_04_20.154301.res and N/2 from
retrieve-seq.2009_04_20.154355.res. Sequences from
retrieve-seq.2009_04_20.154301.res are labelled "1" in upstreamN.txt
and "-1" in downstreamN.txt; sequences from
retrieve-seq.2009_04_20.154355.res are labelled "-1" in upstreamN.txt
and "1" in downstreamN.txt. The sequences in randomN.txt are labelled
"1" or "-1" at random.
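
The sampling and labelling rule described above can be sketched roughly
as follows. This is only an illustration, not the actual converter: the
function name sample_labelled, the mode strings, and the assumption
that the sequences have already been parsed into Python lists are all
hypothetical, and the on-disk format expected by learnrdr is not shown.

```python
import random

def sample_labelled(upstream, downstream, n, mode, rng):
    """Return n (label, sequence) pairs for one output file.

    upstream / downstream: lists of already-parsed sequences (assumed).
    mode: "upstream", "downstream" or "random" (hypothetical names).
    rng: a random.Random instance, passed in for reproducibility.
    """
    half = n // 2
    # Draw N/2 sequences from each source without replacement.
    chosen_up = rng.sample(upstream, half)
    chosen_down = rng.sample(downstream, half)

    if mode == "upstream":
        up_label, down_label = "1", "-1"
    elif mode == "downstream":
        up_label, down_label = "-1", "1"
    elif mode == "random":
        up_label = down_label = None  # label chosen per sequence below
    else:
        raise ValueError("unknown mode: %r" % mode)

    rows = []
    for label, chosen in ((up_label, chosen_up), (down_label, chosen_down)):
        for seq in chosen:
            if label is None:
                rows.append((rng.choice(["1", "-1"]), seq))
            else:
                rows.append((label, seq))
    return rows
```

Calling sample_labelled once per mode with the same N would give the
contents of upstreamN.txt, downstreamN.txt and randomN.txt. Each call
here draws a fresh sample; whether the original tool reuses one sample
across the three files is not stated in the description above.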
|