B2DSC - Help

Address of this service: http://mcgb.uestc.edu.cn/b2dsc

1. Enter sequence(s), select blast database(s), set parameters and finally submit.

(1) On the Home page, enter your DNA sequence in the textarea.

The sequence should be in FASTA format and at least 10 bp long. Here we use the probe sequence pSc119.2_1 as an example.
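The input rules above (FASTA format, at least 10 bp per sequence) can be checked before submitting. The Python sketch below is a minimal, hypothetical client-side check, not the server's own validator. The name `parse_fasta` and its behaviour (the first word of a header is taken as the sequence name, letters are upper-cased, duplicate names are rejected) are assumptions; the default `max_records=20` reflects the service's stated limit of 20 sequences per search.

```python
from typing import Dict, List, Optional


def parse_fasta(text: str, min_len: int = 10, max_records: int = 20) -> Dict[str, str]:
    """Parse FASTA text into {name: sequence}, enforcing the input limits.

    Hypothetical helper: min_len=10 follows the stated 10 bp minimum and
    max_records=20 follows the stated per-search limit.
    """
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []

    def flush() -> None:
        # Store the record collected so far, checking its length and name.
        if name is None:
            return
        seq = "".join(chunks).upper()
        if len(seq) < min_len:
            raise ValueError(f"{name}: {len(seq)} bp is shorter than {min_len} bp")
        if name in records:
            raise ValueError(f"duplicate sequence name: {name}")
        records[name] = seq

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            flush()
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header without a sequence name")
            name, chunks = fields[0], []  # name = first word of the header
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    flush()

    if not records:
        raise ValueError("no FASTA records found")
    if len(records) > max_records:
        raise ValueError(f"{len(records)} sequences submitted; at most {max_records} allowed")
    return records
```

A sequence may span several lines; the pieces are joined before the length check.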

(2) Set blast databases to align against.

In this example, we select wheat and all 21 of its chromosomes to blast against.

For the other parameters (Additional Parameters), we use the defaults here. Set perc_identity and qcov_hsp_perc to 0 if you do not want to discard any HSP (High-scoring Segment Pair).
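For readers who want to reproduce a comparable search locally, the sketch below assembles a BLAST+ `blastn` command line using the same parameter names. How B2DSC actually invokes blastn is not documented here, so this is an assumption: the helper name `blastn_args`, the `-outfmt` column list, and the default `evalue` of 10 (BLAST+'s own default, not necessarily the server's) are illustrative. Setting both percentages to 0 keeps every HSP, as described above.

```python
from typing import List

# Hypothetical tabular output columns; qcovhsp is the per-HSP query coverage.
OUTFMT = "6 qseqid sseqid pident length qstart qend sstart send evalue bitscore qcovhsp"


def blastn_args(query: str, db: str, perc_identity: float = 0.0,
                qcov_hsp_perc: float = 0.0, evalue: float = 10.0) -> List[str]:
    """Build a BLAST+ blastn command line with the thresholds discussed above."""
    for label, value in (("perc_identity", perc_identity), ("qcov_hsp_perc", qcov_hsp_perc)):
        if not 0 <= value <= 100:
            raise ValueError(f"{label} must be between 0 and 100, got {value}")
    if evalue <= 0:
        raise ValueError("evalue must be greater than 0")
    return [
        "blastn", "-query", query, "-db", db,
        "-perc_identity", str(perc_identity),  # 0 keeps HSPs of any identity
        "-qcov_hsp_perc", str(qcov_hsp_perc),  # 0 keeps HSPs of any coverage
        "-evalue", str(evalue),
        "-outfmt", OUTFMT,
    ]
```

With BLAST+ installed and a database built, the list could be passed to `subprocess.run(args, check=True)`; the query file and database names used for testing are hypothetical.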

(3) The Parameters for similar sequences module.

The query sequence may be highly similar to known probes stored on this server.
Some of the query sequences may be similar to each other.
You may want to filter certain sequences out of your query sequences.

The related parameters are set here.

By default, the server first runs bl2seq among the query sequences, then runs bl2seq between the query sequences and the known probes, using the two parameters shown in the image above. Sequences that are highly similar to each other share the same blast result.
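The sharing step can be pictured as grouping queries that match each other closely. The sketch below is an assumption about how such grouping might work, not the server's code: it treats the two parameters as a minimum percent identity and a minimum query coverage (hypothetical defaults of 90) and uses union-find, so that if A matches B and B matches C, all three fall into one group. Only one member of each group would then need to be aligned to the chromosomes.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical hit record from an all-vs-all bl2seq run:
# (query_id, subject_id, pident, qcovhsp); both ids must appear in `ids`.
Hit = Tuple[str, str, float, float]


def group_similar(ids: Iterable[str], hits: Iterable[Hit],
                  min_pident: float = 90.0, min_qcov: float = 90.0) -> List[List[str]]:
    """Group sequences whose pairwise hits pass both thresholds (union-find)."""
    parent: Dict[str, str] = {i: i for i in ids}

    def find(x: str) -> str:
        # Follow parent links to the group root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for q, s, pident, qcov in hits:
        if q != s and pident >= min_pident and qcov >= min_qcov:
            parent[find(q)] = find(s)  # merge the two groups

    groups: Dict[str, List[str]] = {}
    for i in parent:
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```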

If you want to align each sequence directly to the chromosomes, check disable. We check this checkbox here because the results for pSc119.2_1 are already stored on the server and would be reused by default; checking it makes the server run the alignment again.

If you have already analysed a sequence and want to filter sequences similar to it out of your query sequences, check Align to sequences you want to filter, then enter that sequence in the text area. It must also be in FASTA format and at least 10 bp long.
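Filtering out sequences you have already analysed amounts to dropping every query that closely matches one of the filter sequences. A minimal sketch, assuming the same kind of identity/coverage thresholds as the sharing step (hypothetical defaults of 90) and hits given as `(query_id, filter_id, pident, qcovhsp)` tuples; the function name `drop_filtered` is hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical hit record: (query_id, filter_id, pident, qcovhsp)
Hit = Tuple[str, str, float, float]


def drop_filtered(query_ids: Iterable[str], hits: Iterable[Hit],
                  min_pident: float = 90.0, min_qcov: float = 90.0) -> List[str]:
    """Return the query ids that do not closely match any filter sequence."""
    blocked = {q for q, _filter_id, pident, qcov in hits
               if pident >= min_pident and qcov >= min_qcov}
    # Preserve the original order of the remaining queries.
    return [q for q in query_ids if q not in blocked]
```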

We do not filter any sequence out in this example.

(4) Submit the job by clicking 'BLASTN'.

The browser then goes to the Job status page.

The page will refresh automatically until the task is done.

You can view the job later by its job id (in bold red), which is t5930ccb2 in this example. Visit Job status and enter the job id to query your job and continue working with it.

The wheat genome is very large and this server runs on an ordinary PC, so if the sequence is highly repetitive, blastn may take a long time (several minutes, for example).

2. Filter blast results.

When blastn finishes, the page looks like this:

The top half of this image shows the parameters you can set. You can filter one sequence at a time or all of them at once; for the selected sequence (or All), only HSPs whose pident and qcovhsp lie within the given ranges are kept.
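The range filter can be sketched on BLAST tabular output. This assumes a hypothetical six-column, tab-separated layout (`qseqid sseqid pident sstart send qcovhsp`) and that filtering a single sequence leaves the other queries' HSPs untouched; neither is confirmed by the server's documentation. `query=None` stands for All.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple


class HSP(NamedTuple):
    qseqid: str
    sseqid: str
    pident: float
    sstart: int
    send: int
    qcovhsp: float


def parse_hsp(line: str) -> HSP:
    # Assumed tab-separated column order: qseqid sseqid pident sstart send qcovhsp
    q, s, pident, sstart, send, qcov = line.rstrip("\n").split("\t")
    return HSP(q, s, float(pident), int(sstart), int(send), float(qcov))


def filter_hsps(hsps: Iterable[HSP],
                pident_range: Tuple[float, float] = (0.0, 100.0),
                qcov_range: Tuple[float, float] = (0.0, 100.0),
                query: Optional[str] = None) -> List[HSP]:
    """Keep HSPs whose pident and qcovhsp both lie inside the given ranges."""
    lo_p, hi_p = pident_range
    lo_c, hi_c = qcov_range
    kept: List[HSP] = []
    for h in hsps:
        if query is not None and h.qseqid != query:
            kept.append(h)  # HSPs of other queries are left untouched
        elif lo_p <= h.pident <= hi_p and lo_c <= h.qcovhsp <= hi_c:
            kept.append(h)
    return kept
```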

The lower half of the image shows the best-hit HSP of the query sequence on each chromosome, which can serve as a reference when setting the parameters.

If you want to see the blast results, click Download.

We use default values here, and click Filter blast results button to submit.

3. Draw picture.

Filtering may take several seconds. When it finishes, the page looks like this:

Parameters for plot:

Size: margins of the whole image, blank width on the left/right side of the ruler, width of each chromosome, and so on.
Offset: offsets of the tick labels (e.g. set a negative integer for x to move them left), the chromosome names, and so on.
Color: fill and border colors of each chromosome; colors for the lower and upper limits of the count, and the number of gradient steps.
Plot bars of reference sequence: plots the distribution of the reference sequence, which for wheat is the centromere-specific repeat CCS1. By default it is drawn in grey without gradients; the color is customizable.
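The Color settings describe a count-to-color gradient. One plausible way to compute it, assumed here rather than taken from the B2DSC source, is to clamp the per-Mbp count to the lower/upper limits, snap it to one of the gradient steps, and interpolate linearly in RGB. The white-to-red defaults and the name `count_to_color` are hypothetical.

```python
from typing import Tuple


def _hex_to_rgb(color: str) -> Tuple[int, int, int]:
    c = color.lstrip("#")
    return int(c[0:2], 16), int(c[2:4], 16), int(c[4:6], 16)


def count_to_color(count: float, lower: float, upper: float,
                   low_color: str = "#ffffff", high_color: str = "#ff0000",
                   steps: int = 10) -> str:
    """Map a per-bin count to one of `steps` colors between low_color and high_color."""
    if upper <= lower or steps < 2:
        raise ValueError("need upper > lower and steps >= 2")
    # Clamp the count into [lower, upper] and scale it to 0..1.
    t = (min(max(count, lower), upper) - lower) / (upper - lower)
    # Snap to the nearest gradient step.
    frac = round(t * (steps - 1)) / (steps - 1)
    lo, hi = _hex_to_rgb(low_color), _hex_to_rgb(high_color)
    rgb = (round(a + (b - a) * frac) for a, b in zip(lo, hi))
    return "#" + "".join(f"{v:02x}" for v in rgb)
```

Counts below the lower limit get the lower-limit color and counts above the upper limit get the upper-limit color, so a few extreme bins do not wash out the rest of the gradient.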

Here we check Plot bars of reference sequence, and use default parameters.

Click Plot to plot.

The image is drawn by a JavaScript program, so it uses some of the user's computer resources.
Because the image is in SVG format, please use a modern web browser such as Chrome, Firefox, IE11+, or Edge; other modern browsers may work too.

Finally, we can see the resulting image:

Only chromosomes with HSPs that meet the criteria are plotted. The CCS1 bars on the left of each chromosome show the approximate position of the centromere. The redder a line on a chromosome, and the longer the bar beside it, the more repeats that 1 Mbp interval contains.
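The counts behind the colors and bar lengths are repeats per 1 Mbp. A minimal binning sketch, assuming each HSP is counted once in the bin that holds its leftmost base (the server may use another convention, such as the midpoint), with 0-based positions so that bin 5 covers 5 Mbp ~ 6 Mbp. The name `bin_counts` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

BIN_SIZE = 1_000_000  # 1 Mbp, as in the plot


def bin_counts(hsps: Iterable[Tuple[str, int, int]], bin_size: int = BIN_SIZE) -> Counter:
    """Count HSPs per (chromosome, bin index); bin k covers [k, k+1) Mbp, 0-based."""
    counts: Counter = Counter()
    for chrom, sstart, send in hsps:
        # BLAST subject coordinates are 1-based, and sstart > send on the minus strand.
        left0 = min(sstart, send) - 1
        counts[(chrom, left0 // bin_size)] += 1
    return counts
```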

We can see that pSc119.2_1 is far more abundant in the B genome than in the A and D genomes.

In the real plot result (not the image in this help), hovering the mouse over a line on a chromosome, or over a bar on either side, shows information about the repeats in that range. For example:

Here it shows that on chromosome 5B, in the range 5 Mbp ~ 6 Mbp, pSc119.2_1 occurs 1,167 times. Note that positions start from 0.

The image can be dragged (left mouse button) and zoomed (mouse wheel); click the Reset button to restore it.

Click the Save SVG button to save the image in SVG format. SVG images can be edited with Inkscape, Illustrator, a text editor, and so on.

Click red lines in the image above to see more about the sequence distribution.

In the image above, A-I shows the distribution of 'N' (unknown or unspecified) nucleotides in the genome reference sequence; hovering over a region shows its start position, end position, and length (B in the image above).
Clicking a red or green rectangle (A-II, III in the image above) shows the sequences of the HSPs in that region (C, D in the image above), provided the region contains no more than 100 HSPs.
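The two interactions above can be sketched as follows. `n_regions` finds runs of 'N' and reports 0-based, end-exclusive coordinates, which is an assumption since the tooltip's exact convention is not stated; `hsps_to_show` applies the 100-HSP limit. Both names are hypothetical.

```python
import re
from typing import List, Optional, Sequence, Tuple


def n_regions(seq: str) -> List[Tuple[int, int, int]]:
    """Return (start, end, length) of each run of N; 0-based, end exclusive."""
    return [(m.start(), m.end(), m.end() - m.start())
            for m in re.finditer(r"[Nn]+", seq)]


MAX_SHOWN = 100  # sequences are listed only if a region has at most 100 HSPs


def hsps_to_show(region_hsps: Sequence[str], limit: int = MAX_SHOWN) -> Optional[Sequence[str]]:
    """Return the HSP sequences of a region, or None when there are too many to list."""
    return region_hsps if len(region_hsps) <= limit else None
```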

4. Supplementary tables.

Several tables may be generated. For example, the table shown here contains the parameters used to run blastn. Tables mapping similar sequences to each other may also be shown.

If the results are not ideal, you can change the parameters at any step and run it again.

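Address of this service: http://mcgb.uestc.edu.cn/b2dsc

1. Enter sequences, select databases, set parameters, and submit the job.

(1) On the Home page, enter the DNA sequence(s) to query in the box shown below.

Note that sequences must be in FASTA format and at least 10 bp long. We use the example sequence pSc119.2_1 here.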

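(2) Select the databases to blast against.

In the figure, wheat is selected as the species and all chromosomes are checked, so after submission the sequence will be aligned against each of these 21 chromosome sequences. The other parameters (Additional Parameters) can be set as needed; here we simply use the defaults.
The larger perc_identity and qcov_hsp_perc are, the more similar a matched sequence must be to the query sequence. The smaller evalue is (it must be greater than 0), the more stringent the similarity requirement. These are all blastn parameters; clicking the links in the figure opens the relevant NCBI user manual.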

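(3) The Parameters for similar sequences module.

A query sequence may be identical or similar to a sequence stored on the server, query sequences may be similar to each other, and you may want to exclude certain sequences from your queries. The related parameters are set here.

By default, if there are several input sequences, they are first aligned against each other, and based on the two parameters in the Filter or Share box, only one of each set of similar sequences is later aligned to the chromosomes. The sequences are then aligned to the known probe sequences stored on the server; if one is highly similar to a known probe, that probe's results are used directly. The remaining sequences are aligned to the chromosome sequences one by one.

If you do not want the input sequences to be aligned against each other or against the known probe sequences, check the disable checkbox. Here we check disable to demonstrate the workflow; if it is left unchecked, the results already stored on the server are used directly, which saves time.

To filter out a sequence, for example one you have already analysed and do not want to spend time on, click Align to sequences you want to filter as shown below, and enter your sequence in the text area that expands. It must also be in FASTA format and at least 10 bp long. In this example we do not filter anything.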

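(4) Submit the job.

When everything is set, click the BLASTN button to submit the job.

You will then be taken to the Job status page.

The page refreshes automatically until the job finishes. The figure shows a job id, t5930ccb2; you can note it down, visit Job status later, and use this id to query and work with the job.

The wheat genome is very large, and our server is built on an ordinary PC, so if a sequence is highly repetitive, blastn may run for quite a long time. This is also why at most 20 sequences may be submitted per search: the server cannot handle more.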

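2. Filter the blast results.

When blastn finishes, the page looks like the figure.

The top half of the figure provides the parameters to set: the target sequence, the percent identity, and the percent coverage. The bottom half lists the best hit on each chromosome (only the first record is listed), which can serve as a reference when setting the filtering parameters.

If you want to see what the blast results actually look like, click Download in the figure.

Here we keep all default values and click the Filter blast results button to filter.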

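3. Draw the picture.

Filtering usually does not take long. When it is done, the page looks like the figure below.

The customizable plotting parameters fall into four groups.

Size: margins of the image, blank space to the left and right of the ruler, chromosome width, font size, etc.
Offset: offsets of the tick labels (for example, set x to a negative integer to move them left), the downward offset of the ruler name "Mbp", offsets of the chromosome names, etc.
Color: chromosome fill and border colors, and the colors for the lower and upper limits of count (the number of times the sequence occurs per Mbp); by default a color gradient represents differences in count.
Plot bars of reference sequence: plots the reference sequence, which for wheat is the distribution of the centromere-specific repeat CCS1. It is drawn in grey by default, without gradients, and the color can be customized.

Here we check Plot bars of reference sequence and keep the other parameters at their defaults.

Click Plot to draw the picture.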

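The plot is drawn by a JavaScript program, so it depends on the user's computer resources.
Because the image is displayed in SVG format, please use a modern browser; Chrome/Firefox or IE11+/Edge are recommended, and other modern browsers should also work.

The resulting image is shown below.

Only chromosomes with alignment results are drawn. The CCS1 bars to the left of each chromosome indicate the approximate position of the centromere. The redder a chromosome's lines and the bars to its right, and the longer those bars, the more repeats occur there. We can observe that pSc119.2_1 is distributed much more on the B-genome chromosomes than on the other two genomes.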

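On the actual plot (not the image pasted above), hovering the mouse over a horizontal line on a chromosome or over a bar on either side displays related information, for example:

We can see that in the 5 Mbp ~ 6 Mbp interval of chromosome 5B, pSc119.2_1 occurs 1,167 times. Note that positions start from 0.

As shown above, the image can be dragged (left mouse button) and zoomed (mouse wheel); click the Reset button to restore it.

Every SVG graphic drawn by this service can be saved in SVG format by clicking the corresponding Save SVG button; images in this format can be edited with software such as Inkscape or Adobe Illustrator.

You can change the parameters to draw a better picture.

Click the red horizontal lines in the image above to examine the sequence distribution in more detail.

Image above, A-I: the distribution of 'N' (unknown or unspecified nucleotides) in the genome reference sequence; hovering over it shows the start and end coordinates and the length (B in the image above).
Click a red (green) rectangle (A-II, III in the image above); if the number of HSPs in that range is 100 or fewer, the sequences of those HSPs are shown (C, D in the image above).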

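4. Supplementary tables.

Some supplementary tables may be generated; for example, the parameters used to run blastn are given here. A table mapping similar sequences to each other may also be given.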