visualise

craft.visualise.fit_text(ax, *args, **kwargs)[source]

Write text into ax by passing args and kwargs to ax.text, then change the view limits of ax if necessary to include the text.

This is trickier than it sounds, and the results are only approximate.
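The limit adjustment described above can be sketched as follows. This is a minimal illustration, not the code behind fit_text: the helper expand_limits is a hypothetical name, and the Matplotlib part is skipped if Matplotlib is not importable.

```python
def expand_limits(limits, extent):
    """Widen (low, high) view limits just enough to cover (low, high) extent.

    Hypothetical helper for illustration; it assumes a non-inverted axis.
    """
    low, high = limits
    return (min(low, extent[0]), max(high, extent[1]))


try:
    import matplotlib
    matplotlib.use("Agg")  # off-screen backend, so no display is needed
    import matplotlib.pyplot as plt
except ImportError:
    plt = None

if plt is not None:
    fig, ax = plt.subplots()
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)
    text = ax.text(0.9, 0.5, "a label that overruns the axes")
    fig.canvas.draw()  # the text has no measurable size until it is drawn
    # Bounding box of the text, converted from pixels to data coordinates.
    bbox = text.get_window_extent().transformed(ax.transData.inverted())
    ax.set_xlim(expand_limits(ax.get_xlim(), (bbox.x0, bbox.x1)))
    ax.set_ylim(expand_limits(ax.get_ylim(), (bbox.y0, bbox.y1)))
```

One likely reason the result is approximate: text is sized in points, so once the limits change, the same text covers a different span in data coordinates.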

craft.visualise.ld_block(array, indexes=None, names=None, labels=None, text_kwargs={}, figsize=(8, 5), cmap='Reds', colorbar=True)[source]

Create and return a linkage-disequilibrium block chart. array is a square numpy array containing LD values (Pearson’s correlation coefficient r). Only the upper triangle of the array (above the diagonal) is used. The values actually plotted are r^2.

indexes is an iterable of index values into the rows and columns of array, giving the order and identity of the SNPs to display. If None, the whole array is displayed in array order.

names is an iterable of names, which should be the same length as indexes, or the number of rows in array, used to display row/column names above the LD block. If None, no names are displayed.

labels, if present, is a dictionary of labels with available keys “mid”, “left”, and “right”, for example dict(mid='chr17', left=12398013, right=18290324).

If there are no labels then no label bar is drawn.

figsize is the size of the figure in inches (width, height).

cmap is the name of a Matplotlib colormap.

colorbar controls whether to add a colorbar. The default is True.
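As a rough sketch of the inputs described above, the following builds a small matrix of r values and lists the r^2 values that fall in the upper triangle. The SNP names and r values are made up, upper_r_squared is a hypothetical helper rather than part of craft, and the call to ld_block is skipped if craft or NumPy is not importable.

```python
# Pearson r between four hypothetical SNPs (symmetric, 1.0 on the diagonal).
r = [
    [1.0, 0.9, -0.5, 0.1],
    [0.9, 1.0, 0.4, 0.2],
    [-0.5, 0.4, 1.0, -0.8],
    [0.1, 0.2, -0.8, 1.0],
]
names = ["rs1", "rs2", "rs3", "rs4"]  # made-up SNP names
labels = dict(mid="chr17", left=12398013, right=18290324)


def upper_r_squared(matrix):
    """Return {(row, col): r**2} for entries strictly above the diagonal."""
    n = len(matrix)
    return {
        (i, j): matrix[i][j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    }


plotted = upper_r_squared(r)  # six values for a 4 x 4 array

try:
    import numpy as np
    from craft import visualise
except ImportError:
    visualise = None

if visualise is not None:
    # Signature as documented above; the return value is the chart.
    chart = visualise.ld_block(np.array(r), names=names, labels=labels)
```

Because the plotted value is r^2, a strong negative correlation such as -0.8 appears as strongly as a positive one of the same magnitude.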

craft.visualise.locus(df, cred_snps=None, figsize=(5, 8), size=1, color='b', marker='.', threshold=0.8, alpha_line_width=0.5, alpha_line_color='0.5', alpha_line_style='--', good_size=2, good_color='g', good_marker='D', good_label_column='rsid', good_label_rotation='vertical', tracks=None, track_height=0.5, track_column='tracks', track_colors=['r', 'g', 'b', 'c', 'm', 'y', 'k'], track_alpha=0.3, track_linelength=0.7, track_good_linelength=1.0, track_lines=False, track_line_width=0.5, track_line_color='k', pos_top=False, genes=None, gene_height=0.5)[source]

Draw and return a “locus plot”, with values above some critical threshold marked distinctly and labelled.

df is a Pandas dataframe with three required columns:

“pp”: the posterior probability of each SNP; “position”: the position of the SNP (in base-pairs); “chromosome”: the chromosome name or number (should be the same for every SNP).

If automatic labelling of credible SNPs is required (controlled by the cred_snps or threshold parameter), the DataFrame must also have a column labelled by the good_label_column parameter.

If a tracks pane is required (controlled by the tracks parameter), the DataFrame must also have a column labelled by the track_column parameter.

figsize is the figure size (width, height) in inches.

size is the point area for the main scatter plot, in square points.

color is the point color for the main scatter plot.

marker is the marker style for the main scatter plot.

cred_snps, if given, is a list of credible SNP IDs, or a list of lists of credible SNP IDs. If the latter, the SNPs in each list are drawn in a different color (see good_color).

threshold is a threshold for distinguishing credible SNPs. If cred_snps is given then threshold is only used for drawing the alpha line.

alpha_line_width is the width of the horizontal line to be drawn at the alpha level.

alpha_line_color is the color specifier for the alpha line.

alpha_line_style is the line style for the alpha line.

good_size is the point area for credible SNPs, in square points.

good_color is the color for credible SNPs. If cred_snps is given, and is a list of lists of IDs, this should be a list of colors (and defaults to ['r', 'g', 'b', 'c', 'm', 'y', 'k']).

good_marker is the marker style for credible SNPs.

good_label_column is the column name in df for labels to be drawn for credible SNPs, or None for no labels.

good_label_rotation is the rotation for labels for “good” SNPs.

tracks is a list of track labels for the tracks pane, if required.

track_height is the height of the tracks pane, relative to the posterior probability pane.

track_column is the name of a column in the df DataFrame. A given SNP is shown on a track if its entry in this column contains the corresponding label from the tracks parameter.

track_colors is a list of Matplotlib color descriptors, to be used for the tracks.

track_alpha is the opacity of a track marker for a non-credible SNP (credible SNPs have alpha 1.0).

track_linelength is the length of a track marker for a non-credible SNP.

track_good_linelength is the length of a track marker for a credible SNP.

track_lines controls whether to draw vertical lines under the tracks pane, marking the credible SNPs.

track_line_width is the width of vertical lines under the tracks pane marking credible SNPs.

track_line_color is the color of vertical lines under the tracks pane marking credible SNPs.

pos_top controls whether the panes for tracks and genes are above (True) or below (False) the posterior probability pane.

genes is a list of genes for the genes pane, if required. The list member for each gene should be a tuple (start position, end position, name, strand), where strand is “+” or “-”.

gene_height is the height of the genes pane, relative to the posterior probability pane.
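The inputs for a locus plot with tracks and genes panes might be assembled as below. This is a sketch under assumptions: the SNPs, genes and track labels are invented, the comma-separated format of the “tracks” column is a guess (the text above only says the entry must contain the label), and snps_on_track is a hypothetical helper. The call to locus is skipped if craft or pandas is not importable.

```python
# One dict per SNP; keys match the required and optional columns described above.
rows = [
    {"rsid": "rs1", "chromosome": "17", "position": 12400000,
     "pp": 0.02, "tracks": "DHS"},
    {"rsid": "rs2", "chromosome": "17", "position": 12425000,
     "pp": 0.85, "tracks": "DHS,H3K27ac"},
    {"rsid": "rs3", "chromosome": "17", "position": 12510000,
     "pp": 0.10, "tracks": ""},
]
tracks = ["DHS", "H3K27ac"]  # made-up track labels
genes = [
    # (start position, end position, name, strand)
    (12410000, 12460000, "GENE_A", "+"),
    (12500000, 12530000, "GENE_B", "-"),
]


def snps_on_track(rows, label, column="tracks"):
    """Return IDs of SNPs whose track entry contains the given label."""
    return [row["rsid"] for row in rows if label in row[column]]


# Sanity checks on the documented constraints.
assert len({row["chromosome"] for row in rows}) == 1
assert all(strand in ("+", "-") for _, _, _, strand in genes)

membership = {label: snps_on_track(rows, label) for label in tracks}

try:
    import pandas as pd
    from craft import visualise
except ImportError:
    visualise = None

if visualise is not None:
    plot = visualise.locus(
        pd.DataFrame(rows), cred_snps=["rs2"], tracks=tracks, genes=genes
    )
```

Passing cred_snps explicitly, as here, means threshold is used only to position the alpha line.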

craft.visualise.manhattan(df, x_label, index_df=None, alpha=5e-08, figsize=(8, 5), size=1, color='b', marker='.', alpha_line_width=0.5, alpha_line_color='0.5', alpha_line_style='--', good_size=2, good_color='g', good_marker='D', good_label_column='rsid', good_label_rotation='vertical', vertical_lines=[])[source]

Draw and return a “Manhattan plot”, with values above some critical threshold marked distinctly and labelled.

df is a Pandas dataframe with two required columns:

“pvalue”: the P value of each SNP; “position”: the position of the SNP (in base-pairs).

If automatic labelling of the most significant SNPs is required (controlled by the index_df and alpha parameters), the DataFrame must also have a column labelled by the good_label_column parameter.

index_df, if present, is a Pandas dataframe like df, with the same required columns, identifying the SNPs to be labelled.

x_label is a label for the x axis (such as “Chromosome 17”).

alpha is a threshold for distinguishing “good” SNPs.

If index_df is present then the SNPs in it will be distinguished. If index_df is absent, and alpha is set, then SNPs with pvalue less than alpha are distinguished.

Distinguished SNPs are drawn differently (see the good_ parameters below). If index_df is present, or if both alpha and good_label_column are set, the distinguished SNPs are also labelled.

figsize is the figure size (width, height) in inches.

size is the point area for the main scatter plot, in square points.

color is the point color for the main scatter plot.

marker is the marker style for the main scatter plot.

alpha_line_width is the width of the horizontal line to be drawn at the alpha level.

alpha_line_color is the color specifier for the alpha line.

alpha_line_style is the line style for the alpha line.

good_size is the point area for “good” SNPs, in square points.

good_color is the color for “good” SNPs.

good_marker is the marker style for “good” SNPs.

good_label_column is the column name in df or index_df for labels to be drawn for “good” SNPs, or None for no labels.

good_label_rotation is the rotation for labels for “good” SNPs.
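The rule for distinguishing SNPs when index_df is absent (pvalue less than alpha) can be sketched as follows. The P values and SNP IDs are invented, distinguished is a hypothetical helper and not a function in craft, and the call to manhattan is skipped if craft or pandas is not importable.

```python
alpha = 5e-08  # the documented default threshold

# One dict per SNP, with the two required columns plus a label column.
rows = [
    {"rsid": "rs1", "position": 12400000, "pvalue": 3e-04},
    {"rsid": "rs2", "position": 12425000, "pvalue": 2e-09},
    {"rsid": "rs3", "position": 12510000, "pvalue": 5e-08},
]


def distinguished(rows, alpha):
    """Return IDs of SNPs whose P value is strictly less than alpha."""
    return [row["rsid"] for row in rows if row["pvalue"] < alpha]


good = distinguished(rows, alpha)

try:
    import pandas as pd
    from craft import visualise
except ImportError:
    visualise = None

if visualise is not None:
    plot = visualise.manhattan(pd.DataFrame(rows), "Chromosome 17", alpha=alpha)
```

Note that rs3, whose P value equals alpha exactly, is not selected, since the comparison is "less than".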