Base Sequence Viewer

PROBLEM
Visualize sequencing information at a nucleotide level so scientists can answer the questions, “Did I create the plasmid I designed? If I didn’t…what happened? Can I still use it?”
SOLUTION
Created a cohesive, intuitive, and scalable base viewer used across multiple products in the Plasmidsaurus platform. Decreased time-to-answer and improved platform stickiness.
ROLE
Solo Designer & Researcher
Owned end-to-end design; partnered with a frontend engineer as the product team.
RESULTS
77% ↑
increase in Amplicon orders after the introduction of Genotyping Analysis, a base-viewer focused product
Background
When scientists send off a plasmid for sequencing, they're answering a high-stakes question: Did I build the plasmid I intended to build? And if not, can I still use it?

When I came onto this project, our platform returned sequencing results as a comparison against a user-uploaded reference. It would mark samples as “match” or “no match”, but gave scientists almost nothing to work with when the answer was something messier than yes or no.
The before view: A sparkline version of the plasmid map was shown for each sample, along with whether it matched any uploaded reference. Users had to download the sequencing files and open them in a different program to see where the mismatches were and what they were.
The number one ask: Can I see what's actually happening at the base sequence level?
User research made the gap clear. Our sequencing technology offered an unprecedented view of the entire plasmid, but we were not showing the crucial details users wanted to see.
Historically, plasmid sequencing relied on short reads, so scientists would only sequence the region they had engineered, which meant they often missed errors and mismatches elsewhere on the plasmid. These errors could propagate forward and surface later as unexplained failures, when debugging and redoing the work cost far more in time and resources than catching the error early would have.

We had the technology to guarantee that a plasmid wasn’t hiding any errors that could threaten a scientist’s experiment…but our visualizations fell short of giving users the information they actually wanted. When our platform couldn’t provide those answers, users would download the raw data and move to a different platform to look at the sequence base by base.

My goal was to shift how users thought about Plasmidsaurus: from 'the lab step in my workflow' to 'the place where I get answers and confidence to move forward'. The result was a base viewer that focused users on the most salient parts of their base-level sequence, and that eventually expanded to multiple products across Plasmidsaurus’s platform.
Designing the base viewer
The existing solution on other platforms was a full-featured sequence viewer. Benchling and SnapGene both have large, scrollable interfaces built for primer design and sequence exploration. They're powerful tools for users trying to design, edit, and explore plasmids in their entirety.
ApE, a common free DNA visualizer
Benchling, an enterprise-level electronic lab notebook that includes a sequence map
An initial sketch of how bases could be shown, along with some sequencing quality information
However, my insight was that our users didn't need a sequence explorer. They needed to make a decision about each sample, based on the evidence they had. That distinction drove my design decisions. Rather than build a viewer that puts sequence exploration front and center, I went with a "mini" viewer: a focused, contextual display that surfaces the delta between what users actually had in the tube and what they intended to build (their reference).
Typically the first question is, "What's different from my reference?" Any mismatches are flagged prominently and highlighted correspondingly in the base viewer. Users can click through each one to see what mutation occurred. Because of how our sequencing technology works, some called mutations were likely sequencing artifacts rather than real changes. We marked those in blue, separately from real mismatches in red, and provided user education on what the distinction meant.
While the focus was on the mismatch areas, it was still important to let people see other parts of the sequence to contextualize the mismatch. A scrollable and scrubbable text string lets users investigate upstream and downstream, as well as copy and paste their updated sequence into their design software.
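The flagging logic described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the platform's implementation: it assumes the sample has already been aligned to the reference at equal length, and that an upstream step supplies the positions judged to be likely artifacts. The names `Mismatch`, `find_mismatches`, and `artifact_positions` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Mismatch:
    position: int          # 0-based index into the aligned sequences
    ref_base: str          # base the user designed
    sample_base: str       # base that was actually sequenced
    likely_artifact: bool  # True -> shown in blue, False -> shown in red


def find_mismatches(reference, sample, artifact_positions=frozenset()):
    """Return every position where the sample differs from the reference.

    Assumes both strings are pre-aligned and equal in length.
    `artifact_positions` is a hypothetical input from an upstream
    artifact-detection step.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        Mismatch(i, r, s, i in artifact_positions)
        for i, (r, s) in enumerate(zip(reference, sample))
        if r != s
    ]


# Example: two substitutions, one of which is flagged as a likely artifact.
hits = find_mismatches("ATGGCCTTTAAA", "ATGGCTTTTAGA", artifact_positions={10})
for hit in hits:
    colour = "blue" if hit.likely_artifact else "red"
    print(hit.position, f"{hit.ref_base}>{hit.sample_base}", colour)
```

A list of mismatch records like this is what a "click through each one" interaction would step over, with the artifact flag driving the colour.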

I also added an amino acid track to show what translational effect a mutation would have. This made it very obvious if a mutation was silent and would have no downstream effect, or if it caused an early stop codon and would have a large effect.
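As a rough illustration of what the amino acid track conveys, the Python sketch below classifies a single-base substitution by its effect on the translated codon. It assumes the standard genetic code and a known reading frame; `classify_substitution` and `frame_start` are hypothetical names, and the real product's logic may differ.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}


def classify_substitution(reference, position, new_base, frame_start=0):
    """Classify a substitution as 'silent', 'nonsense', or 'missense'.

    `position` is a 0-based index into `reference`; `frame_start` is the
    index of the first base of the coding frame (an assumed input).
    """
    offset = (position - frame_start) % 3
    codon_start = position - offset
    ref_codon = reference[codon_start:codon_start + 3].upper()
    if codon_start < 0 or len(ref_codon) < 3:
        raise ValueError("position does not fall inside a complete codon")
    mut_codon = ref_codon[:offset] + new_base.upper() + ref_codon[offset + 1:]
    ref_aa, mut_aa = CODON_TABLE[ref_codon], CODON_TABLE[mut_codon]
    if ref_aa == mut_aa:
        return "silent"    # same amino acid, no change to the protein
    if mut_aa == "*":
        return "nonsense"  # introduces a premature stop codon
    return "missense"      # a different amino acid


# ATG GCC TAT AAA translates to M A Y K
ref = "ATGGCCTATAAA"
print(classify_substitution(ref, 5, "T"))  # GCC -> GCT
print(classify_substitution(ref, 8, "A"))  # TAT -> TAA
print(classify_substitution(ref, 3, "A"))  # GCC -> ACC
```

Showing the translated residue next to each mismatch is what lets a scientist tell at a glance whether a change matters for the protein.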
I tucked the sequence quality information as a track above the bases, and showed confidence scores for each base call at each position. It's a lot of information that most scientists don't look for until digging deeper, so I chose to show it only on hover in the visualizer.
The focus on giving users just enough information to make decisions helped me refine the feature set and ensured we weren’t just rebuilding existing tools to be prettier, but giving our users an optimized results experience. The mini viewer was also built with expansion in mind. Plasmidsaurus's motto is to “Sequence Everything,” and base-level results aren't unique to the plasmid product. As part of my design, I made a bet that we would need to keep reusing this base viewer with product-specific nuances layered on top. I therefore made a deliberate decision to keep the visualizer and its interactions agnostic to sequence length, sequence type, and match condition.
Expanding the base viewer
The base viewer became a foundational part of the results experience. Users expected to see their sequence at the base level and PMs started speccing new products around that expectation. My bet that the base viewer would become a platform-wide component was validated, especially when our new Genotyping Analysis product was built with the base viewer as a central piece of product differentiation.

In our Genotyping Analysis product, we run sequencing results through an algorithm that assigns mutations to the right allele. 
Sanger (an older technology) isn't able to assign mutations to specific alleles. Not only can Plasmidsaurus's technology do that, but we would also visualize it for users!
Tools like SnapGene offer an MSA visualizer, but using it requires entering a separate dedicated flow.
Users wanted to see what distinct alleles made up each sample, and what mutations were seen on each allele. This meant we needed to show each of these alleles stacked against a reference sequence. Multiple sequence alignment (which can be used outside of an alleles-in-a-sample context) scales exponentially with the number of sequences being compared, so it gets computationally expensive very quickly. As a result, most other software with this feature requires it to be used in a dedicated view.

However, our users had the same end goal: they wanted to decide which samples to move forward with, and moving to a separate interface introduced unnecessary friction. I partnered with our engineers, who developed a smarter algorithm that handled the combinatorial alignment more efficiently. As a result, I could expand the base viewer to show the stacked alleles directly. I also showed each allele in the full-length preview (above the base viewer) as a track rather than an annotated plasmid map, to highlight where the mismatches were.

In this much more data-dense version, I introduced some new interactions that cut down on complexity.
Reflections
What I'm most proud of in this base viewer is the vision to build a highly extensible component that would be used across multiple products. As a sequencing company, there's clear utility for something that shows the base level sequences of any results, no matter what users are sequencing or what they're hoping to answer. Using the base viewer as the fundamental building block that product-specific visualizations sit on top of means it's very easy for each product to share a design language, standard interactions, and visual logic.

I'm confident that we'll continue to use this in the future. I'm already thinking about how to show areas of high variability (such as in CRISPR-Cas knockout regions) or how to let users pick their most effective guides.

If I could redo this project:
  • Instrumentation would come earlier in the process. Stakeholders prioritized shipping quickly to get real user feedback. I was able to get this into the hands of users very quickly, but as a result the analytics were retroactively added. Upfront alignment on what we were measuring would have sharpened our product story earlier and reduced some mid-iteration churn.
  • Define the decision-making filter with stakeholders upfront. Establishing the question "Does this help someone make a decision in their workflow?" as an explicit filter for what should be built was critical for managing numerous feature suggestions. If I had brought stakeholders into this framework earlier, it would have reduced some early scope creep and equipped them with a way to evaluate their own feature ideas.