Part 4: What's the solution? For color visualisations like heatmaps, contour maps, etc.
I think it is reasonable to say that a significant proportion of the plots that we generate, both at Ansys and using the Ansys software, are heatmaps, 3D heatmaps, or their close cousins, contour plots. Plots like this are extremely common in our field because engineering simulation often produces a high volume of data, and heatmaps are a fantastic way of condensing that into a single plot.
They also have their own set of rules regarding visualisation, separate from the rules of the previous sections, although all of those rules still apply. Heatmaps use color by definition. In contrast, the plots discussed in the previous part use color optionally, primarily to enhance the aesthetic. As such, there are special rules for color usage in heatmaps and in any plot that uses color to convey data on a continuous spectrum. These rules revolve around the three aspects of color: Hue, Saturation, and Lightness. Hue is the color choice: "red" and "green" are hues. Saturation dictates how strongly that Hue is present in a color, and Lightness is a measure of how "bright" the color is. However, I need to explain the concept of "Lightness" in a bit more depth (as well as what heatmaps are, in case anyone is lost) because understanding it is vital to 'get' the big issues plaguing color choice in DataViz.
Aside: What are heatmaps?
Heatmaps are plots that depict at least 3 dimensions of data, where at least one of those is represented with color. They are known as "heatmaps" possibly because they were originally used to show temperature. It is not clear where the name came from, so this reason will do as well as any! The CMBR plot from Part 1 is a good example of a heatmap. Below is a heatmap taken from Ansys Sherlock (this tutorial to be precise) which shows the temperature of a circuit board during use, in Celsius.
In visualisation, heatmaps are generally considered to be 2D, but the concept translates simply to additional dimensions. This is common in Engineering data, where we are often concerned with values such as temperature and stress on 3D surfaces. See the plot of the car from Part 2.
Aside: Lightness
You will likely have heard of "RGB" color values. They are a coordinate system for color along the "red", "green" and "blue" axes. However, there are other ways of defining a color too. One such way is "HSL" or "Hue, Saturation, and Lightness". HSL is a cylindrical coordinate system where Hue represents the precise color in use, Saturation represents how vivid that color is (how far it is from grey), and Lightness represents how "bright" the color is, or how close to white it is. Lightness is typically denoted with the letter "L".
In a heatmap, where a "change in color" means a "change in data", it is not immediately clear what a "change in color" would actually look like. Hue is probably the least clear aspect to use because how do you rank colors? Alphabetically? By their place in the rainbow? What if the color isn't in the rainbow at all, like brown? This is too confusing. So what about Saturation? Saturation could work, but it would restrict users to a single color per plot and would be totally useless if the plot was ever converted to black and white. Lightness, however, works quite nicely. There is a continuous spectrum from "light" to "dark" and multiple colors can be used, in principle. Reason dictates that as data increases, the lightness of the colormap should increase linearly. Similarly, there should be a steady decrease in lightness with decreasing data.
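To make the RGB versus HSL idea a bit more concrete, here is a minimal sketch using Python's standard `colorsys` module. The helper name `rgb255_to_hsl` and the three example colors are my own choices for illustration, not anything from a particular tool.

```python
import colorsys

def rgb255_to_hsl(r, g, b):
    """Convert 0-255 RGB to (hue in degrees, saturation %, lightness %)."""
    # colorsys works on 0-1 floats and returns the order H, L, S (not H, S, L).
    h, l, s = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
    return round(h * 360), round(s * 100), round(l * 100)

# Hypothetical example colors: same hue, different lightness, plus white.
pure_red = rgb255_to_hsl(255, 0, 0)
pale_red = rgb255_to_hsl(255, 128, 128)
white = rgb255_to_hsl(255, 255, 255)

print(pure_red, pale_red, white)
```

The two reds share a hue and differ only in lightness, which is the property the rest of this part leans on.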
This whole argument can be seen intuitively very easily. If you were given the two sets of color swatches below and told to "put them in order", what would you do? When you have multiple colors, it can be ambiguous what the "intended" association is, but brightness feels logical, especially when all the swatches can be laid out in a nicely ascending range of brightnesses, even if they have different hues and saturations.
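The "put them in order" exercise can also be sketched in code. The snippet below is only an illustration under my own assumptions: the swatch names and RGB values are made up, and perceived lightness is approximated with the standard sRGB to CIE L* conversion rather than the simpler HSL "L".

```python
def srgb_to_lightness(r, g, b):
    """Approximate CIE L* (0 = black, 100 = white) for a 0-255 sRGB color."""
    def linearise(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    # Relative luminance: weighted sum of the linearised channels.
    y = 0.2126 * linearise(r) + 0.7152 * linearise(g) + 0.0722 * linearise(b)
    # CIE lightness from luminance (cube-root law with a linear toe).
    return 116 * y ** (1 / 3) - 16 if y > 0.008856 else 903.3 * y

# Hypothetical swatches with different hues and saturations.
swatches = {
    "yellow": (255, 255, 0),
    "navy": (0, 0, 128),
    "red": (255, 0, 0),
    "teal": (0, 128, 128),
}

# Sort the swatch names from darkest to lightest.
ordered = sorted(swatches, key=lambda name: srgb_to_lightness(*swatches[name]))
print(ordered)
```

Hue gives no obvious ordering for these four colors, but lightness gives exactly one.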
So, hopefully you can see how the choice of color palette you use to represent your data is very important and can affect how the data is communicated. However, despite all this, one colormap in particular tends to get used more than any other. It is the hammer and when all you have is a hammer, everything starts to look like a nail. Except, it is also a bad hammer in this analogy, so I want to explain it first.
Jet - the jack of all trades that is bad at all of them
Jet has been the default colormap of countless tools and plotting libraries since color plots were a thing. You may not recognise the name, but you HAVE seen this colormap before, if only from reading Parts 1-3 in this series of articles. To be fair to it, "jet" is actually just one variant of the "rainbow" color palette, and "rainbow" tends to get used as a label for the whole category as well as for individual palettes, so even if you've never seen "jet" in use you will have seen a rainbow variant. Sadly, naming conventions in the field are not particularly consistent on this.
Unfortunately for everyone, "jet" and the rainbow family of colormaps are almost entirely bad (with a couple of notable exceptions). Lightness does not change linearly with data, and there are two significant (and sharp) peaks at cyan and yellow that can make it look like your data has significant contours where it does not. Plus, it doesn't work well for colorblind readers either. In fact the list of problems with rainbow and jet is extensive; bashing jet in the DataViz community has been said to be a favourite pastime alongside trashing pie charts. I won't go into the details here, but there is plenty of material out there that explains why it sucks better than I could.
- The rainbow colormap
- The rainbow is dead... long live the rainbow!
- Subtleties of color Part 1 of 6
- The end of the rainbow
- End rainbow
- Rainbow color map distorts and misleads research in hydrology – guidance for better visualizations and science communication
- mpl colormaps
- Why scientists need to be better at data visualization
All of these links are available, along with other links, at the bottom of this article.
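If you want to check the lightness complaint about jet yourself, here is a rough sketch. Big caveat: the control points in `JET_LIKE` are my own piecewise-linear approximation of a jet-style rainbow, written from memory for illustration, and the grey ramp is just a convenient well-behaved comparison. Treat it as a demonstration of the idea, not a measurement of any particular library's colormap.

```python
def interp(t, points):
    """Piecewise-linear interpolation through sorted (position, value) points."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= t <= x1:
            return y0 + (y1 - y0) * (t - x0) / (x1 - x0)
    raise ValueError("t must lie between 0 and 1")

# Assumed control points for a jet-style rainbow (illustrative approximation).
JET_LIKE = {
    "r": [(0.0, 0.0), (0.35, 0.0), (0.66, 1.0), (0.89, 1.0), (1.0, 0.5)],
    "g": [(0.0, 0.0), (0.125, 0.0), (0.375, 1.0), (0.64, 1.0), (0.91, 0.0), (1.0, 0.0)],
    "b": [(0.0, 0.5), (0.11, 1.0), (0.34, 1.0), (0.65, 0.0), (1.0, 0.0)],
}

def jet_like(t):
    """Return (r, g, b) as 0-1 floats for a position t in [0, 1]."""
    return tuple(interp(t, JET_LIKE[channel]) for channel in "rgb")

def grey_ramp(t):
    """A plain black-to-white ramp, used here as a comparison."""
    return (t, t, t)

def lightness(r, g, b):
    """Approximate CIE L* for sRGB channels given as 0-1 floats."""
    lin = [c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
           for c in (r, g, b)]
    y = 0.2126 * lin[0] + 0.7152 * lin[1] + 0.0722 * lin[2]
    return 116 * y ** (1 / 3) - 16 if y > 0.008856 else 903.3 * y

def lightness_profile(colormap, samples=11):
    """Sample the colormap evenly and return the lightness at each sample."""
    return [lightness(*colormap(i / (samples - 1))) for i in range(samples)]

def is_monotonic(values):
    """True if the values never decrease from one sample to the next."""
    return all(b >= a for a, b in zip(values, values[1:]))

jet_profile = lightness_profile(jet_like)
grey_profile = lightness_profile(grey_ramp)

print([round(v) for v in jet_profile])
print("jet-like monotonic:", is_monotonic(jet_profile))
print("grey ramp monotonic:", is_monotonic(grey_profile))
```

The jet-like profile starts dark, gets bright in the middle, and goes dark again at the top end, so two very different data values can end up with similar lightness.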
OK, so how can you do it right? Well, first it is important to understand that within heatmaps there are a further three distinct types of data plot, each requiring slightly different rules to make sure it is legible and presents data in the best way possible. Where "jet" may have been liberally applied to all of them, it is better to consider these three types separately, and for some cases the rainbow colormap isn't even a bad choice.
Sequential
Sequential data is simple. It is just increasing data. Percentages, mass, volume, these are all properties that start at some low value and ascend up to a high value, sometimes linearly, sometimes logarithmically. The optimal colormaps to use for data like this are those that only increase linearly in Lightness with data (for logarithmic data you can add a logarithmic scale to the color bar). These are the optimal colormaps for 2D plots.
- viridis
- cividis
- parula
The MATLAB default is Parula, but it's been copyrighted, which inspired the people at Matplotlib to make their own, and they did, resulting in viridis. There's even been a new development called cividis in recent years, which is essentially viridis as seen by those with color vision deficiency. There are other perceptually-linear color palettes out there, but these three are the flagship examples around.
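The logarithmic scaling mentioned for sequential data is easier to see with numbers. Here is a minimal sketch, with function names I have made up for illustration, of how a data value becomes a 0 to 1 position along a sequential colormap. In Matplotlib you would normally let `matplotlib.colors.Normalize` or `matplotlib.colors.LogNorm` do this for you.

```python
import math

def linear_norm(value, vmin, vmax):
    """Position of value along the colormap when the color bar is linear."""
    return (value - vmin) / (vmax - vmin)

def log_norm(value, vmin, vmax):
    """Position of value along the colormap when the color bar is logarithmic.

    Assumes value, vmin and vmax are all strictly positive.
    """
    low, high = math.log10(vmin), math.log10(vmax)
    return (math.log10(value) - low) / (high - low)

# Hypothetical data spanning two orders of magnitude, 1 to 100.
linear_position = linear_norm(10, 1, 100)
log_position = log_norm(10, 1, 100)

print(linear_position, log_position)
```

On the linear scale, 10 sits near the bottom of the colormap and most of the range is spent on the largest values. On the log scale it lands in the middle, so each decade gets an equal share of the lightness ramp.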
Divergent
Divergent data is slightly more complicated: it is data that has a unique or important central value. For example, angle, temperature (Celsius or Fahrenheit), or height above sea level. These are all scales that cross 0, so divergent colormaps peak in lightness at that central point instead of at one end, and use a different color on either side of it. The most prominent of these is coolwarm, which runs from blue to red and is becoming common in Ansys.
Divergent colormaps are actually very good choices for 3D heatmaps as well. This is because in 3D rendering, shadows often have to be applied to give the user a sense of depth, and this can obscure the often extremely dark ends of a palette. Divergent colormaps are essentially just two halves of a color palette bolted together and thus do not suffer from the same effect, because they tend to be the two lighter halves!
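Here is a small sketch of the "two halves bolted together" idea. Everything in it is illustrative: `centered_norm` and `blue_white_red` are hypothetical helpers, and the ramp is a plain blue-white-red, not the actual coolwarm palette. Matplotlib offers `matplotlib.colors.TwoSlopeNorm` for the same kind of centered scaling.

```python
def centered_norm(value, vmin, vcenter, vmax):
    """Map vmin -> 0, vcenter -> 0.5 and vmax -> 1, linearly on each side."""
    if value <= vcenter:
        return 0.5 * (value - vmin) / (vcenter - vmin)
    return 0.5 + 0.5 * (value - vcenter) / (vmax - vcenter)

def blue_white_red(position):
    """Two-hue ramp as 8-bit RGB: blue at 0, white at 0.5, red at 1."""
    if position <= 0.5:
        k = position / 0.5  # 0 at the blue end, 1 at the white center
        return (round(255 * k), round(255 * k), 255)
    k = (position - 0.5) / 0.5  # 0 at the white center, 1 at the red end
    return (255, round(255 * (1 - k)), round(255 * (1 - k)))

# Hypothetical temperature range in Celsius: -10 to 40, centered on 0.
freezing = blue_white_red(centered_norm(0, -10, 0, 40))
coldest = blue_white_red(centered_norm(-10, -10, 0, 40))
hottest = blue_white_red(centered_norm(40, -10, 0, 40))

print(coldest, freezing, hottest)
```

Note that the data range is lopsided (10 degrees below zero, 40 above) but the central value still lands on the lightest color, which is the whole point of a divergent palette.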
In recent years a direct replacement for jet has been developed and released. This is known as "turbo" and is sufficiently similar to jet that many people are swapping over to use it. Ansys included!
Turbo, An Improved Rainbow Colormap for Visualization
Qualitative
Finally we have qualitative data. Sometimes also known as "discrete" data, this is data that often cannot be interpolated and is typically not continuous. Election results and voting intention are among the most common examples that we see day-to-day. Jet actually works OK for this sort of data, but it is rarely used for it because the color choices are generally much freer, so you can either go for a very specific style or use colors with significance to the data itself. For example, in the UK, election maps always use red for Labour, blue for Conservative, and yellow for the LibDems.
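For qualitative data there is nothing to interpolate, so in code the "colormap" is really just a lookup table. A minimal sketch, where the hex values are my own illustrative picks rather than any official party branding:

```python
# Illustrative hex values only; these are not official party colors.
PARTY_COLORS = {
    "Labour": "#d62728",        # red
    "Conservative": "#1f77b4",  # blue
    "LibDems": "#ffbf00",       # yellow
}

FALLBACK_COLOR = "#7f7f7f"  # neutral grey for any category not listed above

def color_for(category):
    """Return the fixed color for a category, or grey if it is unknown."""
    return PARTY_COLORS.get(category, FALLBACK_COLOR)

print(color_for("Labour"), color_for("Other"))
```

Each category gets one fixed color and there is no notion of "halfway between" two of them, which is exactly why the lightness rules for continuous data matter much less here.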
Resources
- The Fundamentals of Color: Hue, Saturation, And Lightness
- Choosing colormaps (seaborn)
- Choosing colormaps (matplotlib)
- A beginner's guide to colormaps in matplotlib
- Rainbow colormaps are not all bad
- The rainbow colormap
- The rainbow is dead... long live the rainbow!
- Subtleties of color Part 1 of 6
- The end of the rainbow
- End rainbow
- Rainbow color map distorts and misleads research in hydrology – guidance for better visualizations and science communication
- mpl colormaps
- Why scientists need to be better at data visualization
- Turbo, An Improved Rainbow Colormap for Visualization
The Series
This is the final article in a series of four articles on the importance of quality data visualization. Find links to the rest of the series here (as they are published):