Advanced visualizations¶
Authors: Enze Chen and Mark Asta (University of California, Berkeley)
Note
This is an interactive exercise, so you will want to click the launch button and open the notebook in DataHub (or Colab for non-UCB students).
Learning objectives¶
This notebook contains a series of demos and suggestions for things to try next. Read another way, it’s very half-baked because Enze misjudged how much time he actually had. These exercises are optional and meant for those who have finished everything else, although everyone might find them helpful later for self-directed research. But no, seriously, this is optional, and you shouldn’t feel like you’re missing out on anything by not doing it.
Import Python modules¶
import os
import json
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from helper_funcs_viz import get_api_key
plt.rcParams.update({'figure.figsize':(8,6),       # Increase figure size
                     'font.size':22,               # Increase font size
                     'mathtext.fontset':'cm',      # Change math font to Computer Modern
                     'mathtext.rm':'serif',        # Documentation recommended follow-up
                     'lines.linewidth':4,          # Thicker plot lines
                     'lines.markersize':12,        # Larger plot points
                     'axes.linewidth':2,           # Thicker axes lines (but not too thick)
                     'xtick.direction':'in',
                     'ytick.direction':'in',
                     'xtick.major.size':8,         # Make the x-ticks longer (our plot is larger!)
                     'xtick.major.width':2,        # Make the x-ticks wider
                     'ytick.major.size':8,         # Ditto for y-ticks
                     'ytick.major.width':2})       # Ditto for y-ticks
Interactive visualizations¶
So far we’ve talked about some of the basic visualizations, which were already pretty impressive. With some custom styling tailored to our data and use case, those plots really shone.
But, maybe those weren’t exciting enough because they were static plots that you couldn’t interact with.
There are many Python packages out there for creating interactive content, and one of the most popular ones is Plotly.
Again, rather than describe what Plotly allows you to do, we’ll just show you, by plotting a PhaseDiagram object from Pymatgen.
Try hovering over the points and dragging the figure around. 🙂
from pymatgen.ext.matproj import MPRester
from pymatgen.analysis.phase_diagram import PhaseDiagram, PDPlotter

api_key = get_api_key()
# Query the Materials Project for all entries in the Ca-C-O chemical system
with MPRester(api_key) as m:
    entries = m.get_entries_in_chemsys(['Ca', 'C', 'O'])

# Build the phase diagram from the entries and display it
phase_diagram = PhaseDiagram(entries)
pltr = PDPlotter(phase_diagram)
pltr.show()
Dielectric constants dataset¶
We’ve discussed how tabular data formats (CSV) can have an advantage over hierarchical data formats (JSON) because summary statistics are easier to compute.
Another advantage is in data visualization.
In this series of exercises, we’ll work with the tabular form of the dielectric constants dataset, which hopefully you were able to finish as part of the Tabular data notebook.
If not, no worries, there should be a completed dataset lying somewhere in the assets folder. 😉
Here are ideas for a few things you could plot with that data as-is:
Exercise: Create a scatter plot of refractive index vs. band gap, just like Figure 5 in the Petousis paper.¶
Pay attention to the axes scales!
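If you want a starting point, here is one possible sketch rather than the intended solution. It assumes your DataFrame has columns named `band_gap` and `refractive_index` (yours may be named differently), and it uses a few made-up placeholder rows instead of the real dataset. It also assumes the hint about scales refers to logarithmic axes, which you should check against the figure in the paper.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up placeholder rows, NOT values from the real dataset.
# Replace this with your own DataFrame loaded from the CSV file.
demo_df = pd.DataFrame({
    'band_gap': [0.5, 1.2, 3.4, 6.0],          # eV (placeholder numbers)
    'refractive_index': [3.9, 2.8, 2.0, 1.5],  # unitless (placeholder numbers)
})

fig, ax = plt.subplots()
ax.scatter(demo_df['band_gap'], demo_df['refractive_index'])
ax.set_xscale('log')  # assumption: use 'linear' if the paper's axis is linear
ax.set_yscale('log')  # same caveat as above
ax.set_xlabel('Band gap (eV)')
ax.set_ylabel('Refractive index')
```

Switching a single axis between `'linear'` and `'log'` is a quick way to see how much the choice of scale changes the apparent trend.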
# ------------- WRITE YOUR CODE IN THE SPACE BELOW ---------- #
Exercise: Create a scatter plot of the other properties, maybe electronic vs. total dielectric constant?¶
# ------------- WRITE YOUR CODE IN THE SPACE BELOW ---------- #
Exercise: Create a histogram of any of the properties, maybe total dielectric constants?¶
Similar to Figure 3 in the Petousis paper, but we’ll go with histograms instead of violin plots.
# ------------- WRITE YOUR CODE IN THE SPACE BELOW ---------- #
Extension¶
You’ll notice that a differentiator between the plots you created above and the ones in the paper is the separation based on crystal structure. One way to “easily” figure out the crystal structure of a material is with the space group number. But… we don’t have the space group number… yet.
Step 1: Expand your dataset by getting the space group number of each material. You can do this in many ways, and we suggest two possible paths:
You can look in the original dataset (JSON file) to try and extract the space group number, which is buried in the file. This is a good exercise.
OR you can use Pymatgen to construct a query for the space group of all of your materials (possible with a criterion on the material ID). This is also a good exercise.
You can add the space group number to your DataFrame / tabular dataset of the dielectric constants. Then save this dataset into a file since you’ve improved it.
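As a rough sketch of the first path: if the space group number sits under some key deep inside each JSON record, a small recursive search can dig it out without knowing the exact nesting. Everything specific here is an assumption for illustration: the key name `space_group_number`, the record layout, the `material_id` column, and the output file name are all hypothetical, so adapt them to what you actually find in the file.

```python
import pandas as pd

def find_key(obj, key):
    """Return the first value stored under `key` anywhere in nested dicts/lists, else None."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = obj.values()
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None

# Hypothetical records standing in for entries parsed with json.load();
# the real file's keys and nesting will differ.
records = [
    {'material_id': 'id-A', 'meta': {'symmetry': {'space_group_number': 225}}},
    {'material_id': 'id-B', 'meta': {'symmetry': {'space_group_number': 62}}},
]

# Build a lookup table: material ID -> space group number
sg_lookup = {rec['material_id']: find_key(rec, 'space_group_number') for rec in records}

# Hypothetical tabular dataset with one row per material
df = pd.DataFrame({'material_id': ['id-A', 'id-B']})
df['space_group'] = df['material_id'].map(sg_lookup)

# Save the improved dataset (the file name is a placeholder)
# df.to_csv('dielectric_with_space_group.csv', index=False)
```

Building the lookup dictionary first and then calling `.map()` keeps the JSON digging separate from the DataFrame update, so either half can be swapped out (for example, with a Pymatgen query) without touching the other.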
Step 2: After this, what you then need is a function that can convert the space group number into the crystal system. This should be very similar to a function that you’ve already written on the first day, so no more hints on this part. 😉
Step 3: Finally, to plot each set of crystal structures separately (say, a different color for each one on a scatter plot), we can group the data in the DataFrame by crystal structure using the df.groupby() method.
You might want to use it in this way:
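fig, ax = plt.subplots()
groups = df.groupby('crystal_structure')
for name, group in groups:
    # One scatter call per crystal structure, so each gets its own color and legend entry
    ax.scatter(group['band_gap'], group['refractive_index'], label=name)
ax.legend()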
Good luck!
It must be said that this particular extension has a lot of new ideas thrown at you all at once. What we recommend you pay attention to is not finishing all the steps, but rather why we chose this sequence of steps, and how we broke down a complex problem (here’s a plot, go make it!) into a series of more manageable steps. Note that we started with something that we already had access to, and then brought in more data as we needed. You might find yourself doing something similar, but not exactly the same in your own work, so it’s much more important that you mirror the thought process, rather than the concrete code.
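If you get stuck on Step 2 after giving it a try, here is one possible sketch to compare against. It is not necessarily how your earlier function looked; the function name `crystal_system` is our own choice, and it relies on the standard ranges of international space group numbers for the seven crystal systems.

```python
def crystal_system(space_group_number):
    """Map an international space group number (1-230) to its crystal system."""
    # Inclusive upper bound of each crystal system's range of space group numbers
    upper_bounds = [
        (2, 'triclinic'),
        (15, 'monoclinic'),
        (74, 'orthorhombic'),
        (142, 'tetragonal'),
        (167, 'trigonal'),
        (194, 'hexagonal'),
        (230, 'cubic'),
    ]
    if not 1 <= space_group_number <= 230:
        raise ValueError(f'Space group number must be in 1-230, got {space_group_number}')
    # Return the first system whose upper bound is not exceeded
    for upper, name in upper_bounds:
        if space_group_number <= upper:
            return name
```

Assuming your DataFrame has a column of space group numbers (called `space_group` here purely for illustration), something like `df['crystal_structure'] = df['space_group'].apply(crystal_system)` would add the column that Step 3 groups by.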
# ------------- WRITE YOUR CODE IN THE SPACE BELOW ---------- #
Conclusion¶
Please don’t hesitate to reach out on Slack if you have questions or concerns about this or any other content.