(updated 26. May)
I was officially selected for the Google summer of code 2017 under the project Develop XDMF format for visualisation and checkpointing. I am very happy to begin work on this project and to help FEniCS community solve the issues related to the IO.
We have had the very first Google hangout with the mentors on Friday 12th. We've spoken about the general picture of this project, i.e. how current state of XDMF IO is not able to both save for visualization and save for checkpointing computed results. Current checkpointing format is binary HDF5 heavy data. FEniCS has methods that could read and write the (HDF5) data, but a priori information about the function space (how to interpret the data) must be provided. On the other hand, Paraview itself has no idea how to interpret raw FEniCS data - so XDMF XML structure must be provided in order to visualize.
My OS is the LTS Xubuntu (Ubuntu + xfce) 16.04.2. I am using
CLion IDE from JetBrains. HW: Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz, 12 GB RAM.
FEniCS is composed of several parts maintained by developers, namely
Python parts are installed as python modules - no need for C/C++ compilation. The only compiled part is the DOLFIN, umbrella which connects all the other parts and contains some important submodules (IO, visualisation, solvers handling, mesh handling, etc.).
My project is about IO (a dolfin part) so I won't edit any of python parts (hopefully :)).
There are some higher level user interface methods to write/read dolfin functions, meshes, mesh-value collections etc. Let me consider the basic example of writing a dolfin mesh to a XDMF file from python interface.
import dolfin as d mesh = d.UnitSquareMesh(16, 16) file = d.XDMFFile("mesh.xdmf") file.write(mesh, d.XDMFFile.Encoding_ASCII)
std::string filename and optionally MPI communicator. If no communicator is provided then
MPI_COMM_WORLD is used. XML handling in dolfin is via
pugi library, so it instantiates new empty pugi xml document. XDMFFile also registers few parameters into global dolfin parameter variable.
rewrite_function_mesh takes boolean value (true by default) and applies when time series is written. If true then mesh is stored for each time step, if false then only intial mesh is stored.
functions_share_mesh is by default false. If more functions are saved into the XDMFFile then they all share the same mesh for the time step and this should optimize file size.
The last parameters is (bool, default false)
flush_output, which allows for datasets to be saved after each time step. Could be used to inspect the result of computation while running. Performace cost is important here, turn to true only in debug cases.
mesh = d.UnitSquareMesh ... constructs dolfin Mesh of triangles on \([0,1] \times [0,1]\). Eventually,
void XDMFFile::write(const Mesh &mesh, Encoding encoding) is called.
(updated 22nd June)
For the first coding period blog post, let me review some of XDMF current functionality in DOLFIN and relate it to what needs to be done during my GSoC project.
I have discussed in the previous post how, e.g., the XDMF interface to store a
Mesh is used. The Mesh IO currently works fine in serial and parallel.
If you run the following code in fenics
import dolfin as d mesh = d.UnitSquareMesh(5, 5) file_ascii = d.XDMFFile("xdmf_ascii.xdmf") file_ascii.write(mesh, d.XDMFFile.Encoding_ASCII)
it produces the following single
<?xml version="2.0"?> <!DOCTYPE Xdmf SYSTEM "Xdmf.dtd" > <Xdmf Version="3.0" xmlns:xi="http://www.w3.org/2001/XInclude"> <Domain> <Grid Name="mesh" GridType="Uniform"> <Topology NumberOfElements="8" TopologyType="Triangle" NodesPerElement="3"> <DataItem Dimensions="8 3" NumberType="UInt" Format="XML"> 0 1 4 0 3 4 1 2 5 1 4 5 3 4 7 3 6 7 4 5 8 4 7 8 </DataItem> </Topology> <Geometry GeometryType="XY"> <DataItem Dimensions="9 2" Format="XML"> 0 0 0.5 0 1 0 0 0.5 0.5 0.5 1 0.5 0 1 0.5 1 1 1 </DataItem> </Geometry> </Grid> </Domain> </Xdmf>
XDMF starts with the header definition and then encapsulates
<Topology> describes connectivity of each cell (triangles in the case of
UnitSquareMesh). Basically, the topology node contains n-tuple for each cell, which number the vertices of that cell. This numbering then applies to the
<Geometry> tag, where coordinates of points are present. For instance, the first cell has vertices numbered 0,1,4 and these vertices have coordinates (0, 0), (0.5, 0), (0.5, 0.5).
If we read this XDMF3 file into ParaView, we arrive at
This was a simple idea, how the Mesh IO works.
My project is mostly about improving the IO for Functions. Current XDMF could save the Function of any element type - but for higher order functions and discontinuous functions it makes some crude local interpolation which decimates the data - so it cannot be fully read back again to fenics. The reason for this is, that XDMF in fenics was meant to be used for visualization mainly and one can truly visualize only "linear" elements with given values at cell vertices.
Let me save the function
on a unit square triangular mesh to a current XDMF format. The code looks like
import dolfin as d mesh = d.UnitSquareMesh(2, 2) file_ascii = d.XDMFFile("xdmf_ascii.xdmf") V = d.FunctionSpace(mesh, "CG", 1) f = d.Function(V) f.assign(d.project(d.Expression("x * x", degree=2), V)) file_ascii.write(f, d.XDMFFile.Encoding_ASCII)
and resulting file contains apart from the geometry and topology data the following tag
<Attribute Name="f_4" AttributeType="Scalar" Center="Node"> <DataItem Dimensions="9 1" Format="XML"> 1.238011374155259e-15 2.053478914687549e-15 1.977584762613556e-15 1.986258379993447e-15 0.2500000000000034 0.5000000000000033 1.951563910473904e-15 0.5000000000000032 1.000000000000005 </DataItem> </Attribute>
These are values of function
at vertices and are easy to visualize. Doing so in ParaView gives the following picture
Although this looks nice, the higher order functions (piecewise quadratic) in current XDMF are stored only as piecewise linear functions (vertex values).
The main objective of the first coding period is to develop parallel write and read functions for XDMF, which store full function-related information = degrees of freedom mapping, values of degrees of freedom and some representation of FiniteElement used.
With the partial support of NumFOCUS I've attended FEniCS 2017 conference hosted in the capital of Luxembourg, 12th-15th June. I've had the pleasure to meet my mentors(C. Richardson and J. Hale) in person and discuss with them my GSoC project.
(left) Ivan Yaschuk, Chris Richardson and (right) me.
Back to work now. I've added public methods into
XDMFFile class, namely
read_checkpoint. They are used to save and read
Function. Underlying data structure is the same as in existing (and by user tested)
HDF5File. Instead of storing vertex values (as described above) I store full information, i.e. cell ordering, number of degrees of freedom per cell, degrees of freedom mapping and actual values of degrees of freedom.
When user wants to read back (checkpoint) a Function, he must specify an empty
Function object where new values are be loaded.
I've also tried my mentor's ParaView filters - have a look here. Since they are designed to a bit different format of XDMF output as I am preparing, I've made slight change to it.
Afterall, functionality, which was not possible in FEniCS so far is e.g. to write in parallel (I did on 4 processes) DG2 function (piecewise polynomial of degree 2 but discontinuous through elements), read it back, change it and visualize in ParaView.
A nice plot of a time-series of DG2 function due to new functionality:
(updated 25th July)
The first coding period is over and I have my code merged to the master FEniCS branch! See here.
The current state of XDMF FEniCS side functionality is more-or-less fine. We can save functions in parallel and read them back.
There is an ongoing discussion about adding new attributes to XDMF format that would fit our current FEniCS output model. You can track the discussion here. Our ultimate goal is to append the XDMF3 standard and change the
VTK's understanding of XDMF.
VTK (Visualization Toolkit) is a toolkit maintained by Kitware (www.kitware.com) to read and write popular data file formats. It also provides some own file formats. Nice description of VTK's file formats is here http://www.vtk.org/wp-content/uploads/2015/04/file-formats.pdf.
They have a gitlab repository at https://gitlab.kitware.com/vtk/vtk/ and ParaView, Kitware's visualization software is based on VTK data model. In addition, whatever change we make to the XDMF standard it must map to the current VTK model.
We have agreed on a format for (FEniCS's) finite element function written to XDMF. Preliminarily the format is
<Grid Name="A grid"> <Topology></Topology> <Geometry></Geometry> <Attribute Name="Function_0" Center="Other" possibly_other_identifiers="" ItemType="FiniteElementFunction" ElementFamily="DG" ElementDegree="2"> <DataItem Name="Index"><!-- first dataitem used for indices (dofmap)--></DataItem> <DataItem Name="Value"><!-- second dataitem used for values--></DataItem> </Attribute> <Attribute Name="Function_1" Center="Other" possible_other_identifiers="" ItemType="FiniteElementFunction" ElementFamily="DG" ElementDegree="2"> <DataItem Name="Index"><!-- first dataitem used for indices (dofmap)--></DataItem> <DataItem Name="Value"><!-- second dataitem used for values--></DataItem> </Attribute> </Grid>
<Geometry> tags the mesh connectivity and mesh point coordinates are saved. Novelty cames in
ItemType attribute of
Attribute, where you can specify
FiniteElementFunction and consequently it's family and degree in
ElementFamily, ElementDegree. Inside the attribute there are at least two
DataItem. First - by the position - there is dataitem containing indices for values of freedom. These are just integer coordinates pointing to the second dataitem. The second dataitem contains values of degrees of freedom (float numbers). I will call them dofmap and dofvalues dataitems.
Dofmap is partitioned into tuples for each cell, i.e. dofmap dataitem containing
0 1 3 0 3 2
means there are two cells - each containing three degrees of freedom and together with dofvalues
0 1 0 1
we can say, that 0-th cell has three dofs with values 0, 1, 0 and 1-st cell has three dofs with values 0, 1, 0.
My latest achievement is related to
VTK libraries. In order to make ParaView read and visualize this new FiniteElementFunction format I had to make XDMF cpp library understand it. This work is in branch https://gitlab.kitware.com/michalhabera/xdmf/commits/michal/add-finite-element-function. Main difficulty comes in reading
Attribute with more than one
DataItems. The problem is, that
Attribute is in cpp library a child of
DataItem - so it can naturally "contain" only one dataitem. My workaround to this applies when the attribute is being populated with children. Instead of populating two dataitems (which is impossible), I parse dofmap and dofvalues dataitem into one. This is done simply constructing new dataitem with values symbolically written in array notation
dofvalues[dofmap]. From the point of view of VTK, which sees only the resulting populated structure, there is only one dataitem under finite element function attribute. This dataitem contains correctly arranged values of degrees of freedom.
In VTK cpp library there are at least two ways how to build unstructured grid. First way is to prepare connectivity and geometry as a whole and aside and then set is to
vtkUnstructuredGrid. Second way is to append
vtkUnstructuredGrid cell by cell using
InsertNextCell and point by point using
InsertNextPoint. These two approaches differ. The first is more optimized and stores only one value per node, but the second is capable of storing discontinuous functions (because they have more values for the same node).
In summary, I have made XDMF to parse multiple dataitems under one attribute and I have also changed the way VTK builds unstructured grid from XDMF grid according to python filters prepared by my mentor. Paraview can with my XDMF and VTK branches natively (no python filters) read our new XDMF checkpointing format for the simplest functions (CG1, DG1 on triangles).
(updated 18th August)
The last coding period is in its peak and this is my first blog post in August.
I have finally (with the help of my mentor David DeMarle) addressed the "multiple dataitems" issue mentioned in previous post. In other words, the new logic is no more spread between both libraries - XDMF and VTK, but is lifted only to the VTK level.
I have also implemented special type of finite element function - belonging to so called Raviart-Thomas space. This is, one can say, the first nontrivial function space visualized. Degrees of freedom for RT1 (of degree=1) functions on triangles are situated in midpoints of triangle edges and they specify normal component of the vector field. The vector field is continuous inside each triangle, normal components are constant on each edge and are continuous through edges. This defines uniquely a vector function and it is possible to compute its values at nodes. These values could be used to visualize a piecewise discontinuous vector function in VTK/ParaView. My results are shown the following picture.
The y-component of vector field (x, 0) interpolated to RT1 space is shown. New plot on right is compared to the old way of plotting (the old way consisted of interpolation into continuous function and its visualization).
Several other function spaces were added (continuous and discontinuous functions on quadrilaterals - this is possible with the work of another GSoC FEniCS's student Ivan Yashchuk).
Now I am testing the work and preparing final merge/pull requests to the XDMF and VTK libraries.