Skip to content

Writing mdoc files

This page will explain how to write valid SerialEM mdoc files in Python using mdocfile.

The problem with dataframes

mdocfile returns a single pandas dataframe when reading files. This tabular representation is convenient for data exploration and analysis. Some global data is replicated across all sections to enable returning this simple dataframe but this makes the dataframe a bad model for the contents of a file.

Introduction to data models

The contents of a file are represented as a small set of pydantic models internally. These are simple classes containing data that provide guarantees about the types of those data based on type hints.

These models can be explicitly constructed and used to write an mdoc file.

Writing an mdoc file

In this section, we will write a simple mdoc file with data for two sections.

The attribute names for each model reflect those found in the SerialEM documentation. The expected types can be seen by inspecting the model definitions in data_models.py.

from pathlib import Path
from mdocfile.data_models import Mdoc, MdocGlobalData, MdocSectionData

# construct global data model
global_data = MdocGlobalData(
    DataMode=1,
    ImageSize=(3838, 3710),
    PixelSpacing=1.35,
    Voltage=300
)

# construct section data models
first_section = MdocSectionData(
    ZValue=0,
    TiltAngle=0,
    StagePosition=(0.25, -0.25),
    PriorRecordDose=0,
    ExposureDose=0.3,
    SubFramePath=Path('/images/first_image.tif'),
    DateTime='05-Nov-15  15:21:38',
    NumSubFrames=8,
)

second_section = MdocSectionData(
    ZValue=1,
    TiltAngle=3,
    StagePosition=(0.25, -0.25),
    PriorRecordDose=0.3,
    ExposureDose=0.3,
    SubFramePath=Path('/images/second_image.tif'),
    DateTime='05-Nov-15  15:22:38',
    NumSubFrames=8,
)

# construct mdoc model
mdoc = Mdoc(
    titles=[
        '[T = SerialEM: Digitized on EMBL Krios                       30-Nov-15  15:14:20    ]',
        '[T =     Tilt axis angle = 85.3, binning = 4  spot = 8  camera = 2]'
    ],
    global_data=global_data,
    section_data=[first_section, second_section]
)

# write out the file
with open('my_new_mdoc.mdoc', mode='w+') as file:
    file.write(mdoc.to_string())

The code above produces the following file:

DataMode = 1
ImageSize = 3838 3710
PixelSpacing = 1.35
Voltage = 300.0

[T = SerialEM: Digitized on EMBL Krios                       30-Nov-15  15:14:20    ]

[T =     Tilt axis angle = 85.3, binning = 4  spot = 8  camera = 2]

[ZValue = 0]
TiltAngle = 0.0
StagePosition = 0.25 -0.25
ExposureDose = 0.3
PriorRecordDose = 0.0
SubFramePath = /images/first_image.tif
NumSubFrames = 8
DateTime = 05-Nov-15  15:21:38

[ZValue = 1]
TiltAngle = 3.0
StagePosition = 0.25 -0.25
ExposureDose = 0.3
PriorRecordDose = 0.3
SubFramePath = /images/second_image.tif
NumSubFrames = 8
DateTime = 05-Nov-15  15:22:38