Overview

starfile is a package for reading and writing STAR files in Python.

Figure: a very simple example of reading and writing a STAR file with a single data block.

starfile can be used interactively to inspect and explore files, or in scripts and larger software packages to provide basic STAR file I/O. Data is exposed as plain Python dictionaries or pandas dataframes.

This package was designed principally for compatibility with files generated by RELION. For more information on working with pandas, please see the pandas docs.


Quickstart

Consider the following file, particles.star, which contains a single data block:

data_particles

loop_
_rlnCoordinateX #1
_rlnCoordinateY #2
_rlnCoordinateZ #3
_rlnAngleRot #4
_rlnAngleTilt #5
_rlnAnglePsi #6
_rlnMicrographName #7
91.798700   83.622600   203.341030  -51.740000  173.930000  32.971000   01_10.00Apx.mrc
97.635800   80.437000   203.136160  141.500000  171.760000  -134.680000 01_10.00Apx.mrc
92.415200   88.842700   210.663900  -78.750000  173.930000  87.263200   01_10.00Apx.mrc
94.607830   93.135410   205.425960  -85.215000  167.170000  85.632200   01_10.00Apx.mrc
86.187800   80.125400   204.558750  14.910000   163.260000  -16.030000  01_10.00Apx.mrc
91.824240   76.738300   203.794280  39.740000   168.410000  -57.250000  01_10.00Apx.mrc
98.253300   73.530100   203.856030  73.950000   166.380000  -84.640000  01_10.00Apx.mrc
101.303500  80.290800   194.790400  -178.878000 166.090000  73.181000   01_10.00Apx.mrc

Read the file

import starfile

df = starfile.read('particles.star')

Interact with the data

df['rlnCoordinateX'] += 10
df.head()
   rlnCoordinateX  rlnCoordinateY  rlnCoordinateZ  rlnAngleRot  rlnAngleTilt  rlnAnglePsi rlnMicrographName
0       101.79870        83.62260       203.34103      -51.740        173.93      32.9710   01_10.00Apx.mrc
1       107.63580        80.43700       203.13616      141.500        171.76    -134.6800   01_10.00Apx.mrc
2       102.41520        88.84270       210.66390      -78.750        173.93      87.2632   01_10.00Apx.mrc
3       104.60783        93.13541       205.42596      -85.215        167.17      85.6322   01_10.00Apx.mrc
4        96.18780        80.12540       204.55875       14.910        163.26     -16.0300   01_10.00Apx.mrc

Save the (modified) data to file

starfile.write(df, 'modified_particles.star')

For more advanced usage please check out the examples.
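
To make the layout of particles.star concrete, here is a minimal standard-library sketch of how a single loop block maps onto column names and rows. It is an illustration only, not how starfile parses files: `parse_loop_block` and `to_number` are hypothetical helpers, and the sample is a two-row excerpt of the file shown in the Quickstart.

```python
def to_number(token):
    """Return the token as a float when possible, otherwise as the original string."""
    try:
        return float(token)
    except ValueError:
        return token


def parse_loop_block(text):
    """Parse one data block that contains a single loop_ into (block_name, rows)."""
    block_name = None
    columns = []
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "loop_":
            continue
        if line.startswith("data_"):
            # "data_particles" names the block "particles".
            block_name = line[len("data_"):]
        elif line.startswith("_"):
            # Label lines look like "_rlnCoordinateX #1": drop the "_" and the index.
            columns.append(line.split()[0][1:])
        else:
            # Every other non-empty line is a whitespace-separated data row.
            values = [to_number(token) for token in line.split()]
            rows.append(dict(zip(columns, values)))
    return block_name, rows


sample = """\
data_particles

loop_
_rlnCoordinateX #1
_rlnCoordinateY #2
_rlnCoordinateZ #3
_rlnAngleRot #4
_rlnAngleTilt #5
_rlnAnglePsi #6
_rlnMicrographName #7
91.798700   83.622600   203.341030  -51.740000  173.930000  32.971000   01_10.00Apx.mrc
97.635800   80.437000   203.136160  141.500000  171.760000  -134.680000 01_10.00Apx.mrc
"""

name, rows = parse_loop_block(sample)
print(name, len(rows), rows[0]["rlnCoordinateX"])
```

Each row becomes a mapping from column name to value, which is the same shape a dataframe row has after `starfile.read()`.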


Installation

pip install starfile

API

starfile.read()

Read data from a STAR file.

Basic data blocks are read as dictionaries. Loop blocks are read as pandas dataframes. When multiple data blocks are present, a dictionary of data blocks is returned. When a single data block is present, only that block is returned by default. To force returning a dictionary even when only one data block is present, set always_dict=True.
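
The return-shape rule described above can be sketched without the library. This is a stand-in that follows the documented behaviour: `select_result` is a hypothetical name, and the block contents are made-up placeholder values (a plain list stands in for a dataframe).

```python
def select_result(data_blocks, always_dict=False):
    """Return the lone block itself, or the whole mapping of block name to block."""
    if len(data_blocks) == 1 and not always_dict:
        # A single block is unwrapped unless the caller asks for a dictionary.
        return next(iter(data_blocks.values()))
    return data_blocks


# Placeholder data: one simple block, and a file with two blocks.
single = {"optics": {"rlnVoltage": 300.0}}
multi = {"optics": {"rlnVoltage": 300.0}, "particles": [{"rlnCoordinateX": 91.7987}]}

print(select_result(single))                    # the block itself
print(select_result(single, always_dict=True))  # {block name: block}
print(select_result(multi))                     # always a dictionary of blocks
```

Code that must handle files with an unknown number of blocks is usually simpler with always_dict=True, because the return type is then always a dictionary.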

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| filename | PathLike | File from which to read data. | required |
| read_n_blocks | Optional[int] | Limit reading the file to the first n data blocks. | None |
| always_dict | bool | Always return a dictionary, even when only a single data block is present. | False |
| parse_as_string | List[str] | A list of keys or column names which will not be coerced to numeric values. | [] |

Source code in src/starfile/functions.py
def read(
    filename: PathLike,
    read_n_blocks: Optional[int] = None,
    always_dict: bool = False,
    parse_as_string: List[str] = []
) -> Union[DataBlock, Dict[str, DataBlock]]:
    """Read data from a STAR file.

    Basic data blocks are read as dictionaries. Loop blocks are read as pandas
    dataframes. When multiple data blocks are present a dictionary of datablocks is
    returned. When a single datablock is present only the block is returned by default.
    To force returning a dictionary even when only one datablock is present set
    `always_dict=True`.

    Parameters
    ----------
    filename: PathLike
        File from which to read data.
    read_n_blocks: int | None
        Limit reading the file to the first n data blocks.
    always_dict: bool
        Always return a dictionary, even when only a single data block is present.
    parse_as_string: list[str]
        A list of keys or column names which will not be coerced to numeric values.
    """
    parser = StarParser(filename, n_blocks_to_read=read_n_blocks, parse_as_string=parse_as_string)
    if len(parser.data_blocks) == 1 and always_dict is False:
        return list(parser.data_blocks.values())[0]
    else:
        return parser.data_blocks
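
As an illustration of why parse_as_string exists, the sketch below shows numeric coercion turning an identifier with leading zeros into a number unless its column is excluded. This is not the StarParser implementation: `coerce_row` is a hypothetical helper and the value "0042" is invented for the example.

```python
def coerce_row(row, parse_as_string=()):
    """Convert string values to floats, except for keys listed in parse_as_string."""
    result = {}
    for key, value in row.items():
        if key in parse_as_string:
            # Excluded keys keep their original text.
            result[key] = value
            continue
        try:
            result[key] = float(value)
        except ValueError:
            # Non-numeric text is left unchanged.
            result[key] = value
    return result


raw = {"rlnCoordinateX": "91.798700", "rlnMicrographName": "0042"}

print(coerce_row(raw))
print(coerce_row(raw, parse_as_string=["rlnMicrographName"]))
```

Without the exclusion the name "0042" is read as a number and its leading zeros are lost, which is the situation parse_as_string is meant to avoid.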

starfile.write()

Write data to disk in the STAR format.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | Union[DataBlock, Dict[str, DataBlock], List[DataBlock]] | Data to be saved to file. DataBlocks are dictionaries or dataframes. If a dictionary of data blocks is passed, the keys will be the data block names. | required |
| filename | PathLike | Path where the file will be saved. | required |
| float_format | str | Float format string which will be passed to pandas. | '%.6f' |
| sep | str | Separator between values, will be passed to pandas. | '\t' |
| na_rep | str | Representation of null values, will be passed to pandas. | '<NA>' |

Source code in src/starfile/functions.py
def write(
    data: Union[DataBlock, Dict[str, DataBlock], List[DataBlock]],
    filename: PathLike,
    float_format: str = '%.6f',
    sep: str = '\t',
    na_rep: str = '<NA>',
    quote_character: str = '"',
    quote_all_strings: bool = False,
    **kwargs
):
    """Write data to disk in the STAR format.

    Parameters
    ----------
    data: DataBlock | Dict[str, DataBlock] | List[DataBlock]
        Data to be saved to file. DataBlocks are dictionaries or dataframes.
        If a dictionary of datablocks is passed the keys will be the data block names.
    filename: PathLike
        Path where the file will be saved.
    float_format: str
        Float format string which will be passed to pandas.
    sep: str
        Separator between values, will be passed to pandas.
    na_rep: str
        Representation of null values, will be passed to pandas.
    """
    StarWriter(
        data,
        filename=filename,
        float_format=float_format,
        na_rep=na_rep,
        separator=sep,
        quote_character=quote_character,
        quote_all_strings=quote_all_strings,
    ).write()
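
To show what the float_format, sep and na_rep defaults mean for a single row of output, here is a small standalone sketch. It makes assumptions about how the options are applied: the internals of StarWriter are not shown in this page, `format_row` is a hypothetical helper, and `None` stands in for a null value.

```python
def format_row(values, float_format="%.6f", sep="\t", na_rep="<NA>"):
    """Format one row of values as a single line of delimited text."""
    cells = []
    for value in values:
        if value is None:
            # Null values are written using the na_rep placeholder.
            cells.append(na_rep)
        elif isinstance(value, float):
            # Floats use printf-style formatting, e.g. '%.6f' gives six decimals.
            cells.append(float_format % value)
        else:
            cells.append(str(value))
    return sep.join(cells)


print(format_row([91.7987, None, "01_10.00Apx.mrc"]))
print(format_row([1.5, 2], float_format="%.2f", sep=" "))
```

With the defaults, floats are padded to six decimal places, which matches the style of the values in the particles.star example.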

starfile.to_string()

Represent data in the STAR format.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | Union[DataBlock, Dict[str, DataBlock], List[DataBlock]] | Data to represent. DataBlocks are dictionaries or dataframes. If a dictionary of data blocks is passed, the keys will be the data block names. | required |
| float_format | str | Float format string which will be passed to pandas. | '%.6f' |
| sep | str | Separator between values, will be passed to pandas. | '\t' |
| na_rep | str | Representation of null values, will be passed to pandas. | '<NA>' |

Source code in src/starfile/functions.py
def to_string(
    data: Union[DataBlock, Dict[str, DataBlock], List[DataBlock]],
    float_format: str = '%.6f',
    sep: str = '\t',
    na_rep: str = '<NA>',
    quote_character: str = '"',
    quote_all_strings: bool = False,
    **kwargs
):
    """Represent data in the STAR format.

    Parameters
    ----------
    data: DataBlock | Dict[str, DataBlock] | List[DataBlock]
        Data to represent. DataBlocks are dictionaries or dataframes.
        If a dictionary of datablocks is passed the keys will be the data block names.
    float_format: str
        Float format string which will be passed to pandas.
    sep: str
        Separator between values, will be passed to pandas.
    na_rep: str
        Representation of null values, will be passed to pandas.
    """
    writer = StarWriter(
        data,
        filename=None,
        float_format=float_format,
        na_rep=na_rep,
        separator=sep,
        quote_character=quote_character,
        quote_all_strings=quote_all_strings,
    )
    return ''.join(line + '\n' for line in writer.lines())
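
The final line of to_string() joins the writer's lines and appends a newline to each one, so the returned text always ends with a newline. A standalone sketch of that join, where the hypothetical generator `lines` stands in for `writer.lines()` and yields made-up content:

```python
def lines():
    """Stand-in for writer.lines(): yields lines without trailing newlines."""
    yield "data_particles"
    yield ""
    yield "loop_"
    yield "_rlnCoordinateX #1"
    yield "91.798700"


# Same expression as in to_string(): every line, including the last, gets '\n'.
text = "".join(line + "\n" for line in lines())
print(repr(text))
```

Because the trailing newline is included, splitting the result with `splitlines()` gives back the original lines unchanged.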