Overview
starfile is a package for reading and writing STAR files in Python.
starfile can be used interactively to inspect and explore files, or in scripts and larger software packages to provide basic STAR file I/O. Data is exposed as plain Python dictionaries or pandas DataFrames.
This package was designed principally for compatibility with files generated by RELION. For more information on working with pandas, please see the pandas docs.
Quickstart
Given the following file, particles.star, with a single data block:
data_particles
loop_
_rlnCoordinateX #1
_rlnCoordinateY #2
_rlnCoordinateZ #3
_rlnAngleRot #4
_rlnAngleTilt #5
_rlnAnglePsi #6
_rlnMicrographName #7
91.798700 83.622600 203.341030 -51.740000 173.930000 32.971000 01_10.00Apx.mrc
97.635800 80.437000 203.136160 141.500000 171.760000 -134.680000 01_10.00Apx.mrc
92.415200 88.842700 210.663900 -78.750000 173.930000 87.263200 01_10.00Apx.mrc
94.607830 93.135410 205.425960 -85.215000 167.170000 85.632200 01_10.00Apx.mrc
86.187800 80.125400 204.558750 14.910000 163.260000 -16.030000 01_10.00Apx.mrc
91.824240 76.738300 203.794280 39.740000 168.410000 -57.250000 01_10.00Apx.mrc
98.253300 73.530100 203.856030 73.950000 166.380000 -84.640000 01_10.00Apx.mrc
101.303500 80.290800 194.790400 -178.878000 166.090000 73.181000 01_10.00Apx.mrc
Read the file
import starfile
df = starfile.read('particles.star')
Interact with the data
df['rlnCoordinateX'] += 10
df.head()
rlnCoordinateX rlnCoordinateY rlnCoordinateZ rlnAngleRot rlnAngleTilt rlnAnglePsi rlnMicrographName
0 101.79870 83.62260 203.34103 -51.740 173.93 32.9710 01_10.00Apx.mrc
1 107.63580 80.43700 203.13616 141.500 171.76 -134.6800 01_10.00Apx.mrc
2 102.41520 88.84270 210.66390 -78.750 173.93 87.2632 01_10.00Apx.mrc
3 104.60783 93.13541 205.42596 -85.215 167.17 85.6322 01_10.00Apx.mrc
4 96.18780 80.12540 204.55875 14.910 163.26 -16.0300 01_10.00Apx.mrc
Save the (modified) data to file
starfile.write(df, 'modified_particles.star')
For more advanced usage please check out the examples.
Installation
pip install starfile
API
starfile.read()
Read data from a STAR file.
Basic data blocks are read as dictionaries. Loop blocks are read as pandas
dataframes. When multiple data blocks are present, a dictionary of data blocks is
returned. When a single data block is present, only that block is returned by default.
To force returning a dictionary even when only one data block is present, set
always_dict=True.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filename | PathLike | File from which to read data. | required |
read_n_blocks | Optional[int] | Limit reading the file to the first n data blocks. | None |
always_dict | bool | Always return a dictionary, even when only a single data block is present. | False |
parse_as_string | List[str] | A list of keys or column names which will not be coerced to numeric values. | [] |
Source code in src/starfile/functions.py
starfile.write()
Write data to disk in the STAR format.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Union[DataBlock, Dict[str, DataBlock], List[DataBlock]] | Data to be saved to file. DataBlocks are dictionaries or dataframes. If a dictionary of data blocks is passed, the keys are used as the data block names. | required |
filename | PathLike | Path where the file will be saved. | required |
float_format | str | Float format string which will be passed to pandas. | '%.6f' |
sep | str | Separator between values, will be passed to pandas. | '\t' |
na_rep | str | Representation of null values, will be passed to pandas. | '<NA>' |
Source code in src/starfile/functions.py
starfile.to_string()
Represent data in the STAR format.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Union[DataBlock, Dict[str, DataBlock], List[DataBlock]] | Data to represent. DataBlocks are dictionaries or dataframes. If a dictionary of data blocks is passed, the keys are used as the data block names. | required |
float_format | str | Float format string which will be passed to pandas. | '%.6f' |
sep | str | Separator between values, will be passed to pandas. | '\t' |
na_rep | str | Representation of null values, will be passed to pandas. | '<NA>' |
Source code in src/starfile/functions.py