Competing Interests: The authors have declared that no competing interests exist.
Single molecule localization microscopy (SMLM) has enormous potential for resolving subcellular structures below the diffraction limit of light microscopy: Localization precision in the low digit nanometer regime has been shown to be achievable. In order to record localization microscopy data, however, sample fixation is inevitable to prevent molecular motion during the rather long recording times of minutes up to hours. Eventually, it turns out that preservation of the sample’s ultrastructure during fixation becomes the limiting factor. We propose here a workflow for data analysis, which is based on SMLM performed at cryogenic temperatures. Since molecular dipoles of the fluorophores are fixed at low temperatures, such an approach offers the possibility to use the orientation of the dipole as an additional information for image analysis. In particular, assignment of localizations to individual dye molecules becomes possible with high reliability. We quantitatively characterized the new approach based on the analysis of simulated oligomeric structures. Side lengths can be determined with a relative error of less than 1% for tetramers with a nominal side length of 5 nm, even if the assumed localization precision for single molecules is more than 2 nm.
In the last decade, super-resolution microscopy techniques have paved the way for resolving cellular structures in unprecedented detail [1]. The assembly of biomolecules at the nanoscale plays a crucial role in their functionality and hence, is key to our understanding of cellular processes. In particular, the technique of single molecule localization microscopy (SMLM) appears well suited for structural biology, as it is based on localization coordinates of individual molecules rather than on pixelated images of recorded fluorescence intensities. In SMLM, dye molecules are linked to the biomolecule of interest and imaged under conditions, where only a small subset of dye molecules is visible at any time-point. From the movies containing thousands of images of the very same region, one can determine the positions of these dye molecules very accurately down to a precision of a few nanometers [2], which allows for establishing localization maps.
The increased spatial resolution of SMLM, however, comes at the cost of temporal resolution, as image acquisition takes several minutes up to hours. Thorough sample fixation is thus a crucial prerequisite for high resolution SMLM recordings, as any residual diffusion of molecules [3] will lead to distortions of the obtained localization maps. Since such residual motion is likely uncorrelated within the sample, it cannot be corrected by standard drift correction methods. Importantly, the chosen fixation method further needs to conserve the ultrastructure of the sample under investigation, which is typically not the case using chemical fixatives [4]. Novel cryo-fixation approaches [5] combined with SMLM at cryogenic temperatures (cryo-SMLM) [6–8] promise to resolve both points, thereby opening up SMLM to questions from structural biology.
Two aspects of SMLM, however, hamper the direct ultrastructural interpretation of localization maps: On the one hand, insufficient labeling and/or detection efficiency leads to undercounting; on the other hand, multiple detections of individual molecules result in overcounting [9]. Therefore, some parts of a particular biomolecular structure may not be visible at all, while other parts may be heavily overrepresented.
In principle, particle averaging approaches allow for circumventing the issue of statistical distortions in SMLM. Similarly to single particle reconstruction methods used in cryo-electron microscopy (cryo-EM), hundreds to thousands of identical copies of the same particle are imaged, and subsequently combined to yield an averaged super-particle [10, 11]. In case of unknown structures template-free registration methods have to be employed. Two possible approaches are pyramid registration, where particles are registered pairwise in consecutive steps [12], or all-to-all registration, where all particles are registered to all others simultaneously [13]. Any knowledge of particle symmetry may be included in the registration process in order to increase the quality of the reconstruction [13]. To improve the registration process under realistic imaging conditions, the Bachttacharyya distance allows to account for missing labels, different number of localizations of individual molecules and anisotropic localization uncertainty [12, 13]. In addition, for accurate reconstruction of semi-flexible structures, Shi et al. recently suggested an approach for deformed alignment [14]. Note that up to now these approaches were successfully applied only in case of rather large structures with sizes of tens of nanometers [14], or imaging conditions yielding tens to hundreds of localizations per label site [13]. Quite often, however, the cell-biological context of an experiment is in conflict with these requirements, in many cases impeding particle reconstruction. In such cases, template-based registration methods may recover superresolution analysis, or provide superior results. In principle, template-based registration allows to register the point sets acquired from each particle onto the template. In a pioneering study, Szymborska et al. [15] used a circular template to study the arrangement of molecules in the nuclear pore complex (NPC), allowing the determination of its radius with a precision of 0.1 nm. More elaborate analysis employing the eight-fold symmetry of the NPC allowed to analyze the single-molecule labeling efficiency [16] or reconstruct a more detailed view of the NPC structure [12].
As an alternative to coordinate-based registration, reconstruction can be performed based on algorithms developed for cryo-EM data. In this case, the obtained localization maps first need to be converted to localization images (e.g. based on localization densities or localization uncertainties), since EM-algorithms expect continuous intensity distributions instead of a list of coordinates. Using this approach combined with imaging at cryogenic temperatures, Weisenburger et al. reported a resolution on the Ångström scale for imaging of the GtCitA Pasc domain dimer and the streptavidin homotetramer [8]. Of note, imaging modalities of EM and SMLM differ quite substantially and EM-algorithms might not fully account for SMLM specifics.
Currently, however, it is difficult to assign localizations to specific dye molecules. We propose here a new approach for the analysis of oligomeric protein complexes, which is tailored to the conditions of cryo-SMLM. Measuring at cryogenic temperatures has two key advantages, which shall be exploited here: first, it ensures supreme fixation and conservation of the sample’s ultrastructure [5]; second, also rotational diffusion of the fluorophores’ excitation and emission dipoles during illumination is prevented at least over time-scales of hours [8]. The second aspect allows for establishing a unique characteristic for each dye molecule, based on the orientation of its dipole moment at the time point of freezing. In this paper, we propose to infer this characteristic from imaging sequences, in which samples are alternately excited with linearly polarized light with polarization vectors rotated by 90° (Fig 1A). Thereby, assignment of localizations to individual molecules becomes possible, which substantially enhances fitting results. We showcase the performance of the approach by determining the size of regular oligomeric structures, based on the analysis of thousands of simulated oligomers.


Schematic representation of the method.
(A) The dipole orientation of a fluorophore is shown as green arrow, which is defined by the azimuthal angle φ and the elevation angle θ. Without loss of generality, the optical axis shall be parallel to the z-axis. The fluorophore shall be excited alternately with linearly polarized light of polarization direction along the x- and y- axis. (B) The expected single molecule brightness values for x- and y-polarized light (Nx and Ny, respectively) are shown for four exemplary dipole orientations, assuming Nmax = 20000. Localizations corresponding to the same fluorophore are plotted in the same color. Annotated numbers indicate (ϕ, θ). (C) Exemplary localization map as it could be obtained for a tetramer with a side length of 5 nm. The actual positions of the dye molecules are shown as crosses, the circumscribed circle around the actual tetramer as dotted line. The assignment shown in panel B was used to group localizations, indicated by the same color code. The average position of each localization group is plotted as open dot. The dashed and solid lines show the uncorrected and corrected fitting results for the circumscribed circle (Eqs (10) and (39)). (D) Fitting results of the same localization map as in panel C, but without assignment of localizations to individual dye molecules, yields much worse results (dashed line).
In this manuscript, we consider the analysis of oligomeric protein structures consisting of n protomers, which can be represented by regular polygons consisting of n corners. If not mentioned otherwise, we consider tetramers with a side length of 5 nm. Each protomer shall be labeled by exactly one dye molecule. This can be achieved experimentally, e.g. by using tags or unnatural amino acids as labels [17, 18]. The aim of our study is to develop a template-based analysis approach to determine the distances between individual protomers, by making use of the correct assignment of localizations to individual protomers. The latter shall be enabled by exploiting the linear dichroism observable in the signal molecule brightness, when fixed dipoles are recorded with linearly polarized light.
At cryogenic temperatures, the dipole orientation of a fluorophore is fixed. When exciting such a fluorophore with linearly polarized light, the absorption probability depends on the scalar product between the fluorophore’s dipole orientation and the polarization vector of the excitation light (see Eqs (2) and (3)). Exciting the fluorophore consecutively with light of orthogonal polarization directions parallel to the x- and y-axis, respectively (Fig 1A), yields characteristic brightness changes depending on the fluorophore’s dipole orientation. Note that the orientation of the x, y-coordinate system in the image plane can be arbitrary. Here and in the following, we used an analytical representation of the number of localizations per molecule m, which closely reflects experimental data (see Methods/Simulations); for convenience, we used here data recorded at room temperature. For the single molecule brightness we considered a maximum number of photons per single molecule signal, Nmax, as it would be recorded if the dye’s dipole moment was aligned with the polarization of the excitation light. The actual signals, as they would be recorded for arbitrary dipole angles, were calculated according to Eqs (2) and (3), and were subjected to photon shot noise.
Heydarian and colleagues published a template-free approach to analyze the underlying structure of an unknown oligomer based on the obtained localization maps [13]. For high single molecule localization precision characterized by Nmax = 105 photons, the method indeed yields satisfactory results and clearly reveals the tetrameric arrangement of the individual dye molecules (see S1C Fig in S1 File). With decreasing photon numbers and increasing localization error, however, localization maps become more difficult to analyze; eventually at Nmax = 104 photons per dye molecule, no substructures can be identified. In our manuscript we propose to additionally include information about the assignment of localizations to the individual dye molecules, which becomes available when performing the experiment at cryogenic temperature. As we will show in the following, this assignment not only allows to tackle challenging imaging conditions at low photon numbers, it also yields highly precise estimates of the oligomer size.
In Fig 1B we plot the signal intensities Nx, Ny for the two polarization directions for four exemplary fluorophore dipole orientations, as they could occur for a fully labelled tetramer. In this case, discrimination of the four dye molecules is straightforward, and we can group all localizations that belong to each single molecule (indicated by color in Fig 1B and 1C) (see Methods section Assignment of Blinks to Specific Molecules). In principle, brightness values can cover the whole region (S2 Fig in S1 File) confined by Nx > 0, Ny > 0 and Nx + Ny < Nmax, with a slight dip in the center of the region. We only accepted sufficiently bright signals with Nx + Ny ≥ Nmin, which would yield a localization error below a user-defined threshold Δx (see Methods section Simulations for the relation between Δx and Nmin, and S2B Fig in S1 File). Note that the point clouds corresponding to each dye molecule can be elliptically distorted due to differences in the Poisson noise along the x- and y-axis (see the red point cloud in Fig 1B).
For convenience, we assumed throughout our manuscript the following procedure for determining the localization of single molecule signals: The two recordings corresponding to the two polarization channels are added up, irrespective of the signal intensities in the two channels, and the localizations are determined on the sum image. Considering the situation of fixed dipole moments, a substantial fraction of molecules will show dipoles characterized by an elevation angle close to the optical axis. Such molecules will produce rather faint signals, which yield large localization errors. In consequence, a rather broad distribution of localization errors can be expected. Of note, we assumed here subsequent illumination of the sample with different polarizations but detection of the two corresponding images on the very same region of the camera chip; hence, no image registration problems occur.
In Fig 1C and 1D we show the obtained localization map of the exemplary tetramer, both with (C) and without (D) localization assignment. Apparently, without localization assignment there is no realistic chance to identify any structural organization of the oligomer. To facilitate the analysis, one may include prior knowledge e.g. by assuming the oligomer to be represented by a regular polygonic structure. In this case, all corners of the simulated tetramer would lie on the perimeter of the circumscribed circle. However, even under this assumption, the circular fit does not yield satisfactory results (dashed line in Fig 1D); in this particular case, the size of the tetramer is substantially overestimated. Localization assignment substantially improves the situation (C). In this case, all localizations assigned to single dye molecules can be averaged, indicated by colored circles in Fig 1C). Taking these averaged positions as input for the fit yields the circle indicated by the dashed line, which is fairly close to the ground truth (dotted line). Importantly, a circle fit shows an inherent bias towards larger sizes [19] (see Methods Eqs (12) & (34)). This is intuitively plausible, as on average more data points lie outside the circle and hence contribute with a higher statistical weight. Correcting for the bias with Eq (39) yields an improved fit result that is shown by the solid line in Fig 1C.
In the following, we provide a quantitative evaluation of the proposed method; specifically, we assess the estimation of oligomer side length from a large number of recorded identical oligomers. We assume here that the oligomers shall be sufficiently separated from each other, so that a standard 2D clustering algorithm can be applied in order to group localizations belonging to individual oligomers. Such clustering algorithms can be found e.g. in refs. [20, 21]. As first step, we group the localizations of each oligomer based on the obtained intensities Nx and Ny. As an eligibility criterion, all oligomers which yield n distinct groups of localizations are taken for further analysis, all others are neglected. This criterion particularly rejects scenarios, where two or more groups of localizations overlap and hence would be interpreted as one spurious position at the weighted average of the detected localizations. If not mentioned otherwise, we assumed full labelling of all protomers. In Fig 2A we analyzed the assignment process (gray) and the eligibility criterion (black) for different single molecule brightness levels Nmax. With increasing brightness, we observed an increasing percentage of oligomers for which all localizations were assigned correctly to the individual protomers. The reason for this is the reduced spread of the brightness clusters in the Nx—Ny representation, which improves the performance of the applied clustering algorithm. Along a similar line, also the fraction of eligible oligomers increases with Nmax, partly due to improved assignment, partly due to the reduced influence of the detection threshold (S2 Fig in S1 File). We also analyzed the fraction of eligible oligomers which contained incorrectly assigned localizations, yielding negligible contributions (dashed line in Fig 2A).


Statistics for a set of oligomers.
(A) Assignment of localizations to individual dye molecules. The maximum number of photons Nmax emitted from a fluorophore was varied from 10000 to 100000. The gray line indicates the percentage of tetramers, for which all localizations were assigned to the correct dye molecule. A tetramer was considered eligible for further analysis, if its localizations were assigned into four groups. The black line shows the percentage of eligible tetramers. For each data point a data set of 500000 tetramers (side length 5 nm) was simulated. (B) Histogram of estimated tetramer side length for a data set of 500000 tetramers with nominal side length of 5 nm. Localizations were assigned to individual dye molecules. A total number of 367328 tetramers were eligible and further analyzed. For circle fitting, term (10) was minimized. For side length estimation we used Eq (39). Analysis of the histogram yields a mean of 5.1959 nm (blue line) and a median of 5.0256 nm (green line). Values larger than 9 nm were cut off for display only.
Secondly, for each group of localizations we calculate their mean position, which are used to fit the circle that minimizes Eq (10); the fits are performed for all oligomers separately. From the fit results we determine the corrected radii using Eq (39) in order to calculate the n-mer side lengths via
We next estimated the influence of the number of simulated oligomers available for the analysis. The results for both a maximum photon number of Nmax = 104 and Nmax = 105 photons are shown in S3A Fig in S1 File. Here and in subsequent plots, we quantified errors by calculating

We further analyzed the dependence of εl on the obtained photon numbers by varying the maximal number of photons Nmax from 104 to 105 (Fig 3A). Note that in this figure, we show a symmetric logarithmic plot, which shows positive and negative relative errors on the positive and negative y-axis, respectively. Relative errors |εl| < 10−3 are shown on a linear scale. The median generally gives very precise results with relative errors below 5 ‰, corresponding to 0.025 nm. For large photon numbers, Nmax ≥ 2 ⋅ 104, the side length is slightly underestimated (indicated by red color). Again, the mean estimator performs less well (full symbols). Of note, the average localization error for single molecules Δx would yield 2.30 and 0.78 nm for Nmax = 104 and Nmax = 105, respectively.


Influence of signal brightness and noise.
Relative error εl for estimation of tetramer side length upon variation of the maximum photon number Nmax (A) and background noise (B) shown in a symmetric logarithmic plot. For each data point 500000 tetramers (side length 5 nm) were simulated. In both panels we compared the analysis via the mean (full symbols) and median (open symbols). Positive and negative relative errors represent overestimation and underestimation, respectively. In panel A we assumed zero background noise b, in panel B we considered Nmax = 104 photons (gray lines) and Nmax = 105 (black lines). Error bars indicate the 95% confidence intervals.
Up to now, we did not consider background noise for the analysis. A real life experiment, however, inevitably contains contributions from camera noise and sample background noise. The main consequences of including noise in the analysis are increased localization errors. We investigated the influence of background noise on the side length estimation by increasing its standard deviation up to b = 300 photon counts, which would be an exceptionally high value for cellular background (Fig 3B and S4 Fig in S1 File). Background noise mainly impacted the results for low photon numbers, where its relative contribution is higher. For high photon numbers, background only had a slight effect on the results.
An important issue with any fluorescence labeling technique is labeling efficiency, leaving some of the protomers within an oligomeric structure undetectable. Experimentally, this may be due to incomplete maturation of fluorescent proteins, prebleaching of dye molecules, or incomplete conjugation of the dye to the protomer. Generally, incomplete labeling compromises registration methods. However, in cases where the template is known and assignment of localizations to individual protomers is possible, one may filter the data and use only oligomers with correct number of dyes n for analysis. To asses the effects of incomplete labeling on our method, we varied the effective labeling efficiency and quantified the eligibility of oligomers. As expected, reduced labeling efficiency massively reduces the number of eligible oligomers (Fig 4A). Importantly, however, the labeling efficiency does not have a large influence on the side length estimation (Fig 4B), only the standard error of the median increases with decreasing labeling efficiency due to the reduced number of eligible oligomers (S5 Fig in S1 File).


Influence of labeling efficiency.
For each data point 500000 tetramers (side length 5 nm) were simulated, assuming Nmax = 104 photons (gray line) or Nmax = 105 (black line). (A) Percentage of eligible tetramers. (B) Relative error εl shown in a symmetric logarithmic plot. Positive and negative relative errors represent overestimation and underestimation, respectively. Error bars indicate the 95% confidence intervals.
We next were interested in the performance of our method for extremely small oligomers. When varying the side length between 10 and 1 nm, we made an interesting observation: While relative errors εl were negligible for side length l ≥ 5 nm, errors increased strongly at short side lengths, yielding an overestimation of the oligomer size up to a factor of 2 (Fig 5). Relative errors εl were negligible for side length l ≥ 2 nm and l ≥ 5 nm for Nmax = 105 and 104 photons, respectively (Fig 5). Errors increased strongly, however, at shorter side lengths, yielding an overestimation of the oligomer size up to a factor of 2, likely reflecting increasingly unstable fit results in case of high single molecule localization errors.


Influence of oligomer side length.
Relative error εl for estimation of tetramer side length upon variation of the nominal side length l shown in a symmetric logarithmic plot. Results are shown both for a maximum photon number Nmax = 104 (gray) and Nmax = 105 (black). Positive and negative relative errors represent overestimation and underestimation, respectively. For each data point a data set of 500000 tetramers was simulated. Error bars indicate the 95% confidence intervals.
Further, we investigated the performance of our method for tri-, tetra-, penta- and hexamers, i.e. oligomers consisting of n = 3, 4, 5, 6 protomers (Fig 6). All oligomers were simulated as regular polygons. For this, we set the radius of each oligomer type to the fixed value of 4 nm. This leads to different side lengths for each oligomer type. The resulting relative error εl of the fitting procedure is shown in (Fig 6B). For both Nmax = 104 photons and Nmax = 105 photons, we observed improved performance with increasing degree of oligomerization. The main reason for this is an increased number of localizations for higher n. The number of eligible oligomers is somewhat reduced for increasing n due to increased ambiguities in the localization assignments, and a higher likelihood for missing one of the corners (S6 Fig in S1 File). Importantly, for virtually all simulations we observed very small errors εl ≪ 10−2.


Influence of oligomerization degree.
Relative error εl for varying degree of oligomerization (n = 3, 4, 5, 6) shown in a symmetric logarithmic plot. Data was simulated both for a maximum number of photons Nmax = 104 (gray) and Nmax = 105 (black). Positive and negative relative errors represent overestimation and underestimation, respectively. For each data point 500000 oligomers were simulated. The radius of the circumscribed circle of the oligomer was set to 4 nm. The resulting side length for each oligomer type is indicated on the x-axis. Error bars indicate the 95% confidence intervals.
In a realistic scenario, it may be difficult to ensure coplanarity between the plane of focus and the plane of orientation of the oligomeric structure. We were hence interested to what extent a tilt of tetramers out of the focal plane influences the results. Fig 7 shows that up to 10 degrees tilt the relative errors stay below 1%. Surprisingly, even massive tilts of 40 degrees only lead to a 10% underestimation of the obtained tetramer size.


Influence of a tilt between the focal plane and the oligomerization plane.
Relative error εl for varying tilt of the oligomerization plane shown in a symmetric logarithmic plot. Data was simulated both for a maximum number of photons Nmax = 104 (gray) and Nmax = 105 (black). Overestimation of oligomer sizes is indicated by positive, underestimation by negative relative errors. For each data point 500000 oligomers were simulated. The radius of the circumscribed circle of the oligomer was set to 4 nm. Error bars indicate the 95% confidence intervals.
Finally, we were interested in the performance of our method with respect to runtime (S7 Fig in S1 File). For this, we compared the analysis of different numbers of tetramers on a standard personal computer (see Methods). As input we used localization maps, which were already assigned to individual oligomers. Analysis of 500000 tetramers, as used throughout this manuscript, takes approximately three minutes. As expected, the runtime scales linearly with the number of tetramers, which can become a massive advantage for the analysis of large data-sets compared to template-free methods.
In this manuscript, we describe a workflow for the quantitative analysis of regular oligomeric structures based on single molecule localization microscopy data, that were obtained with polarization-sensitive cryo-fluorescence microscopy. Performing experiments at cryogenic temperatures has a strong advantage over room-temperature measurements, as it solves the fixation problem. Standard fixation methods using chemical fixatives often do not preserve the ultrastructure of the sample [4], and even show residual mobility of biomolecules [3]. The problem becomes massive when SMLM shall be applied to questions from structural biology, where structure sizes down to a few nanometers shall be resolved. In contrast, cryo-fixation is considered as the gold standard and hardly affects the ultrastructure of the sample even below nanometer length scales [5].
Measuring at cryogenic temperatures further offers the possibility to exploit polarization effects due to the fixation of the fluorophore’s transition dipole. This allows to assign localizations to the individual dye molecules via their characteristic brightness upon excitation with differently polarized light. On top of that, also differences in the local environment of each fluorophore may additionally accentuate the recorded brightness values, thereby further improving discrimination. In principle, this enables the identification of partially labeled oligomers, which hence can be rejected from the analysis. A further advantage of cryogenic measurements is reduced photobleaching kinetics. In practice, one may hence expect even more precise estimates of oligomer sizes due to a higher number of localizations recorded per molecule.
A few requirements need to be fulfilled in order to fully capitalize on the strength of the method:
The fluorophores should be located in the focal plane. Due to the fixed orientation of the dipoles, the corresponding PSF will generally be tilted against the optical axis. Even slight defocusing may hence substantially displace the obtained localizations from the true fluorophore position [22]. Azimuthal filtering [23] or polarization-resolved imaging [24] have been described as solutions to obviate this effect.
The number of protomers per oligomer should be known. This requirement ensures that only correct, i.e. fully labeled, oligomers are taken for the analysis. While for a substantial number of proteins the degree of oligomerization is known, many interesting cases lack information on the oligomerization. In principle, this information can be extracted from the SMLM data by taking the maximum of the number of localization clusters per oligomer.
The labeling efficiency should be close to one. The higher the labeling efficiency, the larger the fraction of eligible oligomers (see Fig 4A). In consequence, less experiments would be required to achieve the same quality of the results. Importantly, lower labeling efficiencies do not introduce a bias in the obtained oligomer size, as contributions from incompletely labeled molecules would be rejected before the analysis (see Fig 4B).
The individual chromophores on each oligomer need to be mutually independent. Dyes in close proximity of a few nanometers may well exhibit coupling between their singlet and triplet levels [25], thereby affecting each other’s blinking rates. In the worst case, point-spread functions between different dyes would overlap. As long as reverse intersystem crossing processes are sufficiently slow, however, they can be filtered out in the respective single molecule trajectories based on abrupt jumps in the single molecule orientation. Of note, such effects were not observed in the pioneering study by Weisenburger et al. [8].
The oligomeric structure should be a regular polygon. While the analysis of irregular polygonal structures is in principle feasible, such a treatment goes beyond the scope of this manuscript.
The population should be homogeneous. Heterogeneous sample compositions are a challenging scenario for any particle averaging approach. In case of heterogeneities in the degree of oligomerization, our approach would yield the size of the oligomers of highest degree present in the sample.
The mutual distance between different oligomers should be large. To avoid localizations overlapping between different oligomers, it is critical to ensure that the mutual distance d of two neighboring oligomers is much larger than the spread of localizations belonging to one oligomer, i.e. d ≫ R + Δx. This can be achieved by reducing expression levels and/or by increasing the localization precision.
Oligomerization should occur only in a plane perpendicular to the optical axis. Two scenarios may be discriminated: First, the oligomerization plane may be tilted against the optical axis. An example would be oligomerization of proteins within the plasma membrane, which is not perfectly flat. In consequence, the obtained structures are distorted (Fig 7). Up to 40 degrees tilt angles, our approach yields oligomer sizes with surprisingly high precision. In more extreme cases, however, one may revert to alternative strategies. For example, a straightforward solution would be the rejection of oligomers, which show localization maps deviating from a regular polygon. Slightly distorted structures could still be accounted for by including a deformation matrix in the model [14]. Secondly, biomolecules may also oligomerize in three dimensions. In this case, tomographic approaches [8] may be preferential.
A straightforward application of our method would be the study of the protomer arrangement within oligomeric structures. Quite often it is not clear in which orientation protomers are assembled or how particular domains of the protein are arranged. If site-specifically labeled protomers are available, the resulting side length would depend on the position of the label: Labels facing towards the inside of the oligomeric structure would yield smaller side lengths than labels facing the outside of the oligomer. Positioning labels on specific sites of the protein hence allows for unravelling the protomer orientation. Similar approaches proved to be successful for the analysis of larger structures such as nuclear pore complexes [15] or endocytic sites [26].
In order to fully exploit the potential of our method it is critical to choose the labeling strategy wisely: Labels should be sufficiently small to report on the actual position of the target site on the protein, and exactly one dye molecule should be linked to the target site. These constraints disqualify fluorescently labeled antibodies. Appropriate possibilities include small tags [17] and unnatural amino acids [18]. In principle, also switchable fluorescent proteins can be used for the analysis of oligomeric structures which are large compared to the size of the fluorescent protein.
Taken together, we have presented and quantitatively characterized a method for polarization-sensitive cryo-SMLM. We found remarkable precision for the determination of the side length of regular oligomeric structures with relative errors of less than 1%, which would be of sufficient quality to ascribe subunit positions in multi-protein complexes. We believe that our method provides a good basis for opening up structural biology applications to cryo-SMLM approaches.
First, we simulated the positions of the protomers. For this, n protomers were assigned to each n-mer (n = 3, 4, 5, 6). Individual protomers belonging to one oligomer were arranged around the oligomer’s center position in the shape of a regular polygon with fixed side length, but random in-plane orientation. If not specified otherwise, we simulated
Second, each protomer was assumed to be labeled with exactly one dye molecule. In order to account for recordings at cryogenic conditions, a random but fixed dipole orientation was assigned to each dye molecule. The inherent brightness Nmax was considered to be the same for all dye molecules.
To simulate blinking, we assigned a random number of detections to each dye molecule, which was drawn from an artificial blinking statistics following a log-normal distribution (as in [27]). The mean of the log-normal distribution was set to 6.4 localizations and the standard deviation to 5 localizations. These values correspond to previously reported blinking characteristics of fluorescent probes under realistic experimental conditions (compare [28]).
Fluorophores were simulated to be excited alternatingly with differently polarized excitation light. The coordinate system was aligned with the orthogonal polarization directions x and y, which are orthogonal to the optical axis z. The absorption probability of a fluorophore depends on the angle between its dipole orientation and the polarization of the excitation light. Hence, w.l.o.g. the effective number of photons Nx, Ny for the two polarizations of excitation light can be calculated as



The error in intensity estimation was distributed according to a normal distribution with mean 0 and variance (ΔN)2. The variance (ΔN)2 was set to the best possible variance of an unbiased estimator, which corresponds to the Cramér-Rao lower bound (CRLB) and is given as follows [29]:

Determination of the single molecule positions was assumed to be performed based on the combined images acquired by excitation with differently polarized light. The total intensity was calculated as Ntotal = Nx + Ny. The uncertainty of the localization procedure is hence given as [29]:

In order to simulate tilted tetramers, without loss of generality we assumed a tilt around the x-axis. To this end, we transformed the y-coordinates of the single molecule positions according to y′ = y ⋅ cos(α), where α denotes the tilt angle of the oligomerization plane with respect to the focal plane.
In this mathematical part, we will use the following notation. We assume that all oligomers are equilateral polygons and have the same number of corners n. We will need to distinguish between the different dye molecules constituting an oligomer, which we will index by i ∈ {1, ⋯, n}, and different localizations corresponding to dye molecule i, which we will index by j ∈ {1, ⋯, mi}, where mi specifies the total number of localizations of dye molecule i. The position of the individual dye molecule will be denoted by

Throughout the manuscript we make use of the following nomenclature: R, L denote the ground truth radius and side length, respectively, of a regular polygon of n corners, which are related via L = 2R sin(π/n). For each oligomer i with given dipole orientations of the dye molecules, the variables
First of all, we assume that the individual oligomers as well as the corresponding measured localizations are well separated from the ones of each other oligomer. That is, measurements of different oligomers do not overlap. If that is the case, we are able to cluster the given data spatially in order to identify the localizations belonging to individual oligomers. This can be done effectively with standard two dimensional clustering techniques. We use a straightforward approach. We sort the data and take the differences in coordinates in order to identify the adjacent blinks which are closer than a certain prescribed distance from one another. Every such localizations are then grouped together to one cluster. In the simulations, however, we know which protomer belongs to which oligomer such that we omit this step and use the given information in order to avoid unlikely errors in this regard.
This task is performed by taking advantage of the measured polarization of the dipole. In general, the spatial variance of the distribution of blinks makes a reliable clustering (using only the spatial data) impossible. However, the polarization property discriminates effectively between all molecules in one oligomer, provided the polarization of each protomer is sufficiently far from the one of each other. It has to be noticed that we do not have access to the whole polarization of these molecules, but only to their projection on the illumination plane. Similarly, a sign change in the polarization cannot be detected.
Concretely, we cluster the estimated intensity of the two polarization directions, which is again a clustering in 2D as done above in the spatial domain (if performed). We consider the oligomer to be well resolved if the collection of blinks corresponding to that oligomer can be clustered in n groups, where the distance between two groups has to be larger than a given parameter δ. Empirical tests suggest δp = 300 + Nmax/100 to be a feasible choice for difference in the number of photons in order to assert the localizations properly. In the case the cluster is not well resolved for the polarization, we simply discard it for our further computations. Otherwise, we call the oligomer eligible and proceed to estimate the distance between its individual protomers.
See also Fig 1 for visualization, where one specific example (tetramer) is shown and the corresponding blinks are assigned to their respective protomer. As we can see, a simple spatial clustering of the blinks is not applicable.
Given the data of one individual oligomer, i.e., the localizations (blinks) contained in one spatial cluster, we are now interested in the distance l between the individual protomers. Since we assume the oligomeric structure to be a regular polygon, the distance between two adjacent protomers is supposed to be constant. That is, assuming that the corners are ordered, for all i, we have




Although the standard (affine fit) least square problem is strictly convex and has a unique solution, this one is not convex and can have several minima which might correspond to unrealistic solutions [30]. Moreover, this procedure will always yield an estimation for the radius of the circle which possesses a certain bias and the radius R of the circle (see [19, 31, 32]). For example, in Fig 2 one finds the histogram of the distribution of the estimated side length for individual tetramers of 5 nm side length. As we can see, the mean of the distribution is higher than the ground truth and the median, although also overestimated, is closer to the real value.
Suppose the localizations of the blinks are identically and independently distributed (iid) normal random variables with zero expectation (centered) and constant variance σ2. Depending on these parameters, we define the random variable r which describes the radius of the circle fitting those blinks (by minimizing (10)) and we denote by














To improve the estimation of the bias, we use the second order Taylor expansion of (10) for our parameters Θ = (a, b, R), which we write for the sake of brevity










An exact solution of Eq (34), considering





In order to solve (10), we have to provide an initial guess (a0, b0) for the center as well as r0 for the radius. While the minimization is not too sensitive to the guessed radius, the initial coordinates for the center ought to be not too far away from the ground truth. For that purpose, we use the center of mass as an initial guess for the center point. If we would not assign the different blinks to their individual protomers, that is the situation of M iid measurements, we simply compute the overall center of mass of all blinks in that particular spatial cluster. In our notation, that means with











Since we can estimate the localization precision (standard deviation in nm) from the measured intensity for each individual blink of one protomer, we are able to discard low quality data. Hence, one could ignore measurements whose localization precision is lower than certain threshold in order to improve results.
All error bars were calculated based on 1000 bootstrap samples, which were drawn from the individual data sets, and represent the 95% confidence intervals of the mean (or median).
For analysis of the runtime shown in S7 Fig in S1 File we used a standard personal computer model XPS 15 9570 with an Intel Core i7-8750H processor.
We thank Hamidreza Heydarian and Bernd Rieger for assistance with the code to generate S1 Fig in S1 File.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34