Diagnostic performance and user-friendliness of five rapid antigen tests for severe acute respiratory syndrome coronavirus 2
Highlight box
Key findings
• The included antigen-detecting rapid diagnostic tests (Ag-RDT) for detection of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) are not suited for general screening. However, they may be clinically useful in high-prevalence settings.
• Manufacturer-independent evaluation of point-of-care (POC) laboratory tests among intended users is advisable to ensure adequate performance and user-friendliness.
What is known and what is new?
• The Ag-RDTs exhibit poorer diagnostic performance than the gold standard for the detection of SARS-CoV-2. However, they are more affordable and more widely available than the higher-performing tests. Consequently, health institutions and other organisations have been, and may again be, tempted to implement Ag-RDTs during outbreaks of COVID-19 or other diseases.
• The study indicates that manufacturer-independent evaluation of Ag-RDTs can reveal significantly lower performance than that stated by the manufacturers. Therefore, manufacturer-independent evaluation has an important role in the implementation of such tests.
What is the implication, and what should change now?
• For future implementation of Ag-RDTs and other POC laboratory tests, manufacturer-independent evaluations are advisable to ensure quality.
Introduction
The coronavirus disease 2019 (COVID-19) pandemic, caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), demonstrated the need for reliable, affordable and decentralized testing to help decrease the spread of the virus and to ease the pressure on medical laboratories (1). The initial testing regime for detection of SARS-CoV-2, and still the most commonly used method in medical laboratories, is nucleic acid amplification tests (NAATs), e.g., reverse transcription polymerase chain reaction (RT-PCR) (2). This current gold standard detection method is, however, both time-consuming and costly. In addition, the availability of NAATs was, at the time of the pandemic, limited due to their dependence on reagent production (3). The development of antigen-detecting rapid diagnostic tests (Ag-RDTs) by mid-2020 made testing more accessible (1). The main disadvantages of the Ag-RDTs are their variable and generally poorer diagnostic performance compared to NAATs (4). In many clinical situations, however, their affordability, availability and ease of use may still render the best performing Ag-RDTs both adequate and useful.
The World Health Organization (WHO) recommends that the Ag-RDTs meet the minimum diagnostic performance criteria of ≥80% sensitivity and ≥97% specificity, and that they should mainly be used in symptomatic populations (2). Since the sensitivities and specificities declared by manufacturers may have been obtained under optimal laboratory conditions, evaluations performed among the intended users before implementation are recommended by the European Centre for Disease Prevention and Control (ECDC) (5).
The Scandinavian evaluation of laboratory equipment for point of care testing (SKUP) is a collaboration between the external quality assurance organisations DEKS in Denmark (6), Equalis in Sweden (7), and Noklus in Norway (8). The purpose of SKUP is to improve the quality of point-of-care (POC) testing by providing manufacturer-independent information about diagnostic performance and user-friendliness of POC laboratory equipment (9). SKUP evaluations are initiated by a manufacturer or supplier, who covers the direct costs of the evaluation. During the pandemic, SKUP evaluated five Ag-RDTs for COVID-19. The individual evaluation reports can be found at the SKUP homepage. The present study presents an overview of these five SKUP evaluations. All the evaluated Ag-RDTs have at some point been available on the Scandinavian market, either as a COVID-19 test or as a combo test with several target antigens.
The COVID-19 pandemic gave rise to a testing regime of unprecedented scale. The aim of the present study was to describe and compare the diagnostic performance and user-friendliness of five Ag-RDTs prospectively evaluated by SKUP under real-life conditions by the intended users of the tests. We present this article in accordance with the STARD reporting checklist (available at https://jlpm.amegroups.org/article/view/10.21037/jlpm-24-56/rc).
Methods
Enrolment of participants and study procedure
All the SKUP evaluations included in this study were set in COVID-19 test centres in Scandinavia. This setting made consecutive enrolment of participants possible. During the first four evaluations, the inclusion criterion for participation was “symptomatic and asymptomatic subjects exposed to individuals who had previously tested positive for SARS-CoV-2”. At the time of the fifth evaluation, the inclusion criterion was changed to “symptomatic and asymptomatic subjects with high probability of SARS-CoV-2 infection”. This was considered necessary because of the high prevalence of infected participants due to the Omicron wave in Norway at the time. All subjects who attended a SARS-CoV-2 test appointment at the test centre were considered to have a high probability of infection. In all evaluations, subjects under the age of 16 and subjects who did not understand the local language (Norwegian or Danish, depending on the evaluation site) were excluded. Participation in the SKUP evaluations was voluntary, and verbal informed consent was considered sufficient. All test-centre employees involved in the evaluations had training in the use of the respective Ag-RDT. The intended sample size in each evaluation was set to at least 100 participants with positive and 100 participants with negative RT-PCR results, based on recommendations from the European Union (EU) Health Security Committee combined with practical considerations regarding execution, recruitment and sample collection (10). Symptomatic participants with negative RT-PCR results were not tested for other pathogens.
The consecutive samples for both the Ag-RDTs and the reference standard were collected at the same time by the same person. The sampling material varied between evaluations (Table 1). Samples were analysed immediately with the Ag-RDT, in accordance with the instructions from the manufacturers, until a result was obtained. Samples for the reference standard were placed into sterile tubes containing 2–3 mL of viral transport medium (VTM) and kept at room temperature until transport to a clinical laboratory for analysis with the reference standard. Internal analytical quality control (IQC) material was not available for all of the Ag-RDTs (Table 1). Where available, IQC was analysed on each day of the evaluation and upon opening of new test kits, according to the instructions of the manufacturer of the individual Ag-RDT.
Table 1
| Evaluation | LumiraDx SARS-CoV-2 Ag Test | CLINITEST Rapid COVID-19 Antigen Test | NADAL® COVID-19 Ag Test | Flowflex SARS-CoV-2 Antigen Rapid Test | MF-68 SARS-CoV-2 Antigen Test |
|---|---|---|---|---|---|
| Country | Norway | Norway | Norway | Denmark | Norway |
| Evaluation period | October–December, 2020 | March–June, 2021 | December 2020–September 2021 | March 2021–February 2022 | February–March 2022 |
| Ag-RDT, sample type | N, NP† | NP | N, NP‡ | N | N |
| IQC for Ag-RDT | Included | Included§ | Not included | Included | Not included |
| RT-PCR, sample type | NP | NP, OP | NP | OP | OP |
| Laboratory for reference standard | Department of Microbiology, Haukeland University Hospital, Bergen | Department of Microbiology, Oslo University Hospital, Oslo | Fürst Medical Laboratory, Oslo | Clinical Diagnostic Department, Hospital of South West Jutland, Esbjerg/Department of Clinical Biochemistry, Bispebjerg Hospital, Copenhagen | Department of Microbiology, Haukeland University Hospital, Bergen |
| Reference standard | LightCycler 480 (Roche)/QuantStudio (Applied Biosystems). Mastermix: QuantiNova® Pathogen + IC Kit (Qiagen) | TecanFluent 1080 (Tecan Trading AG)/EZ1 (Qiagen)/AriaDX (Agilent Technologies Inc.) with magnetic nanoparticles (Norwegian University of Science and Technology), primers (TIB Molbiol Syntheselabor GmbH), Invitrogen SuperScript III RT/Platinum Taq mix (Thermo Fisher Scientific Inc.), E-gene probe (Integrated DNA Technologies Inc.) | 7500 SDS/QuantStudio 5 (Applied Biosystems), RIDA®GENE SARS-CoV-2 real-time PCR kit | Bio-Rad CFX thermocycler (Bio-Rad Laboratories Inc.), Allplex 2019-nCoV assay (Seegene Inc.)/CoviDetect FAST assay (PentaBase A/S) | LightCycler 480 (Roche)/QuantStudio (Applied Biosystems). Mastermix: QuantiNova® Pathogen + IC Kit (Qiagen) |
†, duplicate sampling. ‡, change from nasopharyngeal to nasal sampling during the evaluation. §, IQC included towards the end of the evaluation. Ag, antigen; Ag-RDT, antigen-detecting rapid diagnostic tests; COVID-19, coronavirus disease 2019; IQC, internal analytical quality control; N, nasal; NP, nasopharyngeal; OP, oropharyngeal; RT-PCR, reverse transcription polymerase chain reaction; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2.
Reference standard
Routine RT-PCR methods for detection of SARS-CoV-2 in the respective clinical microbiology laboratories were used as a reference standard (Table 1). The results from the Ag-RDTs were not available for the laboratory. Inconclusive RT-PCR results were omitted. In Norway, all laboratories involved in the evaluations used RT-PCR detection of the E-gene of the Sarbeco Betacoronavirus, including SARS-CoV-2. In Denmark, the laboratory at Esbjerg hospital used RT-PCR detection of the E-gene of the Sarbeco Betacoronavirus, including SARS-CoV-2, the RNA-dependent RNA polymerase (RdRP) gene and nucleocapsid (N) protein gene specific for SARS-CoV-2, while the laboratory at Bispebjerg hospital used RT-PCR detection of the E-gene and RdRP gene of SARS-CoV-2. All included Norwegian laboratories were accredited according to Norsk Standard-Europeisk Norm International Organization for Standardization/International Electrotechnical Commission (NS-EN ISO/IEC) 15189 [2012]. The included Danish laboratories were accredited according to Danish Standard (DS)/EN ISO 15189 [2013] by the Danish Accreditation Fund (DANAK). All involved laboratories participated in at least one external quality assessment (EQA) scheme for RT-PCR detection of SARS-CoV-2.
SKUP evaluations included in the study
The five Ag-RDTs evaluated by SKUP were: LumiraDx SARS-CoV-2 Ag Test (LumiraDx UK Ltd., Alloa, UK) (SKUP/2021/124), CLINITEST Rapid COVID-19 Antigen Test (Healgen Scientific LC, Houston, USA) (SKUP/2021/127), NADAL® COVID-19 Ag Test (Nal von Minden GmbH, Moers, Germany) (SKUP/2022/125), Flowflex SARS-CoV-2 Antigen Rapid Test (Acon Biotech Co. Ltd., Hangzhou, China) (SKUP/2022/128) and MF-68 SARS-CoV-2 Antigen Test (Shenzhen Microprofit Biotech Co., Ltd., Shenzhen, China) (SKUP/2022/131) (9). The Ag-RDTs are referred to as Lumira, CLINITEST, NADAL, Flowflex and MF-68. The prospective evaluations were carried out at four COVID-19 test centres in Norway and one in Denmark (Table 1). The evaluations of Lumira and MF-68 were performed in Bergen, Norway, in the autumn of 2020 and spring of 2022, respectively. The evaluations of CLINITEST and NADAL were performed in Oslo, Norway in the spring/summer of 2021 and from late 2020 to autumn of 2021, respectively. The evaluation of Flowflex was performed in Esbjerg, Denmark from spring 2021 to early 2022. The timing of the evaluations was based on when requests were made by the requesting companies. Lumira is an instrument-based test that uses a rapid microfluidic immunofluorescence assay for the detection of the nucleocapsid protein antigen of SARS-CoV-2. The other four are lateral flow immunoassays (LFAs) for the detection of the same antigen. All Ag-RDTs included in this study were intended for use by healthcare professionals.
The laboratories involved in the evaluation of CLINITEST, Flowflex and MF-68 participated in a scheme by Quality Control for Molecular Diagnostics (QCMD), while the laboratory involved in the evaluation of NADAL participated in a scheme by INSTAND e.V. The laboratory involved in the evaluation of Lumira participated in both schemes. All EQA results for the reference standards were in accordance with the assigned value (positive/negative), with the exception of one positive control from INSTAND during the evaluation of NADAL.
User-friendliness
For evaluation of user-friendliness, a questionnaire with four categories adapted to the intended users was used: (I) operational facilities (ease of use), including the ease of preparing the test, preparing the sample, applying the sample, and ensuring correct specimen volume, as well as the instrument/test strip design, reading of the test result, sources of error, cleaning/maintenance, hygiene when using the test, and size and weight of the package; (II) information in the manual and quick guide, including table of contents/index, and descriptions of preparations, specimen collection, measurement procedure, and how to read the result, as well as description of the sources of error, help for troubleshooting, readability/clarity of presentation and general impression; (III) time factors, including required training time, duration of preparations, duration of analysis, stability of the test in opened/unopened packages and stability of quality control material in opened/unopened packages; (IV) analytical quality control, including reading of the internal quality control, usefulness of the internal quality control material and possibility of participating in EQA. Each item had three possible ratings: satisfactory, intermediate and unsatisfactory, in addition to a “no opinion” option. Of the four categories, the first two were mainly evaluated by the intended users, while the latter two were evaluated by SKUP. The final rating of user-friendliness was based on an overall assessment by SKUP. Consequently, a low score on a single item could, in some cases, lead to a low overall rating if that particular aspect was considered essential to the user-friendliness of the test. SKUP has evaluated the user-friendliness of POC testing since the 1990s, and the user-friendliness questionnaire has been optimized and adapted for different types of POC systems over the years.
Statistical analyses
Statistical analyses were performed in Microsoft Excel, using a template developed by SKUP. Raw data were entered manually; sample identifications (IDs) were marked as true positive, false positive, false negative, or true negative and summarized; and sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and prevalence were calculated through cell formulas. Subgroup performance and comparisons between subgroups (performance in symptomatic vs. asymptomatic participants) were calculated using the same procedure. Predictive values at hypothetical prevalences, in test populations of 10,000, were calculated from the point estimates of sensitivity and specificity obtained in the actual evaluations. Due to the risk of errors in Excel, all raw data and all formulas were proofread by two authors. 95% confidence intervals (CIs) were estimated using an adjusted Wald method (11). A chi-square test was used to evaluate differences in proportions (12), with P<0.05 considered statistically significant.
Diagnostic performance was evaluated by comparing the point estimates of sensitivity and specificity to the WHO recommendation of ≥80% sensitivity and ≥97% specificity (2). For user-friendliness, the quality goal of an overall rating of “satisfactory” was used.
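The calculations described above can be illustrated with a minimal Python sketch (not SKUP’s actual Excel template); the Agresti-Coull form of the adjusted Wald interval is assumed here:

```python
# Minimal sketch (assumption: Agresti-Coull variant of the adjusted Wald CI)
# of the diagnostic-performance calculations: sensitivity, specificity and
# 95% confidence intervals from a 2x2 contingency table.
from math import sqrt

Z = 1.96  # critical value for a 95% confidence interval


def adjusted_wald_ci(successes: int, n: int) -> tuple[float, float]:
    """Adjusted Wald (Agresti-Coull) interval for a binomial proportion."""
    n_adj = n + Z**2
    p_adj = (successes + Z**2 / 2) / n_adj
    half_width = Z * sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


def diagnostic_performance(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Point estimates and CIs from true/false positive/negative counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "sensitivity_ci": adjusted_wald_ci(tp, tp + fn),
        "specificity": tn / (tn + fp),
        "specificity_ci": adjusted_wald_ci(tn, tn + fp),
        "prevalence": (tp + fn) / (tp + fp + fn + tn),
    }


# Lumira, nasopharyngeal sampling (counts from Table 3): TP=74, FP=8, FN=8, TN=351
perf = diagnostic_performance(tp=74, fp=8, fn=8, tn=351)
lo, hi = perf["sensitivity_ci"]
print(f"Sensitivity {perf['sensitivity']:.0%} (95% CI: {lo:.0%}-{hi:.0%})")
# → Sensitivity 90% (95% CI: 82%-95%)
```

Running this on the Table 3 counts reproduces the reported sensitivity of 90% (95% CI: 82–95%) and specificity of 97.8% for Lumira with nasopharyngeal sampling.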
Ethical considerations
The evaluations were considered method evaluations and quality assurance projects and were thus exempt from ethical board review in Norway and Denmark. All participants were 16 years or older and gave verbal informed consent at inclusion. Under the Norwegian Health Research Act Section 17 and the Danish Health Care Act Section 17, the age of medical consent is 16 and 15 years, respectively (13,14). The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.
Results
Population characteristics and sampling
In total, 448, 666, 679, 564 and 321 participants provided samples for both the Ag-RDT method and for the reference standard in the evaluations of Lumira, CLINITEST, NADAL, Flowflex and MF-68, respectively (Table 2). If a result from either the Ag-RDT or the reference standard was missing, the result was excluded (n=37) (Table 3). The proportion of symptomatic participants varied from 33% for CLINITEST to 90% for the MF-68. The most frequently reported symptom was sore throat (Table 2).
Table 2
| Population characteristics | LumiraDx SARS-CoV-2 Ag Test | CLINITEST Rapid COVID-19 Antigen Test | NADAL® COVID-19 Ag Test | Flowflex SARS-CoV-2 Antigen Rapid Test | MF-68 SARS-CoV-2 Antigen Test |
|---|---|---|---|---|---|
| Number of participants, n | 448 | 666 | 679 | 564 | 321 |
| Age (years), median [range] | 29 [16–89] | 19 [16–75]† | 35 [16–77] | 43 [16–90] | 32 [16–86] |
| Symptomatic, n [%] | 251 [56] | 217 [33] | 308 [45] | 302 [54] | 288 [90] |
| Reported symptoms, n [%] | |||||
| Sore throat | 158 [63] | 134 [62] | 179 [58] | 103 [34] | 226 [78] |
| Dry cough | 82 [33] | 87 [40] | 121 [39] | 74 [25] | 168 [58] |
| Headache | 101 [40] | 61 [28] | 128 [42] | 75 [25] | 61 [21] |
| Fever | 73 [29] | 40 [18] | 72 [23] | 45 [15] | 108 [38] |
| Muscle aches | 30 [12] | 25 [12] | 51 [17] | 42 [14] | 110 [38] |
| Stomach problems | 13 [5] | 2 [1] | 1 [0] | 6 [2] | 35 [12] |
| Symptom duration, n [%] | |||||
| 0–1 day | 52 [21] | 55 [25] | 47 [15] | 64 [21] | 109 [38] |
| 2–5 days | 167 [66] | 93 [43] | 101 [33] | 86 [28] | 125 [43] |
| >5 days | 9 [4] | 21 [10] | 13 [4] | 23 [8] | 33 [12] |
| Unknown | 23 [9] | 48 [22] | 147 [48] | 129 [43] | 21 [7] |
†, age unknown for 18 participants. COVID-19, coronavirus disease 2019; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2.
Table 3
| Stratified results | LumiraDx SARS-CoV-2 Ag Test (N) | LumiraDx SARS-CoV-2 Ag Test (NP) | CLINITEST Rapid COVID-19 Antigen Test | NADAL® COVID-19 Ag Test | Flowflex SARS-CoV-2 Antigen Rapid Test | MF-68 SARS-CoV-2 Antigen Test |
|---|---|---|---|---|---|---|
| Prevalence, % | 19 | 19 | 11 | 11 | 21 | 66 |
| True positive results, n | 72 | 74 | 39 | 58 | 91 | 148 |
| False negative results, n | 11 | 8 | 34 | 20 | 30 | 63 |
| True negative results, n | 362 | 351 | 589 | 599 | 441 | 108 |
| False positive results, n | 2 | 8 | 4 | 2 | 2 | 2 |
| Missing results, n | 3 | 9 | 6 | 11 | 3 | 5 |
| Diagnostic performance overall, % [95% CI] | ||||||
| Sensitivity | 87 [78–93] | 90 [82–95] | 53 [42–64] | 74 [64–83] | 75 [67–82] | 70 [64–76] |
| Specificity | 99.5 [97.8–99.9] | 97.8 [95.6–98.9] | 99.3 [98.2–99.8] | 99.7 [98.7–99.9] | 99.6 [98.3–99.9] | 98.2 [93.2–99.9] |
| Diagnostic performance among symptomatic, % [95% CI] | ||||||
| Sensitivity | 89 [79–95] | 92 [83–96] | 58 [44–71] | 77 [66–86] | 79 [69–86] | 71 [64–77] |
| Specificity | 99.4 [96.6–99.9] | 97.2 [93.3–99.0] | 97.6 [93.9–99.3] | 99.2 [96.8–99.9] | 99.5 [97.1–99.9] | 97.7 [91.5–99.9] |
| Diagnostic sensitivity, % [95% CI] | ||||||
| Ct <25 | 100 [95–100] | 98 [90–100] | 83 [64–94] | 84 [73–91] | 83 [74–89] † | 73 [64–81] |
| Diagnostic sensitivity symptomatic vs. asymptomatic, % [95% CI] | ||||||
| Asymptomatic | 73 [43–91] | 80 [48–96] | 44 [27–63] | 50 [22–78] | 65 [47–79] | 60 [31–83] |
| Symptomatic | 89 [79–95] | 92 [83–96] | 58 [44–71] | 77 [66–86] | 79 [69–86] | 71 [64–77] |
| P value for difference | 0.140915 | 0.243983 | 0.244017 | 0.095795 | 0.10998 | 0.472725 |
| Diagnostic sensitivity in relation to duration of symptoms, % [95% CI] | ||||||
| 0–1 day | 85 [57–97] | 92 [65–100] | 67 [39–87] | 80 [48–95] | 76 [55–90] | 68 [57–78] |
| 2–5 days | 95 [84–99] | 93 [81–98] | 64 [39–84] | 78 [58–91] | 83 [65–93] | 79 [69–86] |
| >5 days | ‡ | ‡ | ‡ | ‡ | 78 [44–95] | 57 [37–76] |
| Unknown onset | 73 [43–91] | 91 [60–100] | 44 [25–66] | 77 [61–88] | 77 [60–89] | 53 [30–70] |
†, Ct-values stratified on results from E-gene. Results for RdRP-gene also available (not shown). ‡ n<8; not reported due to high degree of uncertainty in the estimated sensitivity. CI, confidence interval; Ct, cycle threshold; N, nasal; NP, nasopharyngeal; RdRP, RNA-dependent RNA polymerase; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2.
Performance of the Ag-RDTs
The lowest prevalence of SARS-CoV-2 infection among participants was observed during the CLINITEST and NADAL evaluations, at 11% each; higher prevalences of 19% and 21% were observed during the evaluations of Lumira and Flowflex, respectively; and the highest prevalence, 66%, was observed during the MF-68 evaluation (Table 3). The only Ag-RDT that met the recommended WHO criterion for diagnostic sensitivity (≥80%) overall was Lumira, with a sensitivity of 90% (95% CI: 82–95%) for nasopharyngeal sampling. Lumira had 87% (95% CI: 78–93%) sensitivity with nasal sampling, and the same participants provided both the nasal and the nasopharyngeal samples. NADAL and Flowflex had sensitivities of 74% (95% CI: 64–83%) and 75% (95% CI: 67–82%), respectively. CLINITEST and MF-68 had sensitivities of 53% (95% CI: 42–64%) and 70% (95% CI: 64–76%), respectively. All the Ag-RDTs in the evaluations met or most likely met the recommended WHO criterion for diagnostic specificity (≥97%) (Table 3).
Overall, the point estimates of sensitivity were higher in symptomatic than asymptomatic populations (Table 3), although the differences were not statistically significant. In the two later evaluations of Flowflex and MF-68, the tests tended to show a higher sensitivity 2–5 days after onset of symptoms compared to 0–1 day after onset (Table 3). In the first three evaluations, no difference in sensitivity was observed with regard to time since onset of symptoms. CLINITEST had the lowest overall diagnostic sensitivity, but the performance of CLINITEST was similar to the others for samples with cycle threshold (Ct) values <25 (Table 3). MF-68 had the lowest sensitivity for samples with Ct values <25.
Predictive values
To facilitate comparison between the Ag-RDTs, the PPVs and NPVs were calculated at three hypothetical prevalences (0.5%, 10% and 20%) (Figures 1,2). At the lowest prevalence, the PPVs were low and varied between 16% and 55%, while the NPVs were very similar and close to 100%. At 20% prevalence, the PPVs varied between 90% and 99%, while the NPVs varied between 89% and 98%.
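Predictive values at a hypothetical prevalence follow directly from Bayes’ theorem applied to the point estimates of sensitivity and specificity; the following is a minimal sketch of that calculation (illustrative code, not taken from the evaluation reports):

```python
# Sketch of deriving PPV and NPV at a hypothetical prevalence from the point
# estimates of sensitivity and specificity (Bayes' theorem). Illustrative
# only; the SKUP evaluations performed the same calculation in Excel on a
# test population of 10,000.

def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for the given test characteristics and prevalence."""
    tp = sensitivity * prevalence              # expected true-positive fraction
    fp = (1 - specificity) * (1 - prevalence)  # expected false-positive fraction
    tn = specificity * (1 - prevalence)        # expected true-negative fraction
    fn = (1 - sensitivity) * prevalence        # expected false-negative fraction
    return tp / (tp + fp), tn / (tn + fn)


# Lumira (NP) point estimates from Table 3: sensitivity 90%, specificity 97.8%
for prev in (0.005, 0.10, 0.20):
    ppv, npv = predictive_values(0.90, 0.978, prev)
    print(f"prevalence {prev:>5.1%}: PPV {ppv:.0%}, NPV {npv:.0%}")
```

Even with a specificity close to 98%, the PPV collapses at 0.5% prevalence because false positives then outnumber true positives, which is the quantitative basis for the conclusion that these tests are unsuitable for low-prevalence screening.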
IQC
For the Ag-RDTs with available IQCs (Lumira, CLINITEST and Flowflex), all results were in accordance with the assigned value (positive/negative, data not shown).
User-friendliness of the Ag-RDTs
In each evaluation, the user-friendliness questionnaire was filled in by 4–6 intended users. For all the Ag-RDTs, user-friendliness was rated as satisfactory overall (Table 4). Some of the Ag-RDTs received intermediate or unsatisfactory ratings in some subcategories, especially in the operational facilities and package insert sections. For CLINITEST and NADAL, most of the lower ratings concerned a time-consuming analytical step with extraction buffer; for the NADAL test, this step was later removed by the manufacturer. The SARS-CoV-2 rapid test from Lumira requires the LumiraDx instrument, and instrument functionality led to several of the intermediate ratings in the operational facilities section due to error messages, especially during the start-up procedure.
Table 4
| Ratings | LumiraDx SARS-CoV-2 Ag Testᵃ | CLINITEST Rapid COVID-19 Antigen Testᵇ | NADAL® COVID-19 Ag Testᶜ | Flowflex SARS-CoV-2 Antigen Rapid Testᵈ | MF-68 SARS-CoV-2 Antigen Testᵉ |
|---|---|---|---|---|---|
| Total rating | Satisfactory | Satisfactory | Satisfactory | Satisfactory | Satisfactory |
| Evaluators, n | 4 | 4 | 4 | 6 | 6 |
| Operational facilities (14 subcategories) | Satisfactory (I5) | Satisfactory (I7, U1, N2) | Satisfactory (I3, N1) | Satisfactory (I3, U1) | Satisfactory (I3) |
| Package insert (11 subcategories) | Satisfactory (I2) | Satisfactory | Satisfactory (N2) | Satisfactory (I2) | Satisfactory (I1, N1) |
| Time factors (7 subcategories) | Satisfactory (I3) | Satisfactory | Satisfactory† | Satisfactory | Satisfactory† (U1) |
| Analytical quality control (3 subcategories) | Satisfactory | Satisfactory | Satisfactory† | Satisfactory | Satisfactory† |
Comments from evaluators, i.e., intended users: a—operational facilities: intermediate ratings concerned error messages during start-up, difficulties applying the analytical quality control due to air bubbles in the pipette, difficulties with squeezing the sample out of the sample tube, noise from the instrument, and error messages in general, which led to loss of test strips. Package insert: intermediate ratings concerned limited information about error codes. Time factors: intermediate ratings concerned stability; at the time of evaluation the cassette had 6 months of stability, but this is no longer a problem as the stability has since been extended to 2 years. b—operational facilities: all intermediate and unsatisfactory ratings concerned either the one-minute incubation of the sample in extraction buffer before analysis or dissatisfaction with the nozzle/cap of the sample tube. c—operational facilities: intermediate ratings concerned the design of the nasal swab and a two-minute incubation of the sample in the extraction buffer before analysis; the two-minute step in the extraction buffer has since been removed. d—operational facilities: intermediate and unsatisfactory ratings concerned the sample collection method (the intended users preferred oropharyngeal sampling), generation of foam when dispensing drops onto the test cassette, and difficulties in reading the result. e—operational facilities: intermediate ratings concerned how the test kit was packed. Package insert: the intermediate rating concerned difficulties in understanding professional language. Time factors: the unsatisfactory rating concerned the stability of opened quality control material. Rating scale: satisfactory (S), intermediate (I), unsatisfactory (U), and no opinion (N). In, Un, Nn: number of subcategories with the respective rating. †, internal quality control not included in test kit. SARS-CoV-2, severe acute respiratory syndrome coronavirus 2.
Discussion
Over the course of the COVID-19 pandemic, SKUP evaluated the diagnostic performance and user-friendliness of five Ag-RDTs for the detection of SARS-CoV-2 in the hands of the intended users. The diagnostic sensitivities varied from 53% (95% CI: 42–64%) to 90% (95% CI: 82–95%). The diagnostic specificities ranged from 97.8% (95% CI: 95.6–98.9%) to 99.7% (95% CI: 98.7–99.9%). All the Ag-RDTs in the study obtained an overall rating of satisfactory user-friendliness.
All included Ag-RDTs showed a lower diagnostic sensitivity in the SKUP evaluations than the sensitivity reported by the manufacturer in the instructions for use (IFU; data from manufacturers not shown). The diagnostic performance of the Ag-RDTs found in the SKUP evaluations was similar to that reported in a meta-analysis published in the Cochrane Library (4), though other studies have found even higher variation in diagnostic sensitivity, such as a study from Denmark by Schneider et al. (15). These results show the importance of manufacturer-independent evaluations or studies before implementation of new tests. The Cochrane meta-analysis (4) and the Danish study (15) assessed two of the SKUP-evaluated Ag-RDTs: Lumira and CLINITEST (4,15). Lumira met, or most likely met, the WHO criterion for sensitivity in all three studies, while CLINITEST did not meet it in any. However, Flowflex most likely did not meet the sensitivity criterion in the SKUP evaluation, while having the highest assessed sensitivity (94%) in the Danish study. The observed difference in the sensitivity of Flowflex could be attributed to the different methodologies used. While SKUP’s evaluation was prospective with consecutive sampling, the Danish study combined inviting pre-tested subjects with positive RT-PCR samples and a retrospective part with pooled samples. The diagnostic specificities of the Ag-RDTs found by SKUP were similar to those reported by Cochrane and the Danish study (4,15).
A low Ct-value in an RT-PCR result is indicative of high viral load in the sample; thus, Ag-RDTs usually perform better on samples with Ct-values in the lower range (4,16,17). However, Ct-values are method-dependent and can therefore vary between laboratories and are thus not directly comparable (18). The overall point estimates for diagnostic sensitivities for the Ag-RDTs in the SKUP evaluation were higher among symptomatic than asymptomatic participants, which could indicate a correlation between viral load, symptoms, and diagnostic sensitivity (19).
Throughout the course of the evaluations, we observed a change in the relationship between time of symptom onset and probability of a true positive test result. During the first three evaluations, the observed sensitivities were similar 0–1 day and 2–5 days after symptom onset. During the final two evaluations, the point estimates of the diagnostic sensitivities of the Ag-RDTs tended to be higher 2–5 days after symptom onset than 0–1 day after onset, though the differences were not statistically significant. Differences between the variants of SARS-CoV-2 and the evolving immunity in the population may explain this observation (20). Our findings indicate that postponing testing until a few days after onset of symptoms may decrease the chance of false negative results.
The five SKUP evaluations included one instrument, Lumira, and four LFAs, all for rapid antigen detection. For diagnostic specificity, no significant difference was observed between the instrument and the LFAs. Lumira, however, showed higher diagnostic sensitivity compared to the LFAs. The LFAs are based on subjective interpretation when reading the results, while the Lumira instrument uses an automatic reading of the results, which could partially explain the difference in performance.
In low-prevalence screening situations, a high PPV and a high diagnostic specificity are important to reduce the incidence of false-positive results. In high-prevalence settings, however, a high NPV and a high diagnostic sensitivity may be considered more important to limit the number of false-negative results. All five evaluated Ag-RDTs had a low calculated PPV in a low-prevalence setting and are therefore not considered suitable for screening asymptomatic populations. However, in a high-prevalence setting, and especially in a symptomatic population, the Ag-RDTs may be considered sufficiently accurate for clinical use. Figures 1,2 demonstrate the noteworthy difference in performance that prevalence can make.
Among the five evaluated Ag-RDTs, the instrument (Lumira) was the test with the widest range of causes for intermediate ratings among the evaluators. An instrument is naturally more sensitive to technical errors compared to LFAs. In some cases, the error messages addressed insufficient sample volume, which caused waste of test strips. The errors were not regarded as critical enough to decrease the overall rating by SKUP. The manufacturer reported that some of the recurring technical errors were eliminated in instruments produced from October 2020 onwards (not tested by SKUP).
Some institutions recommend repeated testing, especially in high-prevalence settings (21). The U.S. Food and Drug Administration (FDA) recommends repeating the test once, at least 48 hours after the first test. Independent of the diagnostic performance, repeated testing will increase the probability of detecting an infection (21). However, repeated testing may also increase the probability of false-positive results. If individuals with symptoms indicative of infection receive a negative test result, repeated testing is often recommended (1).
Strengths and weaknesses
The major strength of this study is the inclusion of five comprehensive Ag-RDT evaluations, all performed in accordance with the same SKUP protocol. Evaluations under real-life conditions, with consecutive patients, give more relevant results than laboratory evaluations under optimal conditions. The assessment of user-friendliness by intended users also provides important information that is often missing from other studies. Major weaknesses include that the evaluations were performed at different times, in different populations, with different prevalences, different viral variants and different reference standards (Tables 2,3), making comparison and ranking of the tests difficult. Additionally, many participants did not have their symptoms reported, as requested by SKUP; thus, this information could not be fully explored.
Conclusions
The evaluated Ag-RDTs are not suitable for screening in low-prevalence settings, but may be clinically useful in high-prevalence settings, especially among symptomatic populations. Manufacturer-independent evaluation of POC laboratory tests among intended users is advisable to ensure adequate documentation of performance and user-friendliness.
Acknowledgments
We acknowledge all evaluation sites and their employees, including the laboratories. Our sincere thanks to all participants in the SKUP evaluations.
Footnote
Reporting Checklist: The authors have completed the STARD reporting checklist. Available at https://jlpm.amegroups.com/article/view/10.21037/jlpm-24-56/rc
Data Sharing Statement: Available at https://jlpm.amegroups.com/article/view/10.21037/jlpm-24-56/dss
Peer Review File: Available at https://jlpm.amegroups.com/article/view/10.21037/jlpm-24-56/prf
Funding: None.
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://jlpm.amegroups.com/article/view/10.21037/jlpm-24-56/coif). The evaluations were partially funded by the manufacturers and suppliers (LumiraDx UK Ltd., Healgen Scientific LC, Nal von Minden GmbH, Acon Biotech Co. Ltd., and Shenzhen Microprofit Biotech Co., Ltd.), who covered the testing costs and received impartial evaluations. DEKS, Noklus, and Equalis funded the evaluation-related work, such as planning, protocol writing, assistance during practical work at evaluation sites, data processing, and report writing. The other authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. No ethical approval was needed as the study was considered a quality assurance project. Verbal informed consent was given from all included participants.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Peeling RW, Heymann DL, Teo YY, et al. Diagnostics for COVID-19: moving from pandemic response to control. Lancet 2022;399:757-68. [Crossref] [PubMed]
- WHO. Antigen-detection in the diagnosis of SARS-CoV-2 infection. 2021. Available online: https://www.who.int/publications/i/item/antigen-detection-in-the-diagnosis-of-sars-cov-2infection-using-rapid-immunoassays
- Cerutti F, Burdino E, Milia MG, et al. Urgent need of rapid tests for SARS CoV-2 antigen detection: Evaluation of the SD-Biosensor antigen test for SARS-CoV-2. J Clin Virol 2020;132:104654. [Crossref] [PubMed]
- Dinnes J, Sharma P, Berhane S, et al. Rapid, point-of-care antigen tests for diagnosis of SARS-CoV-2 infection. Cochrane Database Syst Rev 2022;7:CD013705. [Crossref] [PubMed]
- European Centre for Disease Prevention and Control. Options for the Use of Rapid Antigen Tests for COVID-19 in the EU/EEA and the UK. Technical Report. 2020. Available online: https://www.ecdc.europa.eu/sites/default/files/documents/Options-use-of-rapid-antigen-tests-for-COVID-19_0.pdf
- DEKS. Available online: https://deks.dk/en/front-page
- Equalis. Available online: https://www.equalis.se/sv/
- Stavelin A, Sandberg S. Essential aspects of external quality assurance for point-of-care testing. Biochem Med (Zagreb) 2017;27:81-5. [Crossref] [PubMed]
- SKUP. Available online: www.skup.org
- EU Health Security Committee. EU Common list of COVID-19 antigen tests. 2021. Available online: https://www.wellion.eu/fileadmin/user_upload/PDF/Ratgeber_und_Downloads/covid-19_rat_common-list_en_21Dec2021.pdf
- Adjusted Wald calculator. Available online: https://measuringu.com/calculators/wald/
- Social Science Statistics. Available online: https://www.socscistatistics.com/
- Lov om medisinsk og helsefaglig forskning (helseforskningsloven). Available online: https://lovdata.no/dokument/NL/lov/2008-06-20-44
- Sundhedsloven. Available online: https://danskelove.dk/sundhedsloven/17
- Schneider UV, Forsberg MW, Leineweber TD, et al. A nationwide analytical and clinical evaluation of 44 rapid antigen tests for SARS-CoV-2 compared to RT-qPCR. J Clin Virol 2022;153:105214. [Crossref] [PubMed]
- Brümmer LE, Katzenschlager S, Gaeddert M, et al. Accuracy of novel antigen rapid diagnostics for SARS-CoV-2: A living systematic review and meta-analysis. PLoS Med 2021;18:e1003735. [Crossref] [PubMed]
- Wölfl-Duchek M, Bergmann F, Jorda A, et al. Sensitivity and Specificity of SARS-CoV-2 Rapid Antigen Detection Tests Using Oral, Anterior Nasal, and Nasopharyngeal Swabs: a Diagnostic Accuracy Study. Microbiol Spectr 2022;10:e0202921. [Crossref] [PubMed]
- Buchta C, Görzer I, Chiba P, et al. Variability of cycle threshold values in an external quality assessment scheme for detection of the SARS-CoV-2 virus genome by RT-PCR. Clin Chem Lab Med 2021;59:987-94. [Crossref] [PubMed]
- Tsukagoshi H, Shinoda D, Saito M, et al. Relationships between Viral Load and the Clinical Course of COVID-19. Viruses 2021;13:304. [Crossref] [PubMed]
- Meiners L, Horn J, Jones TC, et al. SARS-CoV-2 rapid antigen test sensitivity and viral load in newly symptomatic hospital employees in Berlin, Germany, December, 2020 to February, 2022: an observational study. Lancet Microbe 2024;5:e538-e546. [Crossref] [PubMed]
- FDA. At-Home OTC COVID-19 Diagnostic Tests. FDA; 2025. Available online: https://www.fda.gov/medical-devices/coronavirus-covid-19-and-medical-devices/home-otc-covid-19-diagnostic-tests
Cite this article as: Hekland J, Morken C, Eriksson Boija E, Kristian Kur D, Sandberg S, Tollånes MC. Diagnostic performance and user-friendliness of five rapid antigen tests for severe acute respiratory syndrome coronavirus 2. J Lab Precis Med 2025;10:16.