Diagnostic performance and user-friendliness of five rapid antigen tests for severe acute respiratory syndrome coronavirus 2
Highlight box
Key findings
• The included antigen-detecting rapid diagnostic tests (Ag-RDT) for detection of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) are not suited for general screening. However, they may be clinically useful in high-prevalence settings.
• Manufacturer-independent evaluation of point-of-care (POC) laboratory tests among intended users is advisable to ensure adequate performance and user-friendliness.
What is known and what is new?
• The Ag-RDTs exhibit poorer diagnostic performance than the gold standard for the detection of SARS-CoV-2. However, they are more affordable and more widely available than the higher-performing tests. Consequently, health institutions and other organisations have been, and may again be, tempted to implement Ag-RDTs during outbreaks of COVID-19 or other diseases.
• The study indicates that manufacturer-independent evaluation of Ag-RDTs can reveal significantly lower performance than that stated by the manufacturers. Therefore, manufacturer-independent evaluation has an important role in the implementation of such tests.
What is the implication, and what should change now?
• For future implementation of Ag-RDTs and other POC laboratory tests, manufacturer-independent evaluations are advisable to ensure quality.
Introduction
The coronavirus disease 2019 (COVID-19) pandemic, caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), demonstrated the need for reliable, affordable and decentralized testing to help decrease the spread of the virus and to ease the pressure on medical laboratories (1). The initial testing regime for detection of SARS-CoV-2, and still the most commonly used method in medical laboratories, is nucleic acid amplification tests (NAATs), e.g., reverse transcription polymerase chain reaction (RT-PCR) (2). This current gold standard detection method is, however, both time-consuming and costly. In addition, the availability of NAATs was, at the time of the pandemic, limited due to their dependence on reagent production (3). The development of antigen-detecting rapid diagnostic tests (Ag-RDTs) by mid-2020 made testing more accessible (1). The main disadvantages of the Ag-RDTs are their variable and generally poorer diagnostic performance compared to NAATs (4). In many clinical situations, however, their affordability, availability and ease of use may still render the best performing Ag-RDTs both adequate and useful.
The World Health Organization (WHO) recommends that the Ag-RDTs meet the minimum diagnostic performance criteria of ≥80% sensitivity and ≥97% specificity, and that they should mainly be used in symptomatic populations (2). Since the sensitivities and specificities declared by manufacturers may have been obtained under optimal laboratory conditions, evaluations performed among the intended users before implementation are recommended by the European Centre for Disease Prevention and Control (ECDC) (5).
The Scandinavian evaluation of laboratory equipment for point of care testing (SKUP) is a collaboration between the external quality assurance organisations DEKS in Denmark (6), Equalis in Sweden (7), and Noklus in Norway (8). The purpose of SKUP is to improve the quality of point-of-care (POC) testing by providing manufacturer-independent information about diagnostic performance and user-friendliness of POC laboratory equipment (9). SKUP evaluations are initiated by a manufacturer or supplier, who covers the direct costs of the evaluation. During the pandemic, SKUP evaluated five Ag-RDTs for COVID-19. The individual evaluation reports can be found at the SKUP homepage. The present study presents an overview of these five SKUP evaluations. All the evaluated Ag-RDTs have at some point been available on the Scandinavian market, either as a COVID-19 test or as a combo test with several target antigens.
The COVID-19 pandemic gave rise to a testing regime of unprecedented scale. The aim of the present study was to describe and compare the diagnostic performance and user-friendliness of five Ag-RDTs prospectively evaluated by SKUP under real-life conditions by the intended users of the tests. We present this article in accordance with the STARD reporting checklist (available at https://jlpm.amegroups.org/article/view/10.21037/jlpm-24-56/rc).
Methods
Enrolment of participants and study procedure
All the SKUP evaluations included in this study were set in COVID-19 test centres in Scandinavia. This setting made consecutive enrolment of participants possible. During the first four evaluations, the inclusion criterion for participation was “symptomatic and asymptomatic subjects exposed to individuals who had previously tested positive for SARS-CoV-2”. At the time of the fifth evaluation, the inclusion criterion was changed to “symptomatic and asymptomatic subjects with high probability of SARS-CoV-2 infection”. This was considered necessary because of the high prevalence of infected participants due to the Omicron wave in Norway at the time. All subjects who attended a SARS-CoV-2 test appointment at the test centre were considered to have a high probability of infection. In all evaluations, subjects under the age of 16 and subjects who did not understand the local language (Norwegian or Danish, depending on the evaluation site) were excluded. Participation in the SKUP evaluations was voluntary, and verbal informed consent was considered sufficient. All test-centre employees involved in the evaluations had training in the use of the respective Ag-RDT. The intended sample size in each evaluation was set to at least 100 participants with positive and 100 participants with negative RT-PCR results, based on recommendations from the European Union (EU) Health Security Committee combined with practical considerations regarding execution, recruitment and sample collection (10). Symptomatic participants with negative RT-PCR results were not tested for other pathogens.
The consecutive samples for both the Ag-RDTs and the reference standard were collected at the same time by the same person. The sampling material varied between evaluations (Table 1). Samples were analysed immediately with the Ag-RDT, in accordance with the instructions from the manufacturers, until a result was obtained. Samples for the reference standard were placed into sterile tubes containing 2–3 mL of viral transport medium (VTM) and kept at room temperature until transport to a clinical laboratory for analysis with the reference standard. Internal analytical quality control (IQC) material was not available for all of the Ag-RDTs (Table 1). Where available, IQC was analysed on each day of the evaluation and upon opening of new test kits, according to the instructions of the manufacturer of the individual Ag-RDT.
Table 1
| Evaluation | LumiraDx SARS-CoV-2 Ag Test | CLINITEST Rapid COVID-19 Antigen Test | NADAL® COVID-19 Ag Test | Flowflex SARS-CoV-2 Antigen Rapid Test | MF-68 SARS-CoV-2 Antigen Test |
|---|---|---|---|---|---|
| Country | Norway | Norway | Norway | Denmark | Norway |
| Evaluation period | October–December, 2020 | March–June, 2021 | December 2020–September 2021 | March 2021–February 2022 | February–March 2022 |
| Ag-RDT, sample type | N, NP† | NP | N, NP‡ | N | N |
| IQC for Ag-RDT | Included | Included§ | Not included | Included | Not included |
| RT-PCR, sample type | NP | NP, OP | NP | OP | OP |
| Laboratory for reference standard | Department of Microbiology, Haukeland University Hospital, Bergen | Department of Microbiology, Oslo University Hospital, Oslo | Fürst Medical Laboratory, Oslo | Clinical Diagnostic Department, Hospital of South West Jutland, Esbjerg/Department of Clinical Biochemistry, Bispebjerg Hospital, Copenhagen | Department of Microbiology, Haukeland University Hospital, Bergen |
| Reference standard | LightCycler 480 (Roche)/QuantStudio (Applied Biosystems). Mastermix: QuantiNova® Pathogen + IC Kit (Qiagen) | TecanFluent 1080 (Tecan Trading AG)/EZ1 (Qiagen)/AriaDX (Agilent Technologies Inc.) with magnetic nanoparticles (Norwegian University of Science and Technology), primers (TIB Molbiol Syntheselabor GmbH), Invitrogen SuperScript III RT/Platinum Taq mix (Thermo Fisher Scientific Inc.), E-gene probe (Integrated DNA Technologies Inc.) | 7500 SDS/QuantStudio 5 (Applied Biosystems), RIDA®GENE SARS-CoV-2 real-time PCR kit | Bio-Rad CFX thermocycler (Bio-Rad Laboratories Inc.), Allplex 2019-nCoV assay (Seegene Inc.)/CoviDetect FAST assay (PentaBase A/S) | LightCycler 480 (Roche)/QuantStudio (Applied Biosystems). Mastermix: QuantiNova® Pathogen + IC Kit (Qiagen) |
†, duplicate sampling. ‡, change from nasopharyngeal to nasal sampling during the evaluation. §, IQC included towards the end of the evaluation. Ag, antigen; Ag-RDT, antigen-detecting rapid diagnostic tests; COVID-19, coronavirus disease 2019; IQC, internal analytical quality control; N, nasal; NP, nasopharyngeal; OP, oropharyngeal; RT-PCR, reverse transcription polymerase chain reaction; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2.
Reference standard
Routine RT-PCR methods for detection of SARS-CoV-2 in the respective clinical microbiology laboratories were used as a reference standard (Table 1). The results from the Ag-RDTs were not available for the laboratory. Inconclusive RT-PCR results were omitted. In Norway, all laboratories involved in the evaluations used RT-PCR detection of the E-gene of the Sarbeco Betacoronavirus, including SARS-CoV-2. In Denmark, the laboratory at Esbjerg hospital used RT-PCR detection of the E-gene of the Sarbeco Betacoronavirus, including SARS-CoV-2, the RNA-dependent RNA polymerase (RdRP) gene and nucleocapsid (N) protein gene specific for SARS-CoV-2, while the laboratory at Bispebjerg hospital used RT-PCR detection of the E-gene and RdRP gene of SARS-CoV-2. All included Norwegian laboratories were accredited according to Norsk Standard-Europeisk Norm International Organization for Standardization/International Electrotechnical Commission (NS-EN ISO/IEC) 15189 [2012]. The included Danish laboratories were accredited according to Danish Standard (DS)/EN ISO 15189 [2013] by the Danish Accreditation Fund (DANAK). All involved laboratories participated in at least one external quality assessment (EQA) scheme for RT-PCR detection of SARS-CoV-2.
SKUP evaluations included in the study
The five Ag-RDTs evaluated by SKUP were: LumiraDx SARS-CoV-2 Ag Test (LumiraDx UK Ltd., Alloa, UK) (SKUP/2021/124), CLINITEST Rapid COVID-19 Antigen Test (Healgen Scientific LC, Houston, USA) (SKUP/2021/127), NADAL® COVID-19 Ag Test (Nal von Minden GmbH, Moers, Germany) (SKUP/2022/125), Flowflex SARS-CoV-2 Antigen Rapid Test (Acon Biotech Co. Ltd., Hangzhou, China) (SKUP/2022/128) and MF-68 SARS-CoV-2 Antigen Test (Shenzhen Microprofit Biotech Co., Ltd., Shenzhen, China) (SKUP/2022/131) (9). The Ag-RDTs are referred to as Lumira, CLINITEST, NADAL, Flowflex and MF-68. The prospective evaluations were carried out at four COVID-19 test centres in Norway and one in Denmark (Table 1). The evaluations of Lumira and MF-68 were performed in Bergen, Norway, in the autumn of 2020 and spring of 2022, respectively. The evaluations of CLINITEST and NADAL were performed in Oslo, Norway in the spring/summer of 2021 and from late 2020 to autumn of 2021, respectively. The evaluation of Flowflex was performed in Esbjerg, Denmark from spring 2021 to early 2022. The timing of the evaluations was based on when requests were made by the requesting companies. Lumira is an instrument-based test that uses a rapid microfluidic immunofluorescence assay for the detection of the nucleocapsid protein antigen of SARS-CoV-2. The other four are lateral flow immunoassays (LFAs) for the detection of the same antigen. All Ag-RDTs included in this study were intended for use by healthcare professionals.
The laboratories involved in the evaluation of CLINITEST, Flowflex and MF-68 participated in a scheme by Quality Control for Molecular Diagnostics (QCMD), while the laboratory involved in the evaluation of NADAL participated in a scheme by INSTAND e.V. The laboratory involved in the evaluation of Lumira participated in both schemes. All EQA results for the reference standards were in accordance with the assigned value (positive/negative), with the exception of one positive control from INSTAND during the evaluation of NADAL.
User-friendliness
For evaluation of user-friendliness, a questionnaire with four categories adapted to the intended users was used: (I) operational facilities (ease of use), including the ease of preparing the test, preparing the sample, applying the sample, and ensuring correct specimen volume, as well as the instrument/test strip design, reading of the test result, sources of error, cleaning/maintenance, hygiene when using the test, and size and weight of the package; (II) information in the manual and quick guide, including table of contents/index, and descriptions of preparations, specimen collection, measurement procedure, and how to read the result, as well as description of the sources of error, help for troubleshooting, readability/clarity of presentation and general impression; (III) time factors, including required training time, duration of preparations, duration of analysis, stability of the test in opened/unopened packages and stability of quality control material in opened/unopened packages; (IV) analytical quality control, including reading of the internal quality control, usefulness of the internal quality control material and possibility of participating in EQA. Each item had three possible ratings: satisfactory, intermediate and unsatisfactory, in addition to a “no opinion” option. Of the four categories, the first two were mainly evaluated by the intended users, while the latter two were evaluated by SKUP. The final rating of user-friendliness was based on an overall assessment by SKUP. Consequently, a low score on a single item could, in some cases, lead to a low overall rating if that particular aspect was considered essential to the user-friendliness of the test. SKUP has evaluated the user-friendliness of POC testing since the 1990s, and the user-friendliness questionnaire has been optimized and adapted for different types of POC systems over the years.
Statistical analyses
Statistical analyses were performed in Microsoft Excel, using a template developed by SKUP. Raw data were entered manually; sample identifications (IDs) were marked as true positive, false positive, false negative, or true negative and summarized; and sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and prevalence were calculated through cell formulas. Subgroup performance and comparisons between subgroups (performance in symptomatic vs. asymptomatic participants) were calculated using the same procedure. Predictive values at hypothetical prevalences, in test populations of 10,000, were calculated from the point estimates of sensitivity and specificity obtained in the actual evaluations. Due to the risk of errors in Excel, all raw data and all formulas were proofread by two authors. 95% confidence intervals (CIs) were estimated using an adjusted Wald method (11). A chi-square test was used to evaluate differences in proportions (12), with P<0.05 considered statistically significant.
Diagnostic performance was evaluated by comparing the point estimates of sensitivity and specificity to the WHO recommendation of ≥80% sensitivity and ≥97% specificity (2). For user-friendliness, the quality goal of an overall rating of “satisfactory” was used.
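The calculations described above can be illustrated with a minimal Python sketch (not SKUP’s actual Excel template); the Agresti-Coull form of the adjusted Wald interval is assumed here:

```python
# Minimal sketch (assumption: Agresti-Coull variant of the adjusted Wald CI)
# of the diagnostic-performance calculations: sensitivity, specificity and
# 95% confidence intervals from a 2x2 contingency table.
from math import sqrt

Z = 1.96  # critical value for a 95% confidence interval


def adjusted_wald_ci(successes: int, n: int) -> tuple[float, float]:
    """Adjusted Wald (Agresti-Coull) interval for a binomial proportion."""
    n_adj = n + Z**2
    p_adj = (successes + Z**2 / 2) / n_adj
    half_width = Z * sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


def diagnostic_performance(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Point estimates and CIs from true/false positive/negative counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "sensitivity_ci": adjusted_wald_ci(tp, tp + fn),
        "specificity": tn / (tn + fp),
        "specificity_ci": adjusted_wald_ci(tn, tn + fp),
        "prevalence": (tp + fn) / (tp + fp + fn + tn),
    }


# Lumira, nasopharyngeal sampling (counts from Table 3): TP=74, FP=8, FN=8, TN=351
perf = diagnostic_performance(tp=74, fp=8, fn=8, tn=351)
lo, hi = perf["sensitivity_ci"]
print(f"Sensitivity {perf['sensitivity']:.0%} (95% CI: {lo:.0%}-{hi:.0%})")
# → Sensitivity 90% (95% CI: 82%-95%)
```

Running this on the Table 3 counts reproduces the reported sensitivity of 90% (95% CI: 82–95%) and specificity of 97.8% for Lumira with nasopharyngeal sampling.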
Ethical considerations
The evaluations were considered method evaluations and quality assurance projects and were thus exempt from ethical board review in Norway and Denmark. All participants were 16 years or older and gave verbal informed consent at inclusion. Under the Norwegian Health Research Act Section 17 and the Danish Health Care Act Section 17, the age of medical consent is 16 and 15 years, respectively (13,14). The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.
Results
Population characteristics and sampling
In total, 448, 666, 679, 564 and 321 participants provided samples for both the Ag-RDT method and for the reference standard in the evaluations of Lumira, CLINITEST, NADAL, Flowflex and MF-68, respectively (Table 2). If a result from either the Ag-RDT or the reference standard was missing, the result was excluded (n=37) (Table 3). The proportion of symptomatic participants varied from 33% for CLINITEST to 90% for the MF-68. The most frequently reported symptom was sore throat (Table 2).
Table 2
| Population characteristics | LumiraDx SARS-CoV-2 Ag Test | CLINITEST Rapid COVID-19 Antigen Test | NADAL® COVID-19 Ag Test | Flowflex SARS-CoV-2 Antigen Rapid Test | MF-68 SARS-CoV-2 Antigen Test |
|---|---|---|---|---|---|
| Number of participants, n | 448 | 666 | 679 | 564 | 321 |
| Age (years), median [range] | 29 [16–89] | 19 [16–75]† | 35 [16–77] | 43 [16–90] | 32 [16–86] |
| Symptomatic, n [%] | 251 [56] | 217 [33] | 308 [45] | 302 [54] | 288 [90] |
| Reported symptoms, n [%] | |||||
| Sore throat | 158 [63] | 134 [62] | 179 [58] | 103 [34] | 226 [78] |
| Dry cough | 82 [33] | 87 [40] | 121 [39] | 74 [25] | 168 [58] |
| Headache | 101 [40] | 61 [28] | 128 [42] | 75 [25] | 61 [21] |
| Fever | 73 [29] | 40 [18] | 72 [23] | 45 [15] | 108 [38] |
| Muscle aches | 30 [12] | 25 [12] | 51 [17] | 42 [14] | 110 [38] |
| Stomach problems | 13 [5] | 2 [1] | 1 [0] | 6 [2] | 35 [12] |
| Symptom duration, n [%] | |||||
| 0–1 day | 52 [21] | 55 [25] | 47 [15] | 64 [21] | 109 [38] |
| 2–5 days | 167 [66] | 93 [43] | 101 [33] | 86 [28] | 125 [43] |
| >5 days | 9 [4] | 21 [10] | 13 [4] | 23 [8] | 33 [12] |
| Unknown | 23 [9] | 48 [22] | 147 [48] | 129 [43] | 21 [7] |
†, age unknown for 18 participants. COVID-19, coronavirus disease 2019; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2.
Table 3
| Stratified results | LumiraDx SARS-CoV-2 Ag Test (N) | LumiraDx SARS-CoV-2 Ag Test (NP) | CLINITEST Rapid COVID-19 Antigen Test | NADAL® COVID-19 Ag Test | Flowflex SARS-CoV-2 Antigen Rapid Test | MF-68 SARS-CoV-2 Antigen Test |
|---|---|---|---|---|---|---|
| Prevalence, % | 19 | 19 | 11 | 11 | 21 | 66 |
| True positive results, n | 72 | 74 | 39 | 58 | 91 | 148 |
| False negative results, n | 11 | 8 | 34 | 20 | 30 | 63 |
| True negative results, n | 362 | 351 | 589 | 599 | 441 | 108 |
| False positive results, n | 2 | 8 | 4 | 2 | 2 | 2 |
| Missing results, n | 3 | 9 | 6 | 11 | 3 | 5 |
| Diagnostic performance overall, % [95% CI] | ||||||
| Sensitivity | 87 [78–93] | 90 [82–95] | 53 [42–64] | 74 [64–83] | 75 [67–82] | 70 [64–76] |
| Specificity | 99.5 [97.8–99.9] | 97.8 [95.6–98.9] | 99.3 [98.2–99.8] | 99.7 [98.7–99.9] | 99.6 [98.3–99.9] | 98.2 [93.2–99.9] |
| Diagnostic performance among symptomatic, % [95% CI] | ||||||
| Sensitivity | 89 [79–95] | 92 [83–96] | 58 [44–71] | 77 [66–86] | 79 [69–86] | 71 [64–77] |
| Specificity | 99.4 [96.6–99.9] | 97.2 [93.3–99.0] | 97.6 [93.9–99.3] | 99.2 [96.8–99.9] | 99.5 [97.1–99.9] | 97.7 [91.5–99.9] |
| Diagnostic sensitivity, % [95% CI] | ||||||
| Ct <25 | 100 [95–100] | 98 [90–100] | 83 [64–94] | 84 [73–91] | 83 [74–89] † | 73 [64–81] |
| Diagnostic sensitivity symptomatic vs. asymptomatic, % [95% CI] | ||||||
| Asymptomatic | 73 [43–91] | 80 [48–96] | 44 [27–63] | 50 [22–78] | 65 [47–79] | 60 [31–83] |
| Symptomatic | 89 [79–95] | 92 [83–96] | 58 [44–71] | 77 [66–86] | 79 [69–86] | 71 [64–77] |
| P value for difference | 0.140915 | 0.243983 | 0.244017 | 0.095795 | 0.10998 | 0.472725 |
| Diagnostic sensitivity in relation to duration of symptoms, % [95% CI] | ||||||
| 0–1 day | 85 [57–97] | 92 [65–100] | 67 [39–87] | 80 [48–95] | 76 [55–90] | 68 [57–78] |
| 2–5 days | 95 [84–99] | 93 [81–98] | 64 [39–84] | 78 [58–91] | 83 [65–93] | 79 [69–86] |
| >5 days | ‡ | ‡ | ‡ | ‡ | 78 [44–95] | 57 [37–76] |
| Unknown onset | 73 [43–91] | 91 [60–100] | 44 [25–66] | 77 [61–88] | 77 [60–89] | 53 [30–70] |
†, Ct-values stratified on results from E-gene. Results for RdRP-gene also available (not shown). ‡ n<8; not reported due to high degree of uncertainty in the estimated sensitivity. CI, confidence interval; Ct, cycle threshold; N, nasal; NP, nasopharyngeal; RdRP, RNA-dependent RNA polymerase; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2.
Performance of the Ag-RDTs
The lowest prevalence of SARS-CoV-2 infection among participants was observed during the CLINITEST and NADAL evaluations, at 11% each; higher prevalences of 19% and 21% were observed during the evaluations of Lumira and Flowflex, respectively; and the highest prevalence, 66%, was observed during the MF-68 evaluation (Table 3). The only Ag-RDT that met the recommended WHO criterion for diagnostic sensitivity (≥80%) overall was Lumira, with a sensitivity of 90% (95% CI: 82–95%) for nasopharyngeal sampling. Lumira had 87% (95% CI: 78–93%) sensitivity with nasal sampling, and the same participants provided both the nasal and the nasopharyngeal samples. NADAL and Flowflex had sensitivities of 74% (95% CI: 64–83%) and 75% (95% CI: 67–82%), respectively. CLINITEST and MF-68 had sensitivities of 53% (95% CI: 42–64%) and 70% (95% CI: 64–76%), respectively. All the Ag-RDTs in the evaluations met or most likely met the recommended WHO criterion for diagnostic specificity (≥97%) (Table 3).
Overall, the point estimates of sensitivity were higher in symptomatic than asymptomatic populations (Table 3), although the differences were not statistically significant. In the two later evaluations of Flowflex and MF-68, the tests tended to show a higher sensitivity 2–5 days after onset of symptoms compared to 0–1 day after onset (Table 3). In the first three evaluations, no difference in sensitivity was observed with regard to time since onset of symptoms. CLINITEST had the lowest overall diagnostic sensitivity, but the performance of CLINITEST was similar to the others for samples with cycle threshold (Ct) values <25 (Table 3). MF-68 had the lowest sensitivity for samples with Ct values <25.
Predictive values
To facilitate comparison between the Ag-RDTs, the PPVs and NPVs were calculated at three hypothetical prevalences (0.5%, 10% and 20%) (Figures 1,2). At the lowest prevalence, the PPVs were low and varied between 16% and 55%, while the NPVs were very similar and close to 100%. At 20% prevalence, the PPVs varied between 90% and 99%, while the NPVs varied between 89% and 98%.
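Predictive values at a hypothetical prevalence follow directly from Bayes’ theorem applied to the point estimates of sensitivity and specificity; the following is a minimal sketch of that calculation (illustrative code, not taken from the evaluation reports):

```python
# Sketch of deriving PPV and NPV at a hypothetical prevalence from the point
# estimates of sensitivity and specificity (Bayes' theorem). Illustrative
# only; the SKUP evaluations performed the same calculation in Excel on a
# test population of 10,000.

def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for the given test characteristics and prevalence."""
    tp = sensitivity * prevalence              # expected true-positive fraction
    fp = (1 - specificity) * (1 - prevalence)  # expected false-positive fraction
    tn = specificity * (1 - prevalence)        # expected true-negative fraction
    fn = (1 - sensitivity) * prevalence        # expected false-negative fraction
    return tp / (tp + fp), tn / (tn + fn)


# Lumira (NP) point estimates from Table 3: sensitivity 90%, specificity 97.8%
for prev in (0.005, 0.10, 0.20):
    ppv, npv = predictive_values(0.90, 0.978, prev)
    print(f"prevalence {prev:>5.1%}: PPV {ppv:.0%}, NPV {npv:.0%}")
```

Even with a specificity close to 98%, the PPV collapses at 0.5% prevalence because false positives then outnumber true positives, which is the quantitative basis for the conclusion that these tests are unsuitable for low-prevalence screening.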
IQC
For the Ag-RDTs with available IQCs (Lumira, CLINITEST and Flowflex), all results were in accordance with the assigned value (positive/negative, data not shown).
User-friendliness of the Ag-RDTs
In each evaluation, the user-friendliness questionnaire was filled in by 4–6 intended users. For all the Ag-RDTs, user-friendliness was rated as satisfactory overall (Table 4). Some of the Ag-RDTs received intermediate or unsatisfactory ratings in some subcategories, especially in the operational facilities and package insert sections. For CLINITEST and NADAL, most of the lower ratings concerned a time-consuming analytical step with extraction buffer; for the NADAL test, this step was later removed by the manufacturer. The SARS-CoV-2 rapid test from Lumira requires the LumiraDx instrument, and instrument functionality led to several of the intermediate ratings in the operational facilities section due to error messages, especially during the start-up procedure.
Table 4
| Ratings | LumiraDx SARS-CoV-2 Ag Testᵃ | CLINITEST Rapid COVID-19 Antigen Testᵇ | NADAL® COVID-19 Ag Testᶜ | Flowflex SARS-CoV-2 Antigen Rapid Testᵈ | MF-68 SARS-CoV-2 Antigen Testᵉ |
|---|---|---|---|---|---|
| Total rating | Satisfactory | Satisfactory | Satisfactory | Satisfactory | Satisfactory |
| Evaluators, n | 4 | 4 | 4 | 6 | 6 |
| Operational facilities (14 subcategories) | Satisfactory (I5) | Satisfactory (I7, U1, N2) | Satisfactory (I3, N1) | Satisfactory (I3, U1) | Satisfactory (I3) |
| Package insert (11 subcategories) | Satisfactory (I2) | Satisfactory | Satisfactory (N2) | Satisfactory (I2) | Satisfactory (I1, N1) |
| Time factors (7 subcategories) | Satisfactory (I3) | Satisfactory | Satisfactory† | Satisfactory | Satisfactory† (U1) |
| Analytical quality control (3 subcategories) | Satisfactory | Satisfactory | Satisfactory† | Satisfactory | Satisfactory† |
Comments from evaluators, i.e., intended users: a—operational facilities: intermediate ratings concerned error messages during start-up, difficulties applying the analytical quality control due to air bubbles in the pipette, difficulties with squeezing the sample out of the sample tube, noise from the instrument, and error messages in general, which led to loss of test strips. Package insert: intermediate ratings concerned limited information about error codes. Time factors: intermediate ratings concerned stability; at the time of evaluation the cassette had 6 months of stability, but this is no longer a problem as the stability has since been extended to 2 years. b—operational facilities: all intermediate and unsatisfactory ratings concerned either the one-minute incubation of the sample in extraction buffer before analysis or dissatisfaction with the nozzle/cap of the sample tube. c—operational facilities: intermediate ratings concerned the design of the nasal swab and a two-minute incubation of the sample in the extraction buffer before analysis; the two-minute step in the extraction buffer has since been removed. d—operational facilities: intermediate and unsatisfactory ratings concerned the sample collection method (the intended users preferred oropharyngeal sampling), generation of foam when dispensing drops onto the test cassette, and difficulties in reading the result. e—operational facilities: intermediate ratings concerned how the test kit was packed. Package insert: the intermediate rating concerned difficulties in understanding professional language. Time factors: the unsatisfactory rating concerned the stability of opened quality control material. Rating scale: satisfactory (S), intermediate (I), unsatisfactory (U), and no opinion (N). In, Un, Nn: number of subcategories with the respective rating. †, internal quality control not included in test kit. SARS-CoV-2, severe acute respiratory syndrome coronavirus 2.
Discussion
Over the course of the COVID-19 pandemic, SKUP evaluated the diagnostic performance and user-friendliness of five Ag-RDTs for the detection of SARS-CoV-2 in the hands of the intended users. The diagnostic sensitivities varied from 53% (95% CI: 42–64%) to 90% (95% CI: 82–95%). The diagnostic specificities ranged from 97.8% (95% CI: 95.6–98.9%) to 99.7% (95% CI: 98.7–99.9%). All the Ag-RDTs in the study obtained an overall rating of satisfactory user-friendliness.
All included Ag-RDTs showed a lower diagnostic sensitivity in the SKUP evaluations than the sensitivity reported by the manufacturer in the instructions for use (IFU; data from manufacturers not shown). The diagnostic performance of the Ag-RDTs found in the SKUP evaluations was similar to that reported in a meta-analysis published in the Cochrane Library (4), though other studies have found even higher variation in diagnostic sensitivity, such as a study from Denmark by Schneider et al. (15). These results show the importance of manufacturer-independent evaluations or studies before implementation of new tests. The Cochrane meta-analysis (4) and the Danish study (15) assessed two of the SKUP-evaluated Ag-RDTs: Lumira and CLINITEST (4,15). Lumira met, or most likely met, the WHO criterion for sensitivity in all three studies, while CLINITEST did not meet it in any. However, Flowflex most likely did not meet the sensitivity criterion in the SKUP evaluation, while having the highest assessed sensitivity (94%) in the Danish study. The observed difference in the sensitivity of Flowflex could be attributed to the different methodologies used. While SKUP’s evaluation was prospective with consecutive sampling, the Danish study combined inviting pre-tested subjects with positive RT-PCR samples and a retrospective part with pooled samples. The diagnostic specificities of the Ag-RDTs found by SKUP were similar to those reported by Cochrane and the Danish study (4,15).
A low Ct-value in an RT-PCR result is indicative of high viral load in the sample; thus, Ag-RDTs usually perform better on samples with Ct-values in the lower range (4,16,17). However, Ct-values are method-dependent and can therefore vary between laboratories and are thus not directly comparable (18). The overall point estimates for diagnostic sensitivities for the Ag-RDTs in the SKUP evaluation were higher among symptomatic than asymptomatic participants, which could indicate a correlation between viral load, symptoms, and diagnostic sensitivity (19).
Throughout the course of the evaluations, we observed a change in the relationship between time of symptom onset and probability of a true positive test result. During the first three evaluations, the observed sensitivities were similar 0–1 day and 2–5 days after symptom onset. During the final two evaluations, the point estimates of the diagnostic sensitivities of the Ag-RDTs tended to be higher 2–5 days after symptom onset than 0–1 day after onset, though the differences were not statistically significant. Differences between the variants of SARS-CoV-2 and the evolving immunity in the population may explain this observation (20). Our findings indicate that postponing testing until a few days after onset of symptoms may decrease the chance of false negative results.
The five SKUP evaluations included one instrument, Lumira, and four LFAs, all for rapid antigen detection. For diagnostic specificity, no significant difference was observed between the instrument and the LFAs. Lumira, however, showed higher diagnostic sensitivity compared to the LFAs. The LFAs are based on subjective interpretation when reading the results, while the Lumira instrument uses an automatic reading of the results, which could partially explain the difference in performance.
In low-prevalence screening situations, a high PPV and a high diagnostic specificity are important to reduce the incidence of false-positive results. In high-prevalence settings, however, a high NPV and a high diagnostic sensitivity may be considered more important to limit the number of false-negative results. All five evaluated Ag-RDTs had a low calculated PPV in a low-prevalence setting and are therefore not considered suitable for screening asymptomatic populations. However, in a high-prevalence setting, and especially in a symptomatic population, the Ag-RDTs may be considered sufficiently accurate for clinical use. Figures 1,2 demonstrate the noteworthy difference in performance that prevalence can make.
Among the five evaluated Ag-RDTs, the instrument (Lumira) was the test with the widest range of causes for intermediate ratings among the evaluators. An instrument is naturally more sensitive to technical errors compared to LFAs. In some cases, the error messages addressed insufficient sample volume, which caused waste of test strips. The errors were not regarded as critical enough to decrease the overall rating by SKUP. The manufacturer reported that some of the recurring technical errors were eliminated in instruments produced from October 2020 onwards (not tested by SKUP).
Some institutions recommend repeated testing, especially in high-prevalence settings (21). The U.S. Food and Drug Administration (FDA) recommends repeating the test once, at least 48 hours after the first test. Independent of the diagnostic performance, repeated testing will increase the probability of detecting an infection (21). However, repeated testing may also increase the probability of false-positive results. If individuals with symptoms indicative of infection receive a negative test result, repeated testing is often recommended (1).
Strengths and weaknesses
The major strength of this study is the inclusion of five comprehensive Ag-RDT evaluations, all performed in accordance with the same SKUP protocol. Evaluations under real-life conditions, with consecutive patients, give more relevant results than laboratory evaluations under optimal conditions. The assessment of user-friendliness by intended users also provides important information that is often missing from other studies. Major weaknesses include that the evaluations were performed at different times, in different populations, with different prevalences, different viral variants and different reference standards (Tables 2,3), making comparison and ranking of the tests difficult. Additionally, many participants did not have their symptoms reported, as requested by SKUP; thus, this information could not be fully explored.
Conclusions
The evaluated Ag-RDTs are not suitable for screening in low-prevalence settings, but may be clinically useful in high-prevalence settings, especially among symptomatic populations. Manufacturer-independent evaluation of POC laboratory tests among intended users is advisable to ensure adequate documentation of performance and user-friendliness.
Acknowledgments
We acknowledge all evaluation sites and their employees, including the laboratories. Our sincere thanks to all participants in the SKUP evaluations.
Footnote
Reporting Checklist: The authors have completed the STARD reporting checklist. Available at https://jlpm.amegroups.com/article/view/10.21037/jlpm-24-56/rc
Data Sharing Statement: Available at https://jlpm.amegroups.com/article/view/10.21037/jlpm-24-56/dss
Peer Review File: Available at https://jlpm.amegroups.com/article/view/10.21037/jlpm-24-56/prf
Funding: None.
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://jlpm.amegroups.com/article/view/10.21037/jlpm-24-56/coif). The evaluations were partially funded by the manufacturers and suppliers (LumiraDx UK Ltd., Healgen Scientific LC, Nal von Minden GmbH, Acon Biotech Co. Ltd., and Shenzhen Microprofit Biotech Co., Ltd.), who covered the testing costs and received impartial evaluations. DEKS, Noklus, and Equalis funded the evaluation-related work, such as planning, protocol writing, assistance during practical work at evaluation sites, data processing, and report writing. The other authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. No ethical approval was needed as the study was considered a quality assurance project. Verbal informed consent was given from all included participants.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Peeling RW, Heymann DL, Teo YY, et al. Diagnostics for COVID-19: moving from pandemic response to control. Lancet 2022;399:757-68. [Crossref] [PubMed]
- WHO. Antigen-detection in the diagnosis of SARS-CoV-2 infection. 2021. Available online: https://www.who.int/publications/i/item/antigen-detection-in-the-diagnosis-of-sars-cov-2infection-using-rapid-immunoassays
- Cerutti F, Burdino E, Milia MG, et al. Urgent need of rapid tests for SARS CoV-2 antigen detection: Evaluation of the SD-Biosensor antigen test for SARS-CoV-2. J Clin Virol 2020;132:104654. [Crossref] [PubMed]
- Dinnes J, Sharma P, Berhane S, et al. Rapid, point-of-care antigen tests for diagnosis of SARS-CoV-2 infection. Cochrane Database Syst Rev 2022;7:CD013705. [Crossref] [PubMed]
- European Centre for Disease Prevention and Control. Options for the Use of Rapid Antigen Tests for COVID-19 in the EU/EEA and the UK. Technical Report. 2020. Available online: https://www.ecdc.europa.eu/sites/default/files/documents/Options-use-of-rapid-antigen-tests-for-COVID-19_0.pdf
- DEKS. Available online: https://deks.dk/en/front-page
- Equalis. Available online: https://www.equalis.se/sv/
- Stavelin A, Sandberg S. Essential aspects of external quality assurance for point-of-care testing. Biochem Med (Zagreb) 2017;27:81-5. [Crossref] [PubMed]
- SKUP. Available online: www.skup.org
- EU Health Security Committee. EU Common list of COVID-19 antigen tests. 2021. Available online: https://www.wellion.eu/fileadmin/user_upload/PDF/Ratgeber_und_Downloads/covid-19_rat_common-list_en_21Dec2021.pdf
- Adjusted Wald calculator. Available online: https://measuringu.com/calculators/wald/
- Social Science Statistics. Available online: https://www.socscistatistics.com/
- Lov om medisinsk og helsefaglig forskning (helseforskningsloven). Available online: https://lovdata.no/dokument/NL/lov/2008-06-20-44
- Sundhedsloven. Available online: https://danskelove.dk/sundhedsloven/17
- Schneider UV, Forsberg MW, Leineweber TD, et al. A nationwide analytical and clinical evaluation of 44 rapid antigen tests for SARS-CoV-2 compared to RT-qPCR. J Clin Virol 2022;153:105214. [Crossref] [PubMed]
- Brümmer LE, Katzenschlager S, Gaeddert M, et al. Accuracy of novel antigen rapid diagnostics for SARS-CoV-2: A living systematic review and meta-analysis. PLoS Med 2021;18:e1003735. [Crossref] [PubMed]
- Wölfl-Duchek M, Bergmann F, Jorda A, et al. Sensitivity and Specificity of SARS-CoV-2 Rapid Antigen Detection Tests Using Oral, Anterior Nasal, and Nasopharyngeal Swabs: a Diagnostic Accuracy Study. Microbiol Spectr 2022;10:e0202921. [Crossref] [PubMed]
- Buchta C, Görzer I, Chiba P, et al. Variability of cycle threshold values in an external quality assessment scheme for detection of the SARS-CoV-2 virus genome by RT-PCR. Clin Chem Lab Med 2021;59:987-94. [Crossref] [PubMed]
- Tsukagoshi H, Shinoda D, Saito M, et al. Relationships between Viral Load and the Clinical Course of COVID-19. Viruses 2021;13:304. [Crossref] [PubMed]
- Meiners L, Horn J, Jones TC, et al. SARS-CoV-2 rapid antigen test sensitivity and viral load in newly symptomatic hospital employees in Berlin, Germany, December, 2020 to February, 2022: an observational study. Lancet Microbe 2024;5:e538-e546. [Crossref] [PubMed]
- FDA. At-Home OTC COVID-19 Diagnostic Tests. FDA; 2025. Available online: https://www.fda.gov/medical-devices/coronavirus-covid-19-and-medical-devices/home-otc-covid-19-diagnostic-tests
Cite this article as: Hekland J, Morken C, Eriksson Boija E, Kristian Kur D, Sandberg S, Tollånes MC. Diagnostic performance and user-friendliness of five rapid antigen tests for severe acute respiratory syndrome coronavirus 2. J Lab Precis Med 2025;10:16.