Introduction
In some cases, when answers to very concrete questions are needed, or when addressing relatively novel issues, the scientific evidence may be lacking, or the existing evidence may be of suboptimal quality. In such cases it is necessary to resort to the opinion and experience of experts, adopting a systematic approach based on consensus methodology, in order to explore the level of agreement or disagreement on a given subject. Thus, the need for consensus arises from the lack of consensus(1).
Among the different consensus methods, informal consensus is based on open, non-systematised discussion that often takes place within a single in-person meeting. It yields recommendations but very little information on the basis sustaining the consensus. Informal consensus is very susceptible to bias and group effects(2,3), and since the anonymity of the participants is not guaranteed, some opinions are often not expressed or are diluted among those of more reputed experts(4). In contrast, formal consensus combines scientific evidence with methodological techniques and structured decision-making processes. In the healthcare setting, three formal consensus methods have mainly been used(5): the Delphi technique, introduced in the 1960s(6); the nominal group technique, introduced in the 1970s(7); and the consensus conference method, developed in 1977 by the United States National Institutes of Health Consensus Development Program(8).
A fourth strategy, the RAND/UCLA method, was developed in the 1980s by the RAND Corporation and the University of California at Los Angeles (UCLA), and constitutes a hybrid of the Delphi and nominal group methods(9).
Table 1 highlights some of the differences between these approaches(5).
Although there are variations in the consensus methods, they all follow highly formalised protocols and share a number of fundamental principles that distinguish them from informal consensus: anonymity, iteration, controlled feedback, group statistical response and structured interaction(10,11).
Although these methods have recently been used in traumatology(12,13,14,15), their use in the context of hip arthroscopy warrants specific examination. Consensus techniques have been used to synthesise collective opinion in the current state of uncertainty (differences in opinion on various aspects of the approach to hip arthroscopy). We therefore hypothesised that there are quality publications on consensus methods in hip arthroscopy that could guide clinical decisions when an adequate body of evidence is lacking.
The general objective of the present study was to identify the consensus methods used in the hip arthroscopy setting.
Material and methods
A systematic review was made of the literature using formal methods to ensure a pertinent and precise search and retrieval process. The present study was carried out following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement(16,17). Thus, before starting the literature search and subsequent data extraction, a review protocol was developed, describing each step of the systematic review, including the exclusion criteria. This protocol was reviewed and approved by three of the authors.
Information sources and search strategy
Systematic searches were made in September 2021 (without time restrictions) in the following electronic databases: MEDLINE and PreMEDLINE (Ovid), Embase, Scopus and Web of Science (WOS), combining terms such as consensus, Delphi, RAND, nominal, hip or arthroscopy. The references of all the selected articles were reviewed manually in order to locate other studies not retrieved by the initial search. We also consulted the Centre for Reviews and Dissemination (CRD) and International Network of Agencies for Health Technology Assessment (INAHTA) databases, as well as the website of the Spanish Network of Agencies for the Evaluation of Healthcare Technologies and Services of the National Healthcare System (Red Española de Agencias de Evaluación de Tecnologías Sanitarias y Prestaciones del Sistema Nacional de Salud [RedETS]). An example of the MEDLINE search is provided in Figure 1. The remaining search strategies are available upon request from the corresponding author.
Selection of studies
On an independent and paired basis, two reviewers selected the studies by reading the titles and abstracts located through the scientific literature search. The full-text versions of the selected articles were reviewed and classified as included or excluded by the two reviewers, based on the established screening criteria. When doubts or discrepancies were found, they were resolved by consensus. The following study selection criteria were applied:
- Types of studies: consensuses using the Delphi technique, the RAND method, nominal group, consensus conferences or informal discussions.
- Participants: patients with hip disease of any kind.
- Intervention: any disease condition implying hip arthroscopy.
- Comparator: any.
- Outcome measures: data related to the identification, design and methodology of the study were extracted.
Communications at congresses, letters to the editor, editorials and comments were excluded. We also excluded studies not written in Spanish, English, French, Portuguese or Italian.
Data extraction
Data extraction from the included studies was carried out by a reviewer and checked by a second reviewer. Doubts or discrepancies were resolved by consensus. The data were entered on customised electronic sheets.
The data extraction procedure was carried out in two phases. The first phase compiled information identifying the study, such as the year in which it was conducted or published, the country and the organising entity. The second phase covered the consensus method used, the purpose of the study, the number of participants and the setting (national or international). In addition, we determined whether a literature review was made; whether the participants received prior information; whether the survey method was described (survey sent by conventional mail, by e-mail or conducted in person); the number of rounds and the number of participants who responded in each round; whether voting was private and anonymity was preserved; and whether there was a predetermined definition of consensus and, if so, what it was. Lastly, where consensus was reached, we evaluated whether it constituted forced consensus. The reviewers evaluated not only the presence or absence of each of these parameters in the studies but also whether they had been explicitly declared and described in sufficient detail.
Quality assessment
The recommendations of Humphrey-Murto et al.(18) were followed to guarantee methodological rigour in using the consensus group methods. These recommendations afford a checklist with the purpose of providing all the information that is essential for writing, interpreting and correctly using the results of a consensus. In general terms, the greater the compiled evidence in support of the development or choice of a method, the more reliable the findings derived from its use will be.
Data synthesis
A narrative synthesis was made, with tabulation of the information collected from the included studies.
Results
Figure 2 shows the study selection process. The search in the aforementioned electronic databases resulted in the identification of 313 literature references; this figure was reduced to 212 after removing duplicates. Following the selection process, we finally included 13 studies corresponding to 10 original papers(19,20,21,22,23,24,25,26,27,28), two healthcare technology evaluation reports(29,30) and one appropriate use criteria document adopted by the steering committee of the American Academy of Orthopaedic Surgeons (AAOS)(31).
Table 2 shows the studies discarded after full-text evaluation(32-56) and the reasons for exclusion.
Of the 13 studies finally included, four used non-structured methods(19,20,24,30), while nine used structured methods: consensus conference(21), Delphi(22,23,25,28), Delphi together with nominal group(26) and RAND/UCLA methodology(27,29,31). The number of specialists consulted ranged from 9 to 869, and most of them were specialised in traumatology. The principal characteristics of the studies are described in Table 3.
Most of the studies addressed degenerative diseases such as osteoarthritis(19,27,29,31) or femoroacetabular impingement(21,24,25,26,28,29). Other topics addressed were hip dysplasia(24), infections(22,23), thromboembolism(20) and clinical trial screening criteria(30).
The study published by Altman et al.(19) was designed to describe the best available measurement method for detecting the progression of hip osteoarthritis, particularly in therapeutic trials. Consensus determined that radiography is an adequate primary assessment method for changes in hip osteoarthritis. The progression of osteoarthritis can be calculated by measuring the width of the joint space.
The report by Molina et al.(29) developed criteria for the appropriate or adequate use of hip arthroscopy in osteoarthritis and femoroacetabular impingement. Hip arthroscopy was considered generally inadequate as surgical treatment for osteoarthritis and adequate for femoroacetabular impingement, based on the presence of the following criteria: joint clinical manifestations, duration of the symptoms, functional alteration and patient age.
The AAOS consensus(31), also contemplated in the study of Riddle et al.(27), included pharmacological and non-pharmacological aspects, and surgical procedures, for symptomatic hip osteoarthritis. The suitability of hip preservation surgery was based exclusively on patient age and the radiographic evaluation of hip osteoarthritis.
The study carried out by Griffin et al.(21) aimed to reach multidisciplinary agreement with international experts on the diagnosis and treatment of femoroacetabular impingement. Consensus determined that in order to establish the diagnosis, the patients must present consistent symptoms, positive clinical signs and imaging findings. Adequate treatments in turn comprised conservative management, rehabilitation and arthroscopic or open surgery.
The consensus published by Marín-Peña et al.(24) addressed the indications of hip arthroscopy in degenerative disease of the hip and in hip dysplasia, even contributing surgical "tricks".
The study carried out by Radha et al.(25) addressed hip preservation in femoroacetabular impingement, in particular, intraoperative management of the capsule, the labrum, cartilage defects, the round ligament and bone impingement.
The purpose of the study by Lynch et al.(26) was to develop pre-, intra- and postoperative recommendations for femoroacetabular impingement through evidence-based consensus via a meta-analysis, a systematic review and a group of arthroscopists. Consensus was reached to the effect that hip arthroscopy should be the standard of care for the surgical treatment of classical or arthroscopically accessible femoroacetabular impingement.
The main objective of the study carried out by Winiger et al.(28) was to identify the key variables for performing arthroscopic treatment in femoroacetabular impingement syndrome. Consensus emphasised that treatment of the labrum and correction of the cam-type deformity are the two key elements in hip arthroscopy for the management of femoroacetabular impingement syndrome.
The international consensus of orthopaedic infections addressed the prevention and reduction of risks in the study of Aalirezaie et al.(23) and the treatment and surgical techniques in the study of Abouljoud et al.(22). However, the information on the methodology of these studies appears in the editorial of the monograph dedicated to the international consensus on orthopaedic infections(57). The consensus indicated that there is no evidence that prior arthroscopy increases the risk of subsequent periprosthetic joint infections.
Randelli et al.(20) aimed to reach agreement on recommendations for the management of thromboembolism in orthopaedic and trauma surgery. They reported that, in all patients requiring pharmacological antithrombotic preventive measures, it is advisable to evaluate both thrombotic risk and bleeding risk, identifying high-risk patients and those who will need careful evaluation.
The healthcare technology evaluation report published by Griffin et al.(30) developed screening criteria for randomised clinical trials. In a survey, clinicians expressed reservations about being able to conduct a clinical trial in patients with femoroacetabular impingement, though they were in favour of being able to randomise.
Quality assessment
Of the 13 included studies, those that used formal methods generally obtained better quality scores. Three of them could be considered of high quality(25,26,29), four of moderate quality(21,27,28,31) and two of low quality(22,23). In contrast, the four studies that used informal or non-structured methods were assessed as being of low quality(19,20,24,30). The purpose or objective of the investigation was clearly defined in all studies except those of Abouljoud et al.(22) and Aalirezaie et al.(23). With the exception of the studies by Radha et al.(25), Lynch et al.(26), Winiger et al.(28) and Molina et al.(29), none of the publications explained how anonymity was maintained. Details of the quality evaluation of the studies are shown in Table 4.
Discussion
The credibility and usefulness of the results of a consensus depend directly on the rigour applied in preparing and conducting it. Consensus methods must therefore be applied with great methodological rigour and must comply with a series of quality requirements. In this respect, we found that most of the studies that used a formal or structured consensus method were of greater methodological quality, whereas those based on informal consensus strategies fell short of the expected quality.
In simpler terms and taking into account that each formal consensus technique can exhibit numerous variants, the main characteristics of the four methods presented can be described as follows. In the Delphi method the participants are surveyed in different rounds. They receive a questionnaire, and individual and/or group feedback is provided on the scores between rounds, specifying their positions and the global positions of the group. Consensus is obtained by means of a mathematical procedure involving the simple summing of individual judgements and the elimination of extreme (outlier) positions. The participants never meet or interact directly(5,10). The number of modifications implemented in the Delphi method has led to considerable confusion about its application(58,59).
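The aggregation and feedback steps of a Delphi round can be illustrated with a short sketch. The Python fragment below is illustrative only: the 1 to 9 rating scale, the agreement cut-off of 7, the 75% consensus threshold and the function names are assumptions introduced here, not parameters taken from any of the reviewed studies. It also summarises the group with the median and interquartile range, which are common robust choices, rather than with a literal sum followed by outlier removal.

```python
from statistics import median, quantiles


def summarise_round(ratings, agree_min=7, threshold=0.75):
    """Summarise one Delphi round for a single item.

    ratings   -- anonymous ratings from the panel (assumed 1-9 scale)
    agree_min -- lowest rating counted as agreement (assumed value)
    threshold -- share of agreeing ratings taken as consensus (assumed value)
    """
    if len(ratings) < 2:
        raise ValueError("at least two ratings are needed")
    # Quartiles are less sensitive to extreme positions than the mean.
    q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
    agreement = sum(r >= agree_min for r in ratings) / len(ratings)
    return {
        "n": len(ratings),
        "median": median(ratings),
        "iqr": q3 - q1,
        "agreement": agreement,
        "consensus": agreement >= threshold,
    }


def feedback_for(own_rating, summary):
    """Controlled feedback sent to one participant between rounds:
    their own position alongside the group's position."""
    return {
        "your_rating": own_rating,
        "group_median": summary["median"],
        "group_iqr": summary["iqr"],
    }


if __name__ == "__main__":
    panel = [7, 8, 8, 9, 3, 7, 8, 9, 9]  # hypothetical ratings
    summary = summarise_round(panel)
    print(summary)
    print(feedback_for(3, summary))
```

In this sketch the participant who rated the item 3 would see that rating next to the group median before the next round, which is the controlled feedback the method relies on.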
In the nominal group technique, the participants physically come together in a meeting directed by an experienced moderator(60). In this meeting, and in an extremely formalised manner, they present their ideas, individually define their points of view, explain their differences, and individually vote on each proposed solution(5,10,61). As in the previous case, consensus is obtained by means of a mathematical procedure involving the simple summing of individual judgements.
Consensus conferences involve the evaluation of the available evidence referred to some diagnostic or therapeutic intervention before a jury composed of experts and non-experts that are required to issue a report with recommendations on the use of the intervention. The process simulates an oral hearing in court. During the session, the experts defend the conclusions drawn from the evidence and interact with the public invited to the conference. In 2013, the Office of Disease Prevention withdrew the consensus conferences programme(8), though it is still conducted by other investigators.
Lastly, the RAND/UCLA method begins in a first phase or round with the submission of a questionnaire, while in a second round an in-person meeting is held to clarify or discuss the appropriate or inappropriate use of a medical or surgical procedure(9,61).
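As an illustration of how panel ratings are commonly turned into an appropriateness category in the RAND/UCLA approach, the following Python sketch may help. It is not taken from the reviewed studies: the 9-point scale, the median bands (1 to 3, 4 to 6, 7 to 9) and the disagreement rule for a nine-member panel (at least three ratings in each extreme band) are assumptions based on the general RAND/UCLA literature, and the function name is hypothetical.

```python
from statistics import median


def classify_indication(ratings, min_each_tail=3):
    """Classify one clinical indication from second-round panel ratings.

    ratings       -- panel ratings on an assumed 1-9 appropriateness scale
    min_each_tail -- ratings needed in BOTH extreme bands (1-3 and 7-9)
                     to declare disagreement; 3 is the assumed convention
                     for a nine-member panel
    """
    if not ratings or any(r < 1 or r > 9 for r in ratings):
        raise ValueError("ratings must be non-empty and lie between 1 and 9")
    low = sum(1 for r in ratings if r <= 3)
    high = sum(1 for r in ratings if r >= 7)
    disagreement = low >= min_each_tail and high >= min_each_tail
    if disagreement:
        # A polarised panel is reported as uncertain whatever the median.
        return "uncertain"
    med = median(ratings)
    if med >= 7:
        return "appropriate"
    if med <= 3:
        return "inappropriate"
    return "uncertain"


if __name__ == "__main__":
    print(classify_indication([7, 8, 8, 9, 7, 8, 9, 6, 7]))
    print(classify_indication([1, 2, 3, 8, 9, 9, 5, 5, 5]))
```

The design choice worth noting is that disagreement is checked before the median, so a polarised panel is never reported as having judged a procedure appropriate or inappropriate.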
The findings of this review show that the largest share of the 13 studies synthesised in our results tables was carried out in the United States (6 publications)(22,23,26,27,28,31). Spain(19,24,29) and the United Kingdom(21,25,30) were the European countries with the greatest scientific production (3 publications each), followed by Italy(20) (1 publication). This could be attributed to a longer tradition in the use of consensus methods in the United States, dating back to their use by the United States Army and Air Force in the 1940s(62).
It should be noted that the identified consensuses in the United States were predominantly developed at national level(26,27,28,31), the exception being two publications from the same international study(22,23). In Europe, three publications from the United Kingdom(21,25,30) and two from Spain(19,24) had an international setting, while only one publication from Spain(29) and another from Italy(20) had a national setting. This could reflect an interest, in the case of Europe, in obtaining consensuses applicable to a broader setting and not circumscribed to a single country.
Likewise, the identified publications included experts whose number ranged widely from 6 to 869 professionals. Most of them were traumatologists with experience in hip surgery, i.e. from a single discipline. In only four publications(20,21,27,31) did the professionals participating in the consensuses have different types of training (multidisciplinary).
Furthermore, most of the publications(19,21,22,23,24,26,27,29,31) reported holding at least one in-person meeting, reflecting the importance of discussion among experts in establishing useful conclusions and consensuses.
The results of this systematic review also show that all the studies that employed informal or non-structured consensus methods(19,20,24,30) in relation to the use of hip arthroscopy, and two of those that used the Delphi technique(22,23), were carried out without clearly defining how consensus was agreed. Therefore, when the authors conclude that the results of the study reflect consensus-based opinion, it seems that the achievement of consensus was assumed as an integral part of the method used. Although consensus may be the expected result of applying a consensus method, we believe that it is necessary to better define the criteria for reaching such consensus and to document the degree of agreement along with the results obtained.
Even though most of the studies included in our systematic review had consensus as an objective, only some of them defined consensus with a specific criterion(21,25,27,28,29,31). Furthermore, this criterion was the reason for termination of the process, normally on the basis of a definition established a priori(21,25,26,27,28,29,31). However, we believe that an adequate approach would be to establish an a priori formal definition of the criteria used for consensus, instead of assuming the latter as an automatic outcome due to the intrinsic fact of making use of a consensus method. Furthermore, the investigators should also specify alternative criteria for termination of the process, including possibly a maximum number of contemplated rounds. If the studies are to be made in the course of a certain number of rounds, the authors should specify how the degree of agreement is going to be quantified at the end of the study.
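The recommendation to fix both a consensus definition and an alternative termination criterion before the first round can be made concrete with a small sketch. The Python fragment below is a hypothetical example: the 75% agreement threshold, the maximum of three rounds and all names are assumptions chosen for illustration, not values reported by the included studies.

```python
def stopping_decision(round_number, agreement_by_item,
                      threshold=0.75, max_rounds=3):
    """Apply termination criteria defined a priori.

    round_number      -- number of the round just completed (1-based)
    agreement_by_item -- mapping of item label to share of panellists agreeing
    threshold         -- predefined consensus criterion (assumed value)
    max_rounds        -- predefined maximum number of rounds (assumed value)

    Returns (stop, reason, pending_items) so that the degree of agreement
    at termination can be documented alongside the results.
    """
    pending = sorted(item for item, share in agreement_by_item.items()
                     if share < threshold)
    if not pending:
        return True, "consensus reached on all items", pending
    if round_number >= max_rounds:
        return True, "maximum number of rounds reached", pending
    return False, "continue to next round", pending


if __name__ == "__main__":
    shares = {"labral repair": 0.92, "capsular closure": 0.61}  # hypothetical
    print(stopping_decision(2, shares))
    print(stopping_decision(3, shares))
```

Returning the list of items that did not reach the threshold means that a process ended by the round limit still reports where agreement was lacking, rather than implying consensus by default.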
To the best of our knowledge, there are no validated quality indicators for studies involving consensus methods. We therefore resorted to the recommendations of Humphrey-Murto et al.(18). These indicators were selected on the basis of those which we believe would allow the study to have both internal and external validity. According to these indicators, the quality of the reviewed studies was generally moderate or high in the publications involving formal consensuses. However, it is important to recognise that this scoring is based more on what is reported in the study than on the quality of the study as such. Therefore, we propose that these or other similar criteria should constitute a set of suggested elements to be included in all publications involving consensus methodology. We consider that the applicability of these criteria in the publications would be useful for the diffusion of quality protocols for clinical practice.
These considerations acquire importance due to the fact that level V evidence (expert opinion) remains a necessary component in the methodological repertoire used to determine the response to a clinical question, particularly in situations characterised by the absence of high quality evidence (and by the difficulty of obtaining such evidence), and by clinical variability.
Evidence-based medicine classifies randomised clinical trials and meta-analyses as the highest-ranking evidence, while less relevance is attributed to expert opinion, which is placed in the lowest category. Nevertheless, randomised clinical trials and meta-analyses have both strengths and weaknesses (since no research method is perfect), and they cannot always be used as a study design, owing to the type of patients involved (such as frail individuals or children) or the type of intervention under study (e.g. surgeries), since doing so would not be ethically acceptable.
Indeed, "no study design is perfect, and contradictory findings may arise from all types of studies"(63). Having said this, the practical alternatives to well-designed studies whose results command strong confidence (randomised clinical trials, cohorts, observational studies, etc.) range from current observational studies (since we are now in the era of big data in large health registries) to traditional methods, including expert opinion, which is more feasible and accessible in some cases.
Whichever design is considered more appropriate for achieving the objective of our research, it must be accompanied by quality and methodological rigour in order to be able to rely upon and extrapolate the results with the lowest risk of biases or limitations. All scientific research is fundamentally dependent upon the use of adequate and rigorously detailed investigational methods — and studies based on expert opinions are no exception to this(64). It may be pertinent to present studies that use consensus methods in accordance with certain indicators similar to those of the CONSORT statement, as used for example in randomised controlled trials.
This systematic review of the literature offers an overview of the different consensus methods used in hip arthroscopy. However, the limitations of the present study are those inherent to the application of its methodology, including publication bias derived from the fact that many scientific studies are not ultimately published, or selection bias, which depends on the objectiveness of the inclusion and exclusion criteria used in the studies. We have minimised the risk of such biases by including several literature sources and working with broad criteria for the inclusion of studies.
Conclusions
The consensus methods analysed in this review, which evaluated the use of arthroscopy in hip disease, were predominantly formal consensus protocols. In most cases, the use of these structured methods provided the criteria needed to establish consensus among the professionals.