Published: 25/09/2020

Appendix 3 - RCoA Clinical Assessment Strategy for assessment leading to the CCT/CESR[CP] in Anaesthetics

In this appendix the background to the RCoA’s assessment strategy is described.

There are two significant obstacles to the assessment of clinical learning outcomes in the course of postgraduate training.

  • Their Validity - In many situations it is difficult to find outcomes that are measurable and that relate directly to the capabilities being considered. 
  • Their Reliability - Clinical assessment is difficult to standardise and depends upon a subjective expert judgement by the observer.

The RCoA has adopted an approach to assessment by observation of performance in line with the methodology known as the ‘Cambridge Approach’, which focuses on performance as a product of competence39. It is important that knowledge be specifically assessed separately, as knowledge and skill in procedures appear to be independently acquired, with skill learning preceding knowledge40. This is the reason why separate, high-validity, high-reliability assessments of knowledge are undertaken in the Primary and Final FRCA examinations.

  1. Evidence for the Annual Review of Competence Progression (ARCP)

Award of the CCT depends on having completed a recognised programme of training and having demonstrated key knowledge and capabilities in the course of assessments. Trainee progress through the curriculum is monitored by a scheme of assessments.

This evidence is reviewed at an Annual Review of Competence Progression (ARCP) and this determines the learner’s further progress41.  

This document describes the evidence that learners should present at their ARCP. It is primarily the responsibility of the trainee themselves both to understand what evidence will demonstrate appropriate progress and to accumulate and tabulate this evidence. Inability to collect and organise the evidence is itself taken to be a significant failing which is likely to be reflected in other aspects of professional life.

The ARCP is organised and operated by Postgraduate Deans. Its general principles are laid down by the GMC and are described in the ‘Gold Guide’. The RCoA is responsible for advising on the specific evidence that is required in its specialty training programme.

      2. Workplace Observational Assessment is an Expert Process

Until recently postgraduate medical education relied almost entirely upon high-stakes knowledge testing in professional exams. Tests of practice were deficient and did not make use of the workplace (i.e. specialists were not formally tested ‘on the job’). The ‘competence’ movement in education was adopted into medical education in the 1990s and led to the introduction of workplace testing. Both critics and enthusiasts of this approach have been concerned about the reliability and validity of assessments taking place in the non-standardised environment of the workplace.

The RCoA has used a satisfactory/unsatisfactory metric for workplace-based assessment. Michael Polanyi developed the idea that much of the success of experts depends upon what he called “tacit knowledge”42. Tacit knowledge cannot be fully described or explained, which makes it difficult to teach and test. It is widely believed that expert observers can, however, use their own tacit understandings to discern whether or not the practice of their expertise that they are observing is ‘adequate’. Therefore, many workplace-based assessment systems have used a simple yes/no response to the question, ’Was this performance adequate?’.43 Recent publications on assessment in anaesthesia44 and psychiatry45 support the premise that experts can identify satisfactory performance but are less able to judge levels of performance within the pass or fail categories. There is still a need for good formative feedback, and there is space on the form for this.

       3. Checking competences can provide spurious evidence of competence

It is tempting to try to make assessment by observation more reliable by ‘unbundling’ the competences into separately-assessed sub-competences. This, however, encounters the problem that it is possible to be competent in each of the individual components of a clinical process whilst the performance of the whole remains inadequate46. The Tooke report specifically identified the competence approach to learning as one of the possible root causes of mediocrity47. The RCoA has not broken down clinical work into small competences, but has chosen to identify higher-level learning outcomes that are demonstrated in the course of all work. This enables the use of the same mark sheet in all circumstances.

        4. Exams

The exams are a high stakes assessment in two parts. They principally investigate the learner’s basic science and medical knowledge concentrating on its application in practice. The examination process is subject to stringent quality control and the validity and reliability of each separate assessment within the process is scrutinised.

        5. Workplace-Based Assessments

        5.1.    The Process of ensuring face validity of assessments

The face validity of the anaesthetic workplace-based assessments depends upon the relevance of the curriculum. The content of the anaesthetic curriculum has been established in a protracted developmental programme. The first version was developed within one school of anaesthesia fifteen years ago. The process involved the use of an expert group that commented on the expected performance of the trainee in general and specialist areas, and the stage of training at which competence could be expected48.  Subsequently this curriculum was adapted and updated for use in all anaesthetic training. In that process it 

has three times been subjected to review by specialist societies and by working groups within the Royal College of Anaesthetists. These working groups have included medical managers, service managers and the representatives of patients. The curriculum competency statements therefore form an assured basis for the content of assessments and their direct relationship to real clinical situations encountered by the trainee at that stage assures both their face and context validity. The content validity that relates to non-technical skills derives from a taxonomy of behavioural markers developed specifically for anaesthetics using a grounded-theory research methodology49.

        5.2.    Choosing Appropriate Assessment Instruments

The curriculum was reviewed and the cognitive learning outcomes that lend themselves to conventional testing by written and oral examination were marked for formal examination.

Those cognitive, psychomotor and behavioural learning outcomes that remained have been allocated to appropriate instruments for workplace-based assessment. As an outcome-based curriculum identifies very large numbers of items, a strategy of sampling assessments has been selected in order to make the assessment task manageable and to minimise the disruption of normal work and the possibility of increased risk to patients.

An assessment instrument has been identified for every competency in the curriculum. Where possible more than one methodology is identified so that it is possible to triangulate performance. It is intended that a sample of these assessments will be undertaken by each learner. Test schedules that incorporate every competence statement tend to trivialise assessment and become very labour intensive. 

All assessments are derived directly from the curriculum and are in line with the GMC standards and published guidance on assessment strategies, one of which is shown in Appendix 8. All items map to the GMC’s document Good Medical Practice50. Non-technical learning outcomes are mapped to the schedule of Anaesthetists’ Non-Technical Skills (ANTS) (Appendix 7). The CanMEDS51 classification of the roles of doctors has informed the learning outcomes, in particular those that relate to professionalism (Appendix 4). In addition, the assessment system conforms to the GMC Standards for Assessment.

The choice of which outcomes to assess is left to the learner and their educational and clinical supervisors. This will depend on the opportunities that the clinical work presents. The marking schemes for all the assessment instruments focus on the underlying capabilities and attitudes in such a way that general conclusions about future performance can be inferred.

        5.3.    The Available Assessment Methodologies

A pragmatic approach to the choice of assessment methods has been adopted. As anaesthesia and critical care have Foundation doctors, many Consultants are familiar with the Foundation assessment methods and are trained in their use, and it has therefore been decided to continue with these same systems throughout CT and ST training. These are the A-CEX, DOPS and CBD. In addition, these methodologies have a practical utility attested to by experience in their use, and there is at least some objective evidence that, correctly applied, they have validity and reliability52.

An additional tool has been developed by the specialty of Acute Medicine which has been adopted by other specialties and is mandatory in the programmes for ACCS training. The Acute Care Assessment Tool (ACAT) is used to assess a longer period of work in which a number of patients are seen, evaluated and treated. This is typically used to observe, score and report performance during a period of ‘take’, when the doctor receives a number of patients during a day or night of acute reception duties. The ACAT is believed to allow observation of the ability to organise, prioritise and integrate complex clinical activities. Such extended work also calls upon advanced ability to organise and work in teams. Validation of this new assessment instrument is at present limited to a pilot study of the responses of trainers and trainees to its use53; there are no data to establish its validity or reliability. Nonetheless, the principle of an extended assessment is attractive and the RCoA has developed a similar approach.

Descriptors of competencies demonstrated during ACAT:

  • Clinical assessment: Quality of history and examination to arrive at appropriate differential diagnoses.
  • Medical record keeping: Quality of recording of patient encounters on the take, including drug and fluid prescriptions.
  • Investigations and referrals: Quality of a trainee’s choice of investigations and referrals over a take period.
  • Management of the critically ill patient: Quality of treatment given to critically ill patients encountered on the take; assessment, investigations, urgent treatment administered and involvement of appropriate colleagues (including senior colleagues).
  • Time management: Prioritisation of cases and issues within the take, ensuring the sickest patients are seen first and the patient’s most pressing issues are dealt with initially. Recognition of the quality of a colleague’s initial clerking to inform how much further detail is needed; a full repeat clerking is not always needed by a more senior doctor.
  • Management of take / team working / clinical leadership: Appropriate delegation and supervision of junior staff; appropriate relationship with and involvement of other health professionals.
  • Handover: Quality of the handover of care of patients from the take to the relieving team. If patients have been transferred to a different area of care then this applies to the quality of the handover to the new team.
  • Overall performance: What level was demonstrated by the trainee’s performance in this take period?

The categories of observation for the ACAT are shown above. Whilst the broad categories of work observed, and their properties, are the same for anaesthesia, the specific descriptors of performance to be observed are not. The RCoA has therefore adapted the ACAT and produced a similar assessment for anaesthesia called the Anaesthetic List/Clinic/Ward Management Assessment Tool [ALMAT]. In line with the general approach throughout the assessment system, the marking is either satisfactory or unsatisfactory. The same descriptors are used to mark unsatisfactory performance as are used for the Anaes-CEX, because the descriptions of poor performance are sufficiently generic to apply to all observations of work.

5.4.       How many workplace-based tests?

The purpose of the anaesthetic workplace-based tests is not to tick off each individual competence but to provide a series of snapshots of work from the general features of which it can be inferred whether the trainee is making the necessary progress – not only in the specific work observed – but in related areas of the application of knowledge and skill. The number of observations of work required will not be fixed but will depend on the individual trainee’s performance.

The literature is inconclusive but suggests that inter-rater reliability between repeat episodes of performance requires 12-15 cases to become reasonably consistent.54,55 This number probably constitutes the minimum number of observations per year. The RCoA therefore sets a minimum of one of each assessment type identified for each unit of training in the respective training level blueprint, or the School-defined number, whichever is the greater. Where a trainee performs unsatisfactorily, more assessments will be needed. It is the responsibility of the trainee to attend for annual review with what they consider to be evidence of satisfactory performance and satisfactory progress. It is the educational supervisor’s responsibility to help the trainee to understand what that evidence will be in their specific circumstances.

Once again it must be stressed that there is no single, valid, reliable test of competence and the ARCP will review all the evidence, triangulating performance measured by different instruments, before drawing conclusions about a trainee’s progress.

5.5.        The Annual Review of Competence Progression (ARCP)

Performance in the course of clinical work is notoriously difficult to assess. In anaesthesia this is complicated further by the very low rate of observable errors and adverse outcomes that are caused by slips and errors on the part of the anaesthetist. It is therefore important to understand that all concerned accept the weak reliability of the observational assessment.56 The assessments are intended to provide information to an annual review at which, by examining information from a wide variety of sources, a judgement about the adequacy of the learner’s performance and progress can be made. It is further accepted that such inadequacies may result from deficiencies in the clinical experience and problems with the instructional programme, as well as from under-performance by the learner. The Annual Review will initially lead to targeted or remedial training. This will be organised from within the school of anaesthesia in partnership with the Postgraduate Dean. From an extensive review of the way anaesthesia is learned, it seems clear that assessment of progress is most successful when simultaneous use is made of a variety of measures.57

A wide variety of information is available at the annual review. It is deemed to be the learner’s responsibility to present their reviewers with evidence of satisfactory progress. This will be in the form of the learner’s ‘Portfolio of Learning’. Sources of information are: 

  • evidence of performance in professional examinations – if applicable;
  • a log of clinical work undertaken;
  • a reflective diary of learning experiences;
  • the results of in service assessments;
  • the consultants’ end of module feedback;
  • a record of agreed targets and outcomes from interviews with their educational supervisor;
  • a multi-source feedback if appropriate; and
  • optionally – a record of a School of Anaesthesia appraisal interview.

It is accepted that there is no good evidence of the validity and reliability of any of these sources of evidence. The process of reviewing them is not an arithmetic one. The review must seek to use this evidence to answer four questions:

  • Has the learner undertaken a clinical workload appropriate in content and volume to the acquisition of the learning outcomes? (GMP domains 1, 2, 3.) Evidence: logbook, consultants’ reports, appraisal.
  • Has the learner met the general educational objectives of the curriculum, and the personal and specific objectives agreed with their educational supervisor or set in a previous remedial programme? (GMP domains 1, 2, 3.) Evidence: logbook, educational supervision reports, appraisal.
  • Do the learner’s supervisors believe that they have performed satisfactorily in their clinical work, as judged by their reports and the workplace-based assessments? (GMP domains 1, 2, 3, 4.) Evidence: logbook, workplace-based assessments, educational supervision reports, consultants’ reports.
  • Is there evidence that the learner performs satisfactorily as a member of a clinical team, including teamwork and a focus on safe practice? (GMP domains 2, 3, 4.) Evidence: multi-source feedback, consultants’ reports, appraisal.

 

Good Medical Practice (GMP) domains:

  • Domain 1: Knowledge, skills and performance
  • Domain 2: Safety and Quality
  • Domain 3: Communication, Partnership and Teamwork
  • Domain 4: Maintaining Trust

6.     The Workplace-Based Assessments

6.1.       The DOPS and A-CEX

Assessment by the direct observation of work is based on the belief that an expert is able to make a judgement about the quality of an expert process by watching its progress.58 This is the methodology of the motor vehicle driving test and there is a long history of the use of observational assessment in the accreditation of practice.59

Medicine has a long history of such assessments, by the informal means of supervision, but it is only recently that efforts have been made to formalise and standardise the observations. It has been noted that observation of work in the context of the formal ‘long case’ improves reliability, and this is probably applicable in a real work situation.60 Clinical decision making is particularly difficult to assess, both because the quality of the outcome is often only remotely influenced by the decision (many incorrect actions do not result in adverse consequences) and because decisions make extensive use of tacit knowledge, which cannot be adequately articulated.61 The literature on reliability is sparse and largely based upon showing that one form of assessment is as reliable as another, or that learners and assessors felt comfortable with the assessment process.62 63 64 There is little or no evidence to show that any available observational assessment correlates with capabilities such as the outcome of clinical decision-making in complex situations.

Practical skills lend themselves to observational assessment more easily. The assessor can observe that the critical stages of a process are carried out in the correct sequence without omission, and can make specific observations of factors such as knowledge of relevant anatomy, the correct performance of safety checks and precautions to maintain sterility. There is, however, little clinical judgement involved in most such procedures in anaesthesia, and it is important that their better inter-observer reliability as compared to the A-CEX does not result in them being given an exaggerated weighting in decision making. If an event is relatively insignificant clinically, repeated assessment, however reliable and valid, does not increase its clinical significance. It has also been observed that even straightforward practical procedures require the exercise of expert and tacit knowledge that neither trainee nor assessor may appreciate.65

A final point of importance in considering the acceptable standard of performance in a workplace-based assessment is the effect of steep learning curves and the lack of uniformity of the trainee’s experiences. Anaesthesia has more than a dozen major subdivisions. The trainee is therefore repeatedly confronted with new situations. Often they are effectively back at square one after a change of placement. Performance has repeatedly been demonstrated to improve rapidly over the first 30 iterations. Some sub-specialty experiences are similar and having undertaken one will facilitate learning in another – so the trainee’s trajectory through modules will influence their performance.66 67 68 69 Assessment of the trainee in these circumstances relies heavily on the consultant’s expert understanding of the standard at which the learner should be working – taking into account their specific previous experience. It is not feasible to attempt to time assessments so that each occurs at a particular level of experience. It must also be noted that the interpretation of performance in a practical procedure must relate to the logbook data on numbers undertaken as there is evidence that trainees are unlikely to be able to undertake sufficient numbers of cases to become fully proficient.70 71

6.2.       The RCoA strategy in scoring observational assessments

The reliability of observational assessment is increased by a strategy of using multiple observers and assessing on multiple occasions. With workplace-based assessment during real work, however, assessment affects the progress of work, and frequent testing may not be feasible.

The primary question on the RCoA mark sheet is whether the observer considers the performance satisfactory or not. The limen for this decision is part of the observer’s judgement – as an expert in the field. This criterion has been adopted rather than marking against a scale because of the difficulty in defining other grades of performance. Obviously a decision about the overall adequacy of performance cannot be made by summing the grades in each element of the performance. A deficiency in one cannot be compensated by good performance in another.

If the assessor believes the performance to be satisfactory, they are asked to offer feedback, both positive and negative.72

If the observer rates the performance unsatisfactory they must complete a grid which tabulates the specific areas for concern. Once again these categories all map to domains in Good Medical Practice, to ANTS and to CanMEDS. The critique is highly specific and provides Consultants with an educational vocabulary in which to describe their concerns. This feedback is designed to allow the structured development of any remedial programme and to give a consistent emphasis from assessors in the event of a trainee continuing to perform so inadequately that they are unsuitable for continued training.

The RCoA recognises that the quality of the feedback given to learners who perform satisfactorily is less structured. This is not believed to be very significant in the context of our training practices. Anaesthesia is hazardous and close supervision of trainees is mandatory. The RCoA requires that trainees engage in a high proportion of supervised practice; consultant/trainee working is more frequent and closer than in most specialties. Trainees therefore perform many cases under direct supervision, and the quality of anaesthetic education depends heavily on the educational approach during that work, which will include repeated feedback.73 74 Against this background it has been felt that the advantage of presenting an assessment form that is easy to complete when work is satisfactory is overwhelming in improving compliance and engagement with the testing regime.

6.3.       Case Based Discussion

In anaesthesia this will most frequently focus upon the choice and practice of anaesthetic technique in many surgical and patient contexts. The RCoA has defined topics for CBD that are appropriate to all the contexts of training. Assessments should not be made using other topics without checking that they are appropriate, i.e. that the issue is in the curriculum for the trainee’s present stage of training.

CBD is also used for assessing the more generic, and less clinical, knowledge and skills needed for effective practice, e.g. evidence-based practice, maintaining safety, teamwork and clinical research methodologies.

6.4.       Simulation based assessment

The practice of anaesthesia is often likened to flying an aeroplane. Pilots are largely trained in simulators, so why not anaesthetists? There are many reasons. Firstly, anaesthetists train for many more hours than pilots, and provision of sufficient simulators and instructors is totally impracticable. Learning normally takes place during real work, and time in simulation is therefore a loss of service. Secondly, medicine does not obey predictable laws. Problem-solving exercises in simulators, whilst useful, are not predictable enough to be valid and reliable as assessments. In a recent investigation, between ten and fifteen simulator assessments were necessary to achieve reliable scores between raters.75 Simulation has an important role in teaching, particularly in rehearsing uncommon events and in team training. It has an important role in assessment as a medium for demonstrating procedures and routines, but at present it is impractical to use it routinely to test decision making and critical thinking skills. The RCoA has not made extended use of simulator-based assessment for these positive reasons and not through omission. 

6.5.       A logbook and portfolio which record the learner’s clinical and educational experience

Trainees are required to keep a record of the cases that they undertake. The level of detail of these records is described elsewhere. The RCoA has defined the categories of experience but has not stipulated the number of cases that must be undertaken. This is because it is more important to demonstrate competence than to achieve a target of experience. Self-evidently a learner cannot become competent without undertaking cases, and their performance must be considered in the context of their experience. In the event that assessments indicate underperformance in an area of practice, the first response is to check from the logbook that the learner has had sufficient exposure to it. Incompetence in the face of what is usually sufficient exposure is a cause for concern.

There is a significant body of evidence regarding the acquisition of complex capabilities and of isolated skills. Surprisingly the learning curves are very similar. Where a clear outcome measure is available as a judgement of performance the learning appears to occur in three stages. The rate of improvement is initially very fast with performance scores rising to about 50% of that of an expert within 10 repeats. Over the next twenty cases the rate of improvement slows down with about 75% of expert performance being achieved after 30 repeats. From this point on improvement occurs slowly and it can take 200 cases to achieve 90% of expert performance. Some studies have shown that improvement beyond this point continues – but is very slow. It is important however to appreciate that individual learners can perform well below the levels predicted from pooled data.
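
For illustration only, these pooled figures are broadly consistent with a simple saturating learning curve; the functional form below is an assumption introduced here for clarity, not part of the cited evidence:

P(n) ≈ n / (n + k), with k ≈ 10

where P(n) is the proportion of expert performance reached after n cases. With k = 10 this gives roughly 50% at 10 cases, 75% at 30 cases and about 95% at 200 cases, close to (slightly above) the quoted figure of 90% of expert performance at 200 cases.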

What is the outcome target for learning? In many clinical situations there are no measurable outcomes or proxies for performance. We have little alternative but to assume that performance follows the same profile in these situations as for those we can measure. Using this pooled data we can assume that a working knowledge of a major competence requires 10 cases and reasonable performance requires 30. It is unlikely that any learners acquire capability faster than this and some will be much slower.

Decisions regarding necessary levels of experience are further complicated by the fact that many capabilities share competences with other situations. Sometimes competences are acquired and honed in a number of types of practice and are then brought together in a new type of case, in which case capability is gained quickly. Some capabilities are very important and all trainees must become expert in them during training. Others are less important, and the development of expert performance can wait until the learner is engaged in practice that calls for their more frequent use, which may not be during postgraduate training.

In summary, judging whether a trainee’s progress is satisfactory is a complex process that the trainer learns from their experience of training. Logbook data will indicate whether it is possible that the learner has had enough experience to be competent; it will not confirm that they are competent.

What logbook data will do is, firstly, reveal any deficiencies of the training rotation in providing suitable experience and, secondly, reveal situations where the learner is avoiding a particular type of work.

The portfolio of learning is more than a logbook. It will include reflections on learning and a record of other teaching and of discussions with the educational supervisor. Trainees do not always get the things out of their reflective portfolio that educationalists would hope for and expect.76 The marking of portfolios as an assessment tool is exceptionally difficult and the RCoA does not require that this exercise be undertaken.

6.6.       Evidence of participation and attendance at training events

Until recently, evidence of attendance at a learning session was taken to be the standard for accumulation of credits in continuing medical education. Attendance does not assure that learning has occurred, but it does signify compliance with an appropriate learning plan. There are a number of aspects of training that lie on the periphery of practice, such as research methods, management, evidence-based practice, teaching and assessment. At present there is little focussed assessment in these areas, and significant practical difficulties lie in the way of introducing summative assessment. The RCoA has at present adopted the middle ground in these areas and requires that evidence of participation in learning is presented to the ARCP. This includes attendance at specific courses, evidence of presentation at local audit/quality improvement and research meetings, and records and feedback from teaching that the trainee has delivered themselves.

6.7.       An Independent Appraisal

Evidence to the ARCP must include an appraisal. In many Schools of Anaesthesia this will be with the educational supervisor and will be part of the documentation relating to episodes of supervision. Some Schools conduct independent appraisal of the ARCP evidence in advance of that meeting and include this formal appraisal in the evidence for the review. This practice provides a more independent review of the trainee’s training, which will also include the adequacy of their educational supervision, as poor planning by the supervisor may contribute to poor outcomes for the trainee. It also provides the trainee with the opportunity to explain and expand upon the evidence they present in their portfolio.

7.       Oral Assessment in the RCoA Assessment System

Oral assessments are tasks designed to provide students with opportunities to develop and demonstrate their command of (1) an oral medium and/or (2) content as demonstrated through the oral medium. The RCoA makes extensive use of oral assessments in the assessment strategy for the CCT in anaesthesia. 

Despite the reservations of some educationalists, oral examination remains, in the UK, a common method of assessment in higher education77 and in medical schools.78 This can be by presentation (much used in HR appointment procedures), by interrogation, or by discussion. It can be used in connection with work that the candidate has undertaken previously (e.g. an MD thesis), to demonstrate knowledge at low taxonomic levels (recall of facts), to show the basis of decision making and the manipulation of knowledge for complex problems, and as a measure of oral communication skills. Its use in these contexts is well established in medicine, but also in law and most other disciplines in higher education. 

There are concerns about reliability and the possibility of bias and prejudice affecting the outcome of oral assessment and this has led to criticism of its use. This is particularly the case when it is used for a high-stakes assessment such as the FRCA examinations.

The RCoA assessment system makes extensive use of oral assessment:

  • Face to face examination in both parts of the FRCA;
  • Some stations of the OSCE in the Primary FRCA;
  • Elements of the A-CEX and CBD; and
  • Simulation.

7.1.       Advantages of Oral Assessment

Oral assessment:

  • Is ‘Authentic’. Case-based discussion, the OSCE and some viva voce discussions across the examination table are conducted in ways that resemble the clinical use of material. During work, colleagues require an anaesthetist to explain and justify a clinical decision, and an oral format for questioning allows a more realistic context for assessment. 
  • Explores decision-making. Candidates can explain the reasons for things very clearly. This applies equally to scientific understandings and to the choice between clinical alternatives. Not only can they explain their reasoning but also they can argue in favour of their choices. Written tests require that the candidate has the same understanding of the question as the examiner from a limited scenario whereas in discussion the examiner can correct any misunderstandings so that the trainee gets a fair chance to explain and defend their proposed actions. This replicates the exchanges in clinical teams.
  • Is Engaging. Just as learners have preferred learning styles, so they have preferred assessment styles. Some candidates engage better with assessment by discussion than with written tests. Use of a variety of assessment methods allows all candidates to have some assessment in their preferred style.
  • Promotes learning. Proper preparation for oral examinations is a powerful instructional tool. It promotes clarity of thinking and clear communication. 
  • Promotes Examination Security. Impersonation and plagiarism are hard to counter, but face-to-face examining can be associated with good security. It would be very audacious to appear for a high-stakes oral examination on behalf of another. If the candidate were impersonated at the written exams, this could be revealed by a discrepancy between the oral, workplace and written marks.
  • Allows ‘Triangulation’. The use of a variety of assessment systems enables judgement to be made about capability by more than one method. This can confirm that a problem is real or allow the interpretation to be made that a candidate has a difficulty with the style of an assessment system – for which allowance can then be made.

Oral exams are most suitable for assessment of:

  • Communication skills;
  • Understanding – students can explain their knowledge and understanding;
  • Problem solving, critical-thinking, clinical-reasoning and the application of knowledge – a problem can be thought through and each stage described;
  • Prioritisation – learners can identify what is important and minimise less important knowledge. This is invaluable, as the trainee who knows all the answers but thinks first of rarities is well known to clinicians, and is less effective in the workplace than the learner who sees clearly;
  • Interpersonal skills. Scenarios with simulations or in real clinical situations give an opportunity for candidates to show their real interpersonal skills;
  • Professional demeanour – clinical cases, whether real or simulated allow the professional persona or ‘bedside manner’ to be observed; and
  • Personal characteristics – some oral formats enable the observer to judge manner, calmness under pressure etc. Here lies a minefield.

7.2.       The case against oral assessment

A vigorous argument is sometimes mounted against oral assessment. Gibbs et al have discussed its deficiencies.79

  • Oral assessment is time consuming and expensive;
  • Standardisation of encounters is difficult;
  • Reliability depends on the impartiality of examiners;
  • Validity depends on the skill of the examiners. It is difficult for examiners to strike a balance between setting the candidate at ease and maintaining a coherent line of questioning;
  • Predictive and consequential validity are not known;
  • Content of encounters varies with the candidate’s speed of response and language skills;
  • It is sometimes easy to miss the point of the candidate’s responses because of being distracted by their manner or by their presentation;
  • It is difficult for the examiners to justify a mark, because the judgement may rest on very little information, because there is no written record to review, and because memory of what the candidate said may be imperfect (and at odds with the candidate’s memory);
  • Such a wide and unpredictable range of questions may be asked that it makes it difficult for the candidate to prepare for the test; and
  • The examinee’s skill and experience with the format may influence their score.

7.3.       Reliability and Validity

If oral assessment is unreliable it is useless as a summative metric.

Its face validity is usually high because the questioning is used to relate knowledge to real contexts, and sometimes the assessment takes place in real work or in simulation. Its consequential and predictive validity are, however, questioned.

There have been few investigations of the validity and reliability of oral assessment as used in medical certification processes80 81 or of observational assessment that includes an oral component in simulators.82 Norcini has questioned the reliability and predictive validity of the ‘long case’ clinical examination83 and developed and introduced the A-CEX partly in response to these concerns.84 He has shown that the results of several A-CEX encounters with one observer are as reliable as a full-blown Clinical Examination Exercise lasting a couple of hours and using several observers. Daelmans also found that it was feasible to increase the number of oral exams in a CBD format within existing resources, and that this improved their reliability. Interestingly, he also found that the reliability of a single global rating was superior to that of separate scoring items. It has been demonstrated that inter-rater reliability can be improved by the use of a more systematic observation and scoring strategy.85 86 On the other hand, Olson compared scoring and candidate satisfaction between free questioning and the use of a grid of questions; he found that the grid did not increase inter-rater reliability and that candidates felt they had been less able to demonstrate their command of the subject.87 Oyebode et al found, in psychiatry oral exams using a ten-point scale, that there was poor correlation between pairs of observers but that, when the data were examined for agreement over the pass/fail criterion, there was good agreement.88 This may be taken as evidence to support the premise that experts know what constitutes satisfactory and unsatisfactory practice but that judgement of gradations within those categories is more difficult. Kearney examined inter-rater reliability in the Canadian anaesthetic certification oral assessment and found average correlations of just under 0.8 for 80% of test items.89 Schubert used a mock examination format similar to the oral assessment for anaesthesiologists’ specialist certification in the USA and saw inter-rater correlations of between 0.6 and 0.8; again there was better agreement (84%) between observers when considering pass/fail criteria. These workers compared the mock assessment with other measures of performance during training and found moderate levels of correlation, in the order of 0.6.90

Simpson carried out a study examining the various elements of questioning and decision making that demonstrated that a wide range of topics were explored, and that examiners generally presented their questions in a sequence that took the candidate through the stages of decision making. They recommended that oral examinations that intend to explore decision-making should be formally structured in ways that lead the candidate through the various stages.91

Sambell used a qualitative technique based on interviews to discover what effect oral exams had on the learner (consequential validity). They concluded that oral assessment did indeed have the educational impact on learners for which teachers hope.92

7.4.       Candidates’ Opinions

All doctors have undergone many oral assessments and most remember them as being amongst the most stressful and frightening of their ‘life crises’. Despite this there is data suggesting that students have a positive view of oral assessment.93 94 Students report that the need to explain what they are thinking leads to better preparation as well as clear thinking. They also say that ‘practice vivas’ are actually a very valuable learning tool. Some students report that it is at the same time more demanding and more satisfying. Candidates value the opportunity to reason with the examiner and persuade them that their answer is acceptable. No candidate likes any exam, but the evidence suggests that they do not dislike orals more than other examinations.

7.5.       Measures the RCoA takes to maximise the quality of its oral assessments

1.    Planning. The examinations committee of the RCoA has planned its formal oral examinations with great care. The exams conform to GMC standards for assessment and also to the good practice identified by the American Association for Higher Education. Importantly, the RCoA examinations committee believes that oral assessment is an essential part of the college’s assessment strategy. It recognises the weaknesses of oral examination, organises systems to minimise these and believes that its advantages are sufficient to make its use mandatory.

2.     Examiner Preparation. The examiners for the formal FRCA examinations nominate themselves and complete a formal application that is reviewed by the Examinations Committee and Council. Their educational knowledge and involvement are considered, as well as their reputation as content experts. Examiners observe the oral exams and undergo training before they participate. Khera et al have produced an excellent review of the situation regarding examiner training and recruitment in paediatrics.95 Most of what they say applies equally to anaesthetics. They identify a large curriculum for training examiners. The feasibility of such a programme for new examiners in anaesthetics is doubtful, but the same objectives can be more easily achieved by introducing a continuing programme of education in medical education, starting in the ST years and continuing into the middle years of consultant appointment.

The FRCA examiners are regularly observed and their performance is scrutinised, compared with that of colleagues and with candidates’ results in other parts of the exam.

3.     Student Preparation. The RCoA advises schools of anaesthesia that they should organise candidate training for the exams – including practice vivas and OSCEs. These teaching sessions usually involve the local consultants who are examiners.