West Indian Medical Journal

Print version ISSN 0043-3144

West Indian med. j. vol.62 no.9 Mona Dec. 2013




Observer variability in sonographic measurement of kidney sizes among children in Benin City, Nigeria





CU Eze (I); CU Eze (II); TT Marchie (III); CC Ohagwu (IV); K Ochie (II)

(I) Department of Radiation Biology, Radiotherapy, Radiodiagnosis and Radiography, Faculty of Clinical Sciences, College of Medicine, University of Lagos, Idi-Araba, Lagos, Nigeria
(II) Department of Medical Radiography and Radiological Sciences, Faculty of Health Sciences and Technology, College of Medicine, University of Nigeria, Enugu campus, Nigeria
(III) Department of Radiology, University of Benin Teaching Hospital, Benin City, Edo state, Nigeria
(IV) Department of Medical Radiography, Nnamdi Azikiwe University, Nnewi campus, Nnewi, Anambra state, Nigeria





AIM: To assess the level of inconsistency in replicating sonographic kidney size measurements in a population of healthy Nigerian children.
SUBJECTS AND METHODS: In this prospective cross-sectional study, convenience sampling technique was used to select a sample of Nigerian children. Both consent from participants and ethical approval from the local authority were obtained before the study commenced. Three radiologists carried out the replicate sonographic measurements using a DP-1100 mechanical sector scanner with a 3.5 MHz convex probe. All examinations were done with subjects in the supine oblique position. Longitudinal and transverse scans were performed. Renal lengths and widths were measured from the longitudinal scans while thickness was measured from the transverse scans. Renal volumes were calculated with the ellipsoid formula. Analysis of variance, Student's t-test, Pearson's correlation coefficient and z-test were used to test the statistical significance of results. SPSS version 17.0 was used in the analysis of results while statistical significance of all results was tested at p < 0.05.
RESULTS: Mean intra-observer measurement errors in replicate sonographic measurements of kidney sizes ranged from 0.36-0.43 cm, 0.22-0.63 cm, 0.37-0.52 cm and 5.93-9.62 ml for kidney length, width, thickness and volume, respectively. Mean inter-observer measurement errors were in the range of 0.29-0.48 cm, 0.18-0.23 cm, 0.34-1.82 cm and 5.92-7.28 ml for length, width, thickness and volume, respectively. Mean intra-observer errors were not statistically significant (p > 0.05) but mean inter-observer errors were (p < 0.05). Differences between the measurement errors of right and left kidney length, width, thickness and volume were not statistically significant (p > 0.05). Measurement errors correlated weakly with kidney sizes. Observer errors in renal length were not significantly different from those reported among Caucasians (p > 0.05) whereas those in volume were (p < 0.05).
CONCLUSION: Errors in replicate sonographic kidney size measurements obtained by a single observer were less than errors in the same measurements by different observers; therefore, replicate sonographic measurements by a single observer were more consistent in this population.

Keywords: Children, Nigeria, observer variability, sonography






Sonography is commonly used to evaluate the kidneys and urinary collecting systems (1). Replicate sonographic measurements of kidney sizes are essential in the clinical evaluation and follow-up of renal growth and of the outcome of treatment of renal diseases in children (2, 3). Renal length in particular is used to assess whether or not the kidneys are growing as expected. Renal growth that is unsatisfactory or totally lacking raises a high suspicion of chronic renal disease such as pyelonephritis or vesicoureteral reflux (4, 5). Accurate sonographic renal size measurement, therefore, provides needed empirical evidence regarding the chronicity or otherwise of renal disorders and so may be relied on to guide clinical decisions such as whether to carry out surgery or not (6).

Sonography has many advantages over other imaging modalities. It is less expensive and does not involve ionizing radiation (7). However, the consistency of replicate sonographic measurements is limited, which makes such measurements less accurate (8). These inconsistencies could be due to the state of the ultrasound machine, the training and experience of the operator (the observer), the scanning technique and the positioning of the patient.

Observer variability is the estimate of measurement error in replicate sonographic measurements attributable solely to the observer's judgements. It is the index of error in measurements obtained when a given measurement is replicated, either by the same or by different observers (9). Intra-observer variability is the estimate of repeatability of a particular measurement by the same observer whereas inter-observer variability is an estimate of the reproducibility of a given measurement by different observers (10). Observer variability (observer error) in replicate sonographic measurements could be estimated if a state-of-the-art ultrasound scanner is used and the observer(s) follow a single agreed scanning technique and patient positioning.

No literature exists on observer errors in sonographic measurement of kidney sizes in any population of Nigerian children, whereas studies in Europe, North America and Asia have assessed the consistency and reliability of sonographic measurements of kidney sizes in children by estimating observer measurement errors (4, 5, 11). Since sonography is usually not standardized beyond what is inherent in each department, it is necessary to establish data suitable for each local department (1, 12).



This cross-sectional study was carried out between September and October 2011 at the University of Benin Teaching Hospital (UBTH), Benin City. Convenience sampling method was used to select subjects. Ethical approval was obtained from the local committee on ethics and research, while informed consent was obtained from both the head-teacher of UBTH Staff School, Ugbowo, and the participants' parents before the study began. The sample was drawn from among pupils and students in the age range of 1-17 years (13) from the UBTH staff schools. Four volunteers were excluded because of obesity (7).

Three consultant radiologists (referred to as observers 1, 2 and 3) with equal training as well as equal post-certification job and hands-on experience on the ultrasound scanner used for the study carried out the sonographic measurements. These radiologists were certified both by the West African College of Surgeons (WACS) and the National Postgraduate Medical College of Nigeria (NPMCN). Sonographic examinations were performed with a DP-1100, a high-resolution, real-time scanner manufactured in 2008 by Shenzhen Mindray Biomedical Electronic Co. Ltd, China, with a 3.5 MHz convex probe. An agreed patient positioning (supine oblique) was followed. All measurements were done with the on-screen electronic calliper of the ultrasound unit on kidney images captured using the unit's freeze frame capacity. Agreed scanning techniques (longitudinal and transverse) were followed. Well-defined kidney images that included both renal poles and which also clearly demonstrated the renal medulla and pyramids were captured at deep arrested inspiration (14, 15). Kidney length was measured from pole to pole from the longitudinal scan image while kidney width was measured at the widest AP diameter between the superior and inferior renal borders (Fig. 1a). Kidney thickness was measured in the transverse scan from the renal hilum to the pole at the level of the AP measurement (Fig. 1b). Kidney volume was calculated from the ellipsoid formula (7): length × width × thickness × 0.5233. Measurement error was calculated as the difference between pairs of measurements obtained either by the same observer (intra-observer errors) or by different observers (inter-observer errors).
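The volume and error calculations described above can be illustrated with a minimal sketch. The function names, the sign convention (first reading minus second) and the sample readings are assumptions made for illustration only; they are not taken from the study data.

```python
# Ellipsoid factor quoted in the text (approximately pi / 6).
ELLIPSOID_FACTOR = 0.5233


def kidney_volume_ml(length_cm, width_cm, thickness_cm):
    """Ellipsoid volume: length x width x thickness x 0.5233 (1 cm^3 = 1 ml)."""
    return length_cm * width_cm * thickness_cm * ELLIPSOID_FACTOR


def measurement_error(first, second):
    """Signed difference between a pair of replicate measurements.

    The order of subtraction is an assumption; the text only says
    'difference between pairs of measurements'.
    """
    return first - second


# Hypothetical replicate readings (length, width, thickness in cm).
first_reading = (8.0, 4.0, 3.5)
second_reading = (8.4, 3.8, 3.6)

v1 = kidney_volume_ml(*first_reading)
v2 = kidney_volume_ml(*second_reading)
length_error = measurement_error(first_reading[0], second_reading[0])
volume_error = measurement_error(v1, v2)

print(f"volumes: {v1:.2f} ml, {v2:.2f} ml")
print(f"length error: {length_error:.2f} cm, volume error: {volume_error:.2f} ml")
```

The same `measurement_error` helper applies to intra-observer pairs (two readings by one observer) and inter-observer pairs (one reading each from two observers); only the choice of pair differs.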



Before the examination started, each examiner passed a near vision logarithm of minimum angle of resolution (LogMAR) test to ensure that poor eyesight did not contribute to errors. All measurements were done at eye level and under ambient lighting to avoid parallax errors and errors due to poor illumination. Each observer measured all renal dimensions twice, with the second measurement taken after a 30-minute interval. During the interval, fresh subjects were examined so that observers would not be unduly influenced by their previous results, while every observer remained blinded to the measurements obtained by other observers to further reduce bias (1).

Statistical analysis was done using the Statistical Package for Social Sciences (SPSS) version 17.0. Analysis of variance (ANOVA) was used to calculate mean measurement errors. Student's t-test was used to compare measurement errors of the right and left kidneys. Pearson's correlation coefficient (r) was used to analyse the association between measurement errors and kidney sizes, whereas two-tailed z-tests at Z = 1.96 were used to compare mean measurement errors in the study with mean errors obtained by researchers who used a sample of Caucasian children. Statistical significance of results was tested at p < 0.05.
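As a rough illustration of the two-tailed z-test at Z = 1.96, the sketch below compares two mean errors using a two-sample z statistic. The study used SPSS and does not state the exact form of the test, so the formula, the function names and all numbers other than the critical value of 1.96 and the sample size of 124 are assumptions.

```python
import math

Z_CRITICAL = 1.96  # two-tailed critical value for p < 0.05, as in the text


def two_sample_z(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """z statistic for the difference between two independent sample means."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


def two_tailed_p(z):
    """Two-tailed p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical summary statistics (cm); not values from either study.
z = two_sample_z(mean_a=0.40, sd_a=0.30, n_a=124,
                 mean_b=0.35, sd_b=0.30, n_b=100)
p = two_tailed_p(z)
significant = abs(z) > Z_CRITICAL

print(f"z = {z:.3f}, p = {p:.3f}, significant at 0.05: {significant}")
```

Comparing |z| with 1.96 and comparing the two-tailed p-value with 0.05 are equivalent decisions, which is why the text can describe the test either way.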



A sample of 124 healthy Nigerian children (56 boys and 68 girls) with a mean age of 10.2 years was examined sonographically. Mean intra-observer errors in the measurement of kidney sizes were 0.36-0.43 cm, 0.22-0.63 cm, 0.37-0.52 cm and 5.93-9.62 ml for kidney length, width, thickness and volume, respectively (Table 1). Mean inter-observer errors were in the range of 0.29-0.48 cm, 0.18-0.23 cm, 0.34-1.82 cm and 5.92-7.28 ml for kidney length, width, thickness and volume, respectively (Table 2). For all three observers, mean intra-observer errors for length, width, thickness and volume were not significant (p > 0.05; Table 3). Between observer groups, inter-observer errors for observer 1 vs 2, 2 vs 3 and 1 vs 3 were statistically significant for kidney width, thickness and volume (p < 0.05) whereas those for kidney length were not (p > 0.05; Table 3).

Intra-observer mean measurement errors of the right and left kidney sizes were not equal but the differences were not statistically significant (p > 0.05; Table 4). Intra-observer errors correlated positively with kidney length and volume (r = 0.45 and r = 0.30, respectively) but negatively with width and thickness (r = -0.01 and r = -0.01, respectively) whereas inter-observer errors correlated positively with all kidney sizes (Table 5). Measurement errors in renal length found in the study were not significantly different from errors found by observers among Caucasian subjects (p > 0.05) whereas those in volume were (p < 0.05; Table 6).
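For readers who want to see how a correlation between measurement errors and kidney sizes is obtained, here is a minimal sketch of Pearson's r. The function name and the sample values are hypothetical and do not reproduce the coefficients in Table 5.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient for paired samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical kidney lengths (cm) and the matching measurement errors (cm).
lengths = [6.5, 7.2, 8.0, 8.8, 9.5, 10.1]
errors = [0.30, 0.45, 0.35, 0.50, 0.40, 0.55]

r = pearson_r(lengths, errors)
print(f"r = {r:.2f}")
```

A value of r near zero, such as the -0.01 reported for width and thickness, indicates essentially no linear association rather than a meaningful negative one.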




Although sonographic examination does not involve ionizing radiation and is easily available, relatively inexpensive and easy to perform, the consistency of replicate measurements is one of the limitations of the modality (7). This study showed that observer errors in replicate sonographic measurements of kidney sizes by the same observer (intra-observer variability) were not statistically significant (p > 0.05). Inter-observer variability, on the other hand, was statistically significant in the measurement of kidney width, thickness and volume for all observer groups (p < 0.05). These results suggest better consistency and agreement between replicate measurements by the same observer, and less consistency between replicate sonographic measurements obtained by different observers. It is therefore plausible to suggest that it is easier for a single observer to repeat sonographic kidney size measurements than for different observers to reproduce them. Measurement errors found in this study are slightly less than those reported in previous studies (3, 16, 17). Reasons for the noted differences may be connected with differences in patient positioning, scanning technique and training of the observers as well as peculiarities of the scanner itself. Similar studies, however, did suggest better agreement within than between observers, with the authors explaining that whenever observers are faced with the same condition repeatedly, they tend to make the same or similar decisions (16, 18, 19).

Furthermore, the significant observer errors found in the measurement of renal width, thickness and volume suggest that such small renal dimensions are more difficult to measure sonographically. Moreover, the negative mean intra-observer measurement errors (Table 1) found in the study show that errors were recorded as signed rather than absolute differences, implying that universal applicability of sonographically measured kidney sizes may be limited and that such measurements may only be suitable for the locality or ethnicity from which the sample was drawn (1, 12).

The study found that the differences in mean inter-observer measurement errors of right and left kidney thickness and volume were statistically significant (p < 0.05) whereas those in intra-observer errors were not (p > 0.05). This underscores the difficulty, stated earlier, that many encounter in the sonographic measurement of small dimensions and suggests that the sizes of both kidneys must be measured separately by the same observer, whether for routine studies or for investigation of renal pathology (7, 15, 20).

In our study, Pearson's coefficient of correlation showed that weak associations existed between observer errors and renal sizes (Table 5). Bland and Altman plots of mean errors against renal sizes (Fig. 2a-d) show that the relationship between measurement errors and kidney sizes is not perfectly linear. Rather, there was a clustering of measurement errors around small renal dimensions. This seems to corroborate the suggestion that small kidney dimensions such as width and thickness and, by extension, quite small kidneys (as may be found in chronic renal failure) may be more difficult to measure accurately using sonography. This supports an earlier result by other researchers suggesting that even very small but normal kidneys may be more difficult to measure accurately by sonography (18).
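The quantities behind a Bland and Altman plot can be sketched as follows. This uses the conventional form of the method (difference between paired readings against their mean, with 95% limits of agreement at bias ± 1.96 SD); the study's figures plot mean errors against renal sizes and do not report limits of agreement, so the function, its outputs and the sample readings are illustrative assumptions.

```python
import statistics


def bland_altman(first, second):
    """Return pair means, pair differences, bias and 95% limits of agreement."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    means = [(a + b) / 2 for a, b in zip(first, second)]
    bias = statistics.mean(diffs)    # mean signed difference
    sd = statistics.stdev(diffs)     # sample standard deviation of differences
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return means, diffs, bias, limits


# Hypothetical replicate kidney lengths (cm) from two readings.
first_reading = [8.0, 9.0, 10.0, 7.5, 8.6]
second_reading = [8.2, 8.8, 10.3, 7.4, 8.9]

means, diffs, bias, (lower, upper) = bland_altman(first_reading, second_reading)
print(f"bias = {bias:.2f} cm, limits of agreement = ({lower:.2f}, {upper:.2f}) cm")
```

Plotting `diffs` against `means` gives the scatter; clustering of points at the low end of the horizontal axis would correspond to the concentration of errors around small renal dimensions described in the text.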









In comparison, mean measurement errors in renal length in this study were not significantly different from those recorded in European population studies. Mean measurement errors in volume were, however, significantly different. This is not unexpected since the calculation of volume incorporates both renal width and thickness, dimensions which seem more difficult for many observers to measure accurately (18). Furthermore, observer errors appear to increase when multiple measurements are used to calculate a given dimension, as is the case in the calculation of renal volume. It is also probable that the state of the ultrasound machines (their resolutions in particular) used in the different settings could have affected the measurements. The study, however, did not investigate whether racial differences in kidney size reported by some authors played any role in the differences noted between our measurements and those among Caucasians (12, 21). Owing to the dearth of medical physicists and basic quality assurance kits, quality assurance tests were not carried out on the ultrasound machine before the study commenced. This may also have affected the outcome of this study.
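The way errors accumulate in a computed volume can be made explicit with a first-order error propagation sketch. This derivation is standard but is not part of the study's analysis, and the dimensions and the 0.4 cm error used in the worked example are hypothetical.

```latex
% Ellipsoid volume from the three measured dimensions (k = 0.5233)
V = k \, L \, W \, T

% First-order (worst-case) relative error in V
\frac{\delta V}{V} \approx \frac{\delta L}{L} + \frac{\delta W}{W} + \frac{\delta T}{T}

% If the three errors are independent, they combine in quadrature instead
\frac{\delta V}{V} \approx
  \sqrt{\left(\frac{\delta L}{L}\right)^{2}
      + \left(\frac{\delta W}{W}\right)^{2}
      + \left(\frac{\delta T}{T}\right)^{2}}

% Hypothetical example: L = 8.0, W = 4.0, T = 3.5 cm, each with a 0.4 cm error
V = 0.5233 \times 8.0 \times 4.0 \times 3.5 \approx 58.6 \text{ ml}
\frac{\delta V}{V} \approx 0.050 + 0.100 + 0.114 = 0.264
  \;\Rightarrow\; \delta V \approx 15.5 \text{ ml (worst case)}
\frac{\delta V}{V} \approx \sqrt{0.050^{2} + 0.100^{2} + 0.114^{2}} \approx 0.160
  \;\Rightarrow\; \delta V \approx 9.4 \text{ ml (independent errors)}
```

Because width and thickness are the smallest dimensions, the same absolute error produces a larger relative error in them than in length, so they dominate the uncertainty in the computed volume.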



Replicate sonographic measurements of kidney sizes by a single observer are more consistent, reliable and more accurate than replicate measurements by different observers. Moreover, very small kidney dimensions appear to be more difficult to measure sonographically, so caution must be exercised before concluding that small measurements are absolute pointers to pathology. Furthermore, it is important to measure right and left kidneys separately during sonographic studies.

Based on our results and the literature reviewed, we recommend that:

* Replicate sonographic kidney size measurements, either for follow-up of kidney growth or assessment of pathology, should be undertaken by one experienced observer who must use a state-of-the-art ultrasound scanner, follow a single scanning technique and adopt a consistent patient position to reduce observer errors.

* Automation should be considered when absolute values are needed as a way to eliminate observer measurement errors associated with sonographically measured kidney size.

* In the absence of full automation, more accurate systems such as computed tomography (CT) and magnetic resonance imaging (MRI) may be preferable, especially in the measurement of kidney volume.



1. Larson DB, Meyers ML, O'Hara SM. Reliability of renal length measurements made with ultrasound compared with measurements from helical CT multiplanar reformat images. Am J Roentgenol 2011; 196: 592-7.

2. Ortiz-Neira C, Traubici J, Alan D, Moineddin R, Shuman C, Weksberg R et al. Sonographic assessment of renal growth in patients with Beckwith-Wiedemann syndrome: the Beckwith-Wiedemann syndrome renal nomogram. Clinics (Sao Paulo): 2008; 64: 41-4.

3. Schlesinger AE, Hernandez RJ, Zerin JM, Marks TM, Kelsch RC. Interobserver and intraobserver variation in sonographic renal length measurements in children. AJR 1991; 156: 1029-32.

4. Zerin JM, Blane CE. Sonographic assessment of renal length in children: a reappraisal. Pediatr Radiol 1994; 24: 101-6.

5. Leroy S, Chalumeau M, Ulinski T, Dubos F, Sergent-Alaoui A, Merzoug V et al. Impressive renal damage after acute pyelonephritis in a child. Pediatr Nephrol 2010; 25: 1365-8.

6. Partik BL, Stadler A, Schamp S, Koller A, Voracek M, Heinz G et al. 3D versus 2D ultrasound: accuracy of volume measurement in human cadaver kidneys. Invest Radiol 2002; 37: 489-95.

7. Otiv A, Mehta K, Ali U, Nadkarni M. Sonographic measurement of renal size in normal Indian children. Indian Pediatr 2012; 49: 533-6. Epub 2012 Jan 17 [cited 2012 Feb 18]. Available from:

8. Carrico CW, Zerin JM. Sonographic measurement of renal length in children: does the position matter? Pediatr Radiol 1996; 26: 553-5.

9. Norie A, Ahmad R, Lien WF, Zahiah M. Inter-observer and intraobserver variability in the assessment of the paranasal sinuses radiographs. Internet J Otorhinolaryngol 2006; 5: 1-14.

10. Ejifugha AU. Fundamentals of Research in Health Education. Owerri: Africana FEP Publishers; 1998: 106.

11. Mancini M, Mainenti PP, Speranza A, Liuzzi R, Soscia E, Sabbatini M et al. Accuracy of sonographic volume measurements of kidney transplant. J Clin Ultrasound 2006; 34: 184-9.

12. Bulcholz NP, Abbas F, Biyabani SR, Afzaal M, Javed Q. Ultrasonographic renal size in individuals without known renal disease. J Pak Med Assoc 2000; 50: 12-16.

13. Alemika EEO, Chukwuma I, Lafratta D, Messerli D, Souckova J. Rights of the child in Nigeria. Geneva: World Organization Against Torture; 2004: Available from:

14. Block B. The practice of ultrasound: a step-by-step guide to abdominal scanning. Stuttgart, Germany: Georg Thieme Verlag; 2004: 67-9.

15. American Institute of Ultrasound in Medicine. Practice guideline for the performance of an ultrasound of the abdomen and/or retro-peritoneum. Laurel, MD: AIUM; 2008 [revised 2012]. Available from:

16. Sargent MA, Long G, Karmali M, Cheng SM. Inter-observer variation in the sonographic estimation of renal volume in children. Pediatr Radiol 1997; 27: 663-6.

17. Sargent MA, Wilson BP. Observer variability in the sonographic measurement of renal length in children. Clin Radiol 1992; 46: 344-7.

18. England A, Niker A, Redmond C. Variability of vascular CT measurement techniques used in the assessment of abdominal aortic aneurysms. Radiography 2010; 16: 173-81.

19. Miranda-Geelhoed JJ, Kleyburg-Linkers VE, Sonja PES, Lequin M, Nauta J, Steegers EAP et al. Reliability of renal ultrasound measurements in children. Pediatr Nephrol 2009; 24: 1345-53.

20. Emamian SA, Nielsen MB, Pedersen JF, Ytte L. Kidney dimensions at sonography: correlation with age, sex and habitus in 665 adult volunteers. AJR 1993; 160: 83-6.

21. Ukoha UU, Anibeze CIP, Akpuaka FU, Mgbor SO. Kidney parameters and age structure among southeast Nigerians. J Exp Clin Anat 2002; 1: 19-21.



Dr CU Eze
Department of Radiation Biology, Radiotherapy, Radiodiagnosis and Radiography, Faculty of Clinical Sciences, College of Medicine, University of Lagos
Idi-Araba, Lagos, Nigeria