ORIGINAL ARTICLE
Development and Validation of a Short-Form, Rapid
Estimate of Adult Literacy in Medicine
Ahsan M. Arozullah, MD, MPH,*† Paul R. Yarnold, PhD,‡ Charles L. Bennett, MD, PhD, MPP,*‡
Robert C. Soltysik, MS,* Michael S. Wolf, PhD, MPH,‡ Rosario M. Ferreira, MD, MPP,*‡
Shoou-Yih D. Lee, PhD,§ Stacey Costello, BA,* Adil Shakir, MBBS,* Caroline Denwood, BA,*
Fred B. Bryant, PhD,¶ and Terry Davis, PhD∥
Background: Although prior studies used the 66-item Rapid Estimate of Adult Literacy in Medicine (REALM instrument) for literacy assessment, researchers may require a shorter, validated instrument when designing interventions for clinical contexts.
Objective: To develop and validate a very brief literacy assessment tool, the REALM-Short Form (REALM-SF).
Patients: The model development, validation, and field testing validation samples included 1336, 164, and 50 patients, respectively.
Setting: General medicine and subspecialty clinics and medicine inpatient wards.
Design: For development and validation samples, indicator variables for REALM instrument items were evaluated as potential predictors of REALM instrument score by stepwise multiple regression analysis with subsequent bootstrap and confirmatory factor analysis of selected items. Pearson correlations compared REALM-SF and REALM instrument scores and kappa analyses compared grade level assignments. For the field testing validation sample, Pearson correlations compared Wide Range Achievement Test and REALM-SF scores.
Results: The REALM-SF included 7 items with stable model coefficients and 1 underlying linear factor. REALM-SF and REALM instrument scores were highly correlated in development (r = 0.95, P < 0.001) and validation (r = 0.94, P < 0.001) samples. There was excellent agreement between REALM-SF and REALM instrument grade-level assignments when dichotomized at the 6th grade (development: 97% agreement, κ = 0.88, P < 0.001; validation: 88% agreement, κ = 0.75, P < 0.001) and 8th grade levels (development: 94% agreement, κ = 0.78, P < 0.001; validation: 84% agreement, κ = 0.67, P < 0.001). REALM-SF and Wide Range Achievement Test scores were highly correlated (r = 0.83, P < 0.001) in field testing validation.
Conclusions: The REALM-SF provides researchers a brief, validated instrument for assessing patient literacy in diverse research settings.
Key Words: literacy, survey research, underserved population,
socioeconomic factors
(Med Care 2007;45: 1026–1033)
From the *Jesse Brown Veterans Affairs Medical Center, Chicago, Illinois, and the VA Center for the Management of Complex Chronic Care (CMC3), Hines, Illinois; †Sections of General Internal Medicine and Health Promotion Research of the Department of Medicine, University of Illinois at Chicago, Chicago, Illinois; ‡Divisions of Emergency Medicine, Hematology/Oncology, General Internal Medicine, and Gastroenterology of the Department of Medicine, Feinberg Medical School of Northwestern University, Chicago, Illinois; §Department of Health Policy Administration, University of North Carolina, Chapel Hill, North Carolina; ¶Department of Psychology, Loyola University, Chicago, Illinois; and ∥Departments of Medicine and Pediatrics, Louisiana State University Health Sciences Center, Shreveport, Louisiana.
Drs Arozullah and Ferreira were supported as Research Associates in the Career Development Award Program of the Veterans Affairs Health Services Research and Development Service. This project was supported through a Pfizer Health Literacy Grant. Dr. Lee’s participation was partially supported by a research grant from the Agency for Healthcare Research and Quality (R01 HS13004).
The sponsors (VA, Pfizer, AHRQ) had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; and preparation, review, or approval of the manuscript.
Preliminary results of this manuscript were presented as an oral abstract presentation at the American Society of Clinical Oncology 2004 Annual Meeting held in New Orleans, Louisiana.
Reprints: Ahsan M. Arozullah, MD, MPH, Research and Development, Room 6203, Jesse Brown VA Medical Center (151), 820 S. Damen Avenue, Chicago, IL 60612. E-mail: arozulla@uic.edu.
Copyright © 2007 by Lippincott Williams & Wilkins
ISSN: 0025-7079/07/4511-1026
Medical Care • Volume 45, Number 11, November 2007

According to an Institute of Medicine report, up to 90 million American adults have difficulty in understanding and acting upon health information because of limited health literacy.[1] The Rapid Estimate of Adult Literacy in Medicine (REALM instrument)[2,3] is a common literacy assessment instrument used in research studies.[4] Prior studies, using the REALM instrument, evaluated the association between low literacy skills and diverse clinical outcomes[5] including prostate cancer stage at presentation,[6] cancer screening knowledge[7,8] and behavior,[9] pregnancy-related risks[10] and behavior,[11] medication knowledge,[12,13] and health care utilization.[14–17]
As health literacy research moves increasingly towards developing and testing interventions, there is a growing need for shorter, validated literacy assessment instruments applicable in real-life clinical contexts. These instruments must simultaneously be valid from a scientific perspective and practical for use from a clinical perspective. Although trained personnel can administer the REALM instrument in an average of 2–3 minutes,[4] this time may be prohibitive for interventional studies in settings such as busy outpatient clinics. Developing shortened instruments may also facilitate literacy assessment in studies where literacy is not the primary focus by providing researchers an efficient method of assessing the potential confounding and/or mediating effects of low literacy in their study populations.
Prior studies have suggested the feasibility of shortening the REALM instrument.[18,19] One prior study described the development and pilot testing of a shortened version of the REALM instrument, the 8-word REALM-R.[19] However, the study sample was relatively well educated and included few minority and elderly patients. Therefore, the validity of using the REALM-R in populations with high rates of limited literacy is not known.
The purpose of this study was to develop and validate a very brief literacy screening instrument, derived from the REALM instrument, for use in future research studies. The shortened form instrument (REALM-SF) was developed and tested in study cohorts from sites that represent a variety of inpatient and outpatient settings in which current and future literacy studies will likely occur. Furthermore, these study cohorts included minority and elderly patients because limited literacy skills are common among these populations.[18]
METHODS
Study Cohorts and Data Collection
Patients who received care at 1 Veterans Affairs (VA) hospital, 2 private university-affiliated hospitals, and 1 public university-affiliated hospital were included in this study. Table 1 describes the 9 nonoverlapping patient cohorts that were included: (1) 6 cohorts for model development (cohorts 1–6); (2) 2 cohorts for model validation (cohorts 7–8); and (3) 1 cohort for field testing validation (cohort 9). Further details about specific enrollment criteria have been published previously for the following cohorts: cohort 1,[17] cohort 2,[9] and cohorts 4 and 6.[12]
In the model development and validation cohorts (cohorts 1–8), the REALM instrument was administered by trained interviewers to patients who received care at general internal medicine, human immunodeficiency virus, hematology/oncology, and urology clinics as well as inpatients from general medicine wards of the study hospitals. In the field testing validation cohort (cohort 9), the REALM-SF and the Wide Range Achievement Test, revised version 3 (WRAT-R) were administered by trained interviewers to patients who received care at the general internal medicine clinic at the Jesse Brown VA Medical Center. Dictionary pronunciation formed the basis for the scoring standard in each cohort. The Institutional Review Board of each hospital approved each study before initiation. Information about the study design was provided to patients in both oral and written forms. Written informed consent was required before administration of literacy tests.

TABLE 1. Cohort Descriptions
Cohort No. | Use in Study | Interview Setting | Sample Description | N
1 | Model development | Medicine Inpatient Wards (Jesse Brown VAMC) | Convenience sample of inpatients approached during hospitalization | 455
2 | Model development | GMC (VA Chicago Healthcare System) | Convenience sample of outpatients needing colorectal cancer screening approached during clinic visit | 370
3 | Model development | GMC (University of Chicago Hospital) | Convenience sample of outpatients approached during clinic visit | 275
4 | Model development | HIV Outpatient Clinic (Northwestern Memorial Hospital) | Convenience sample of outpatients receiving antiretroviral therapy approached during clinic visit | 91
5 | Model development | Oncology and Urology Clinics (VA Chicago Healthcare System) | Convenience sample of outpatients with prostate cancer approached during clinic visit | 70
6 | Model development | HIV Outpatient Clinic (LSU Health Sciences Center) | Convenience sample of outpatients receiving antiretroviral therapy approached during clinic visit | 75
7 | Model validation | GMC (VA Chicago Healthcare System) | Convenience sample of outpatients approached during clinic visit | 75
8 | Model validation | Oncology Clinics (LSU Health Sciences Center and VA Chicago Healthcare System) | Convenience sample of outpatients with colorectal or prostate cancer approached during clinic visit | 89
9 | Field testing validation | GMC (Jesse Brown VAMC) | Convenience sample of outpatients approached during clinic visit | 50
N indicates number of patients; VAMC, Veteran Affairs Medical Center; GMC, General Medicine Clinic; LSU, Louisiana State University.

REALM-SF Development and Validation
Stepwise multiple regression modeling was used to determine the independent association between individual REALM instrument items and total REALM instrument score among the development sample patients (cohorts 1–6). Indicator variables, coded as “1” if pronounced correctly and “0” if incorrect, were created for each REALM instrument item. Each indicator variable was entered into the model as a potential independent variable with REALM instrument score as the dependent variable.[20] Stepwise multiple regression modeling was used to select REALM-SF items with a priori item retention criteria of experimentwise P < 0.05 and partial R² ≥ 0.01, such that each retained item explained at least 1% of the variation in REALM instrument scores. Bootstrap analysis of 1000 iterations with 50% resampling was conducted to assess model over-fitting and potential cross-generalizability.
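As an illustration only, the item-selection logic described above can be sketched in Python. This is not the analysis code used in the study. It assumes NumPy, uses simulated responses rather than patient data, applies only the rule that each retained item must add at least 0.01 to R² (the experimentwise P test and any item-removal step are omitted), and assumes that 50% resampling means drawing half of the sample with replacement. The names forward_select and bootstrap_coefficients are hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X, with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

def forward_select(items, total, min_gain=0.01):
    """Greedy forward selection: repeatedly add the item with the largest
    R^2 gain until no remaining item adds at least `min_gain`."""
    selected, gains, r2 = [], [], 0.0
    remaining = list(range(items.shape[1]))
    while remaining:
        trial = {j: r_squared(items[:, selected + [j]], total) for j in remaining}
        best = max(trial, key=trial.get)
        gain = trial[best] - r2
        if gain < min_gain:
            break
        selected.append(best)
        gains.append(gain)
        remaining.remove(best)
        r2 = trial[best]
    return selected, gains, r2

def bootstrap_coefficients(items, total, selected, n_iter=1000, frac=0.5, seed=0):
    """Refit the selected items on repeated half-size resamples and return
    the mean and SD of each regression coefficient."""
    rng = np.random.default_rng(seed)
    n = len(total)
    m = int(frac * n)
    coefs = np.empty((n_iter, len(selected)))
    for b in range(n_iter):
        idx = rng.choice(n, size=m, replace=True)  # assumption: with replacement
        A = np.column_stack([np.ones(m), items[idx][:, selected]])
        coef, *_ = np.linalg.lstsq(A, total[idx], rcond=None)
        coefs[b] = coef[1:]  # drop the intercept
    return coefs.mean(axis=0), coefs.std(axis=0, ddof=1)

# Simulated data (hypothetical): 500 respondents, 66 pass/fail items.
rng = np.random.default_rng(1)
ability = rng.normal(size=500)
difficulty = np.linspace(-2.5, 1.0, 66)
noise = rng.normal(scale=0.7, size=(500, 66))
items = (ability[:, None] + noise > difficulty).astype(float)
total = items.sum(axis=1)  # analogous to the 0-66 REALM instrument score

selected, gains, r2 = forward_select(items, total)
mean_coef, sd_coef = bootstrap_coefficients(items, total, selected, n_iter=200)
print("selected item indices:", selected)
print("cumulative R^2:", round(r2, 3))
```

Small bootstrap SDs relative to the mean coefficients would correspond to the narrow confidence intervals that the study interprets as model stability.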
Confirmatory factor analysis via LISREL 8[21] was used to test the hypothesis that the 1-factor measurement model provided a reasonable goodness-of-fit for the set of 7 dichotomous variables. Following the procedures outlined by Jöreskog and Sörbom,[21] PRELIS 2[22] was first used to produce tetrachoric intercorrelations for the set of dichotomous measures, along with their asymptotic variances and covariances. The matrix of tetrachoric correlations and the asymptotic covariance matrix were then used as data input for LISREL 8 to obtain the 1-factor solution, standard errors, squared multiple correlations for measured variables, and associated measures of goodness-of-fit. The variance of the single factor was fixed at 1.0 to define the metric of its variance units and identify the model, and the unique error variances of the measured variables were specified to be uncorrelated with one another.[23]
REALM-SF total score was defined as the number of words correctly pronounced (range, 0–7). Pearson correlations between REALM-SF and REALM instrument scores were computed for the development sample (cohorts 1–6), for each validation cohort separately (cohorts 7 and 8), and for both validation cohorts combined. The presence of Simpson’s paradoxical confounding was evaluated by examining the Pearson correlations between REALM-SF and REALM instrument scores, separately by study cohort, gender, and race.[24]
REALM instrument scores were categorized into 1 of 4 previously validated reading grade levels: 3rd grade or less (0–18), 4th to 6th grade (19–44), 7th to 8th grade (45–60), and 9th grade or higher (61–66).[3] Based on distributional similarities, the REALM-SF scores were mapped to these 4 reading grade levels: 3rd grade or less (0), 4th to 6th grade (1–3), 7th to 8th grade (4–6), and 9th grade or higher (7). Agreement between REALM-SF and REALM instrument reading grade level assignment was assessed by kappa analysis.
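The scoring and grade-level rules above can be written out as a short Python sketch. The cut points are those stated in the text and the 7 words are the items listed in Table 4; the function names and the dictionary input format are hypothetical choices made for illustration.

```python
# The 7 REALM-SF items, as listed in Table 4.
REALM_SF_WORDS = (
    "menopause", "antibiotics", "exercise", "jaundice",
    "rectal", "anemia", "behavior",
)

def realm_sf_score(pronounced_correctly):
    """Count correctly pronounced words (range 0-7).

    `pronounced_correctly` maps each word to True/False; missing words
    are counted as incorrect.
    """
    return sum(1 for w in REALM_SF_WORDS if pronounced_correctly.get(w, False))

def realm_sf_grade_level(score):
    """Map a REALM-SF score (0-7) to a reading grade level."""
    if not 0 <= score <= 7:
        raise ValueError("REALM-SF score must be between 0 and 7")
    if score == 0:
        return "3rd grade or less"
    if score <= 3:
        return "4th to 6th grade"
    if score <= 6:
        return "7th to 8th grade"
    return "9th grade or higher"

def realm_grade_level(score):
    """Map a 66-item REALM instrument score (0-66) to a reading grade level."""
    if not 0 <= score <= 66:
        raise ValueError("REALM instrument score must be between 0 and 66")
    if score <= 18:
        return "3rd grade or less"
    if score <= 44:
        return "4th to 6th grade"
    if score <= 60:
        return "7th to 8th grade"
    return "9th grade or higher"

# Example: a respondent who misses only "jaundice" and "anemia".
responses = {w: True for w in REALM_SF_WORDS}
responses["jaundice"] = False
responses["anemia"] = False
example_score = realm_sf_score(responses)
print(example_score, realm_sf_grade_level(example_score))  # 5 7th to 8th grade
```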
REALM-SF Field Testing Validation
Fifty patients from the general internal medicine outpatient clinic at the Jesse Brown VA Medical Center were recruited for field testing validation of the REALM-SF. Patients with blindness or severely impaired vision not correctable with eyeglasses, or deafness or hearing problems not correctable with a hearing aid were excluded because the literacy testing involved visual cues and required the ability to follow oral directions. Eligible patients were approached during their clinic visit and informed consent was obtained using verbal explanations and information sheets.
The REALM instrument imitated the format of the WRAT-R,[25] a word and letter recognition reading test, by substituting medical terms in place of nonmedical terms. The REALM instrument was validated against the WRAT-R, and similarly, we compared results of the REALM-SF with the WRAT-R during field testing validation. Each enrolled subject was randomly assigned to receive either the REALM-SF, followed by the WRAT-R (n = 25), or the WRAT-R, followed by the REALM-SF (n = 25). Trained research assistants administered these tests, and then queried each subject about whether the REALM-SF made them uncomfortable in any way; their willingness to complete a similar test if requested by their personal doctor; their comfort level with sharing literacy test results with their doctor; whether the test took too long; and any other comments.
Pearson correlation between REALM-SF and WRAT-R scores was assessed and then compared with reported correlations between WRAT-R scores and the following tests: (1) REALM instrument; (2) the Test of Functional Health Literacy in Adults (TOFHLA),[26] a reading comprehension test; and (3) the REALM-Revised (REALM-R),[19] a shortened word recognition test derived from the REALM instrument.
RESULTS
The model development sample consisted of 1336 patients who received health care as outpatients in general internal medicine (n = 645), in medical subspecialty clinics (n = 236), and as general medicine inpatients (n = 455) (Table 1). Overall, the mean age of the development sample was 61 years (SD: 14 years); 81% were male; 62% were African American; and 43% had less than 9th grade literacy (Table 2). For the development sample, REALM instrument scores ranged from 0 to 66 with a mean score of 56 (SD: 14) and REALM-SF scores ranged from 0 to 7 with a mean score of 5.8 (SD: 1.9) (Table 3).
The model validation sample (n = 164) consisted of 75 patients from a general internal medicine clinic and 89 patients from a medical subspecialty clinic. The mean age of the validation sample was 68 years (SD: 10 years); 100% were male; 98% were African American; and 64% had less than 9th grade literacy (Table 2). For the validation sample, REALM instrument scores ranged from 6 to 66 with a mean score of 53 (SD: 15) and REALM-SF scores ranged from 0 to 7 with a mean score of 5.4 (SD: 2.0).
Seven items from the REALM instrument met a priori retention criteria of P < 0.05 and partial R² ≥ 0.01 in stepwise regression analysis and were retained as the REALM-SF (Table 4). Cumulative model R² indicated that 92.1% of the variance in REALM instrument scores was accounted for by the 7 REALM-SF items. Bootstrap validity analysis revealed narrow confidence intervals for model parameters implying model stability (Table 4).
Results for the 1-factor confirmatory factor analysis model strongly support the conclusion that the set of 7 dichotomous variables represents a unidimensional scale. In particular, the goodness-of-fit χ² value for the 1-factor model was nonsignificant when corrected for nonnormality, χ²(14, N = 1336) = 14.54, P = 0.41, as was the Satorra-Bentler scaled χ²,[27] χ²(14, N = 1336) = 17.60, P = 0.23, both of which indicate that the 1-factor model provides an excellent fit to the data. The root mean square residual[28] for the 1-factor model was only 0.014, indicating good fit when adjusting for model complexity. The goodness-of-fit index[21] was 0.91, also indicating reasonable model fit. In addition, the comparative fit index[29] was 0.97, and the Tucker-Lewis coefficient[30] was 0.96, further supporting the goodness-of-fit of the 1-factor model. All factor loadings were highly statistically significant (P < 0.001); and the squared multiple correlations for the measured variables ranged from 0.70 to 0.95 (median R² = 0.80), indicating that the single global factor explained a large proportion of the variance in each of the measured variables. Considered together, this evidence clearly indicates that responses to the set of 7 dichotomous variables are best represented in terms of a 1-factor measurement model.
The Pearson correlation between REALM instrument score and the 7 REALM-SF items, using the regression model coefficients, was r = 0.959. Analogous to REALM instrument scoring (range, 0–66), REALM-SF score was defined as the number of words correctly pronounced (range, 0–7). REALM-SF scores correlated nearly perfectly with the regression model scores, r = 0.992. The Pearson correlations between REALM-SF and REALM instrument scores were r = 0.951 in the development sample (n = 1336), r = 0.940 for the combined validation sample (n = 164), r = 0.953 for cohort 7 (n = 75), and r = 0.933 for cohort 8 (n = 89).

TABLE 2. Demographic Characteristics and Literacy Level by Cohort
Characteristic | Cohort 1 (n = 455) | Cohort 2 (n = 370) | Cohort 3 (n = 275) | Cohort 4 (n = 91) | Cohort 5 (n = 70) | Cohort 6 (n = 75) | Cohort 7 (n = 75) | Cohort 8 (n = 89)
Age (yr)* | 60 ± 13 | 67 ± 10 | 66 ± 10 | 42 ± 8 | 70 ± 11 | 35 ± 11 | 67 ± 11 | 68 ± 10
Gender (%): Male | 98.9 | 100.0 | 29.8 | 83.5 | 100.0 | 53.3 | 100.0 | 100.0
Gender (%): Female | 1.1 | 0.0 | 70.2 | 16.5 | 0.0 | 46.7 | 0.0 | 0.0
Race/Ethnicity (%): White | 11.9 | 48.4 | 32.5 | 58.2 | 47.1 | 41.3 | 0.0 | 3.4
Race/Ethnicity (%): African American | 83.3 | 45.3 | 61.7 | 33.0 | 48.6 | 54.7 | 100.0 | 96.6
Race/Ethnicity (%): Other | 4.8 | 6.3 | 5.8 | 8.8 | 4.3 | 4.0 | 0.0 | 0.0
Literacy level (%): ≤3rd grade | 8.4 | 3.3 | 1.5 | 0.0 | 2.9 | 13.2 | 8.0 | 6.7
Literacy level (%): 4–6th grade | 15.4 | 6.0 | 7.3 | 2.2 | 5.7 | 7.9 | 8.0 | 11.2
Literacy level (%): 7–8th grade | 33.9 | 24.6 | 23.4 | 13.0 | 25.7 | 15.8 | 46.7 | 47.2
Literacy level (%): ≥9th grade | 42.3 | 66.1 | 67.9 | 84.8 | 65.7 | 63.2 | 37.3 | 34.8
*Age reported as mean years ± SD.

TABLE 3. Percent Correct Responses on Individual REALM-SF Items by Cohort
Item | Cohort 1 (n = 455) | Cohort 2 (n = 370) | Cohort 3 (n = 275) | Cohort 4 (n = 91) | Cohort 5 (n = 70) | Cohort 6 (n = 75) | Overall (n = 1336)
Fat | 97.8 | 98.4 | 98.5 | 100.0 | 100.0 | 97.3 | 98.4
Flu | 96.0 | 98.4 | 98.2 | 100.0 | 97.1 | 94.7 | 97.4
Behavior | 93.0 | 95.4 | 96.7 | 96.7 | 95.7 | 92.0 | 94.8
Exercise | 90.5 | 95.1 | 95.3 | 95.6 | 95.7 | 88.0 | 93.3
Menopause | 83.1 | 89.7 | 94.2 | 94.5 | 90.0 | 84.0 | 88.4
Rectal | 73.6 | 90.2 | 100.0 | 94.5 | 84.3 | 77.3 | 83.7
Antibiotics | 71.9 | 83.4 | 88.0 | 90.1 | 87.1 | 86.7 | 81.3
Anemia | 54.7 | 74.5 | 80.7 | 89.0 | 68.6 | 73.3 | 69.6
Jaundice | 57.1 | 72.8 | 78.9 | 76.9 | 67.1 | 65.3 | 68.3
REALM instrument (Mean) | 52.0 | 57.4 | 59.0 | 60.8 | 56.3 | 54.8 | 55.9
REALM instrument (SD) | 16.3 | 13.3 | 11.6 | 8.8 | 13.5 | 17.2 | 14.4
REALM-SF (Mean) | 5.2 | 6.0 | 6.2 | 6.4 | 5.9 | 5.7 | 5.8
REALM-SF (SD) | 2.1 | 1.7 | 1.5 | 1.3 | 1.8 | 2.1 | 1.9
Values given are the percentage of patients who pronounced each individual REALM instrument item correctly. The REALM instrument mean and REALM instrument SD are the corresponding mean REALM instrument total score (0–66) and SD, respectively. The REALM-SF mean and REALM-SF SD are the corresponding mean REALM-SF total score (0–7) and SD, respectively.

There was excellent agreement between REALM instrument and REALM-SF reading grade level assignment with 84% agreement (κ = 0.73, P < 0.001) in the development sample and 77% agreement (κ = 0.64, P < 0.001) in the validation sample (Table 5). There was also excellent agreement between REALM instrument and REALM-SF assignment when scores were dichotomized at the 6th grade (development: 97% agreement, κ = 0.88, P < 0.001; validation: 88% agreement, κ = 0.75, P < 0.001) and 8th grade levels (development: 94% agreement, κ = 0.78, P < 0.001; validation: 84% agreement, κ = 0.67, P < 0.001).
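For readers unfamiliar with kappa analysis, the following Python sketch shows how percent agreement and Cohen's kappa could be computed after dichotomizing both instruments at the 6th grade level (REALM instrument 0–44 and REALM-SF 0–3, following the grade-level cut points given in the Methods). The paired scores are invented for illustration and are not study data; the function name cohen_kappa is hypothetical.

```python
from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length sequences of category labels."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed proportion of exact agreement.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from the marginal category frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / n**2
    if expected == 1.0:
        return 1.0  # both instruments used one identical category throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical paired scores for 8 respondents (not study data).
realm = [66, 62, 50, 40, 15, 58, 30, 64]
realm_sf = [7, 7, 5, 3, 0, 4, 4, 6]

# Dichotomize at the 6th grade level: True means 6th grade or below.
low_realm = [s <= 44 for s in realm]
low_sf = [s <= 3 for s in realm_sf]

agreement = sum(a == b for a, b in zip(low_realm, low_sf)) / len(realm)
kappa = cohen_kappa(low_realm, low_sf)
print(f"agreement = {agreement:.3f}, kappa = {kappa:.3f}")
# prints: agreement = 0.875, kappa = 0.714
```

Kappa is lower than raw agreement because it discounts the agreement expected by chance from the marginal frequencies of each category.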
To evaluate the possibility of paradoxical confounding that may arise when data from multiple cohorts are combined (Simpson’s paradox), Pearson correlations between REALM instrument and REALM-SF scores were computed separately by study cohort, gender, and race/ethnicity.[24] The Pearson correlations, by study cohort, ranged from r = 0.93 to 0.97, compared with r = 0.95 for the total development sample, suggesting that no paradoxical confounding existed. There was no significant difference in the correlation between REALM-SF and REALM instrument scores by race (r = 0.92 among whites, r = 0.95 among African Americans) or gender (r = 0.95 among males, r = 0.95 among females).
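The stratified check described above amounts to comparing the pooled Pearson correlation with the correlation inside each subgroup. A minimal sketch, assuming NumPy and using invented scores rather than study data (the function name pearson_by_stratum is hypothetical):

```python
import numpy as np

def pearson_by_stratum(x, y, strata):
    """Return the pooled Pearson r under the key 'all', plus r within each stratum."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    strata = np.asarray(strata)
    results = {"all": float(np.corrcoef(x, y)[0, 1])}
    for level in np.unique(strata):
        mask = strata == level
        results[str(level)] = float(np.corrcoef(x[mask], y[mask])[0, 1])
    return results

# Hypothetical scores for two cohorts labelled "A" and "B" (not study data).
realm_sf = [1, 3, 5, 7, 0, 2, 4, 6, 7]
realm = [20, 40, 55, 66, 10, 33, 50, 60, 65]
cohort = ["A"] * 4 + ["B"] * 5

correlations = pearson_by_stratum(realm_sf, realm, cohort)
for name, r in correlations.items():
    print(f"{name}: r = {r:.3f}")
```

If the within-stratum correlations differed sharply in size or sign from the pooled value, that would signal the paradoxical confounding the analysis was designed to detect.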
The field testing validation sample consisted of 50 patients from a General Internal Medicine clinic with a mean age of 61 years; 100% were male and 74% were African American. The Pearson correlation between REALM-SF and WRAT-R scores in this sample was strong and highly statistically significant (r = 0.83, P < 0.0001). This correlation was not significantly different than the reported correlation between REALM instrument and WRAT-R scores (r = 0.83 vs. 0.88, P = 0.13),[3] but was marginally higher than the reported correlation between TOFHLA and WRAT-R scores (r = 0.83 vs. 0.74, P = 0.07).[26] The correlation between REALM-SF and WRAT-R was significantly higher than the reported correlation between REALM-R and WRAT-R scores (r = 0.83 vs. 0.64, P < 0.001).[19]
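The text does not state which test produced the P values for these between-study comparisons. One common option for two correlations from independent samples is Fisher's r-to-z transformation, sketched below in Python as an assumption rather than as the authors' method. The function name is hypothetical. The inputs are the field testing sample (r = 0.83, n = 50) and the REALM-R pilot sample described in the Discussion (r = 0.64, n = 157); because the REALM-R value is a Spearman rank correlation, treating it as a Pearson r is a further simplification, and the result need not match the reported P value.

```python
import math

def compare_independent_correlations(r1, n1, r2, n2):
    """Fisher r-to-z test for two correlations from independent samples.

    Returns the z statistic and the two-sided P value.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher transformation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided tail area of the standard normal distribution.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided

z_stat, p_value = compare_independent_correlations(0.83, 50, 0.64, 157)
print(f"z = {z_stat:.2f}, two-sided P = {p_value:.4f}")
```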
In the field testing validation sample, REALM-SF score was not significantly correlated with age (r = −0.20, P = 0.16), whereas WRAT-R score had a statistically marginal negative correlation with age (r = −0.25, P = 0.08). The correlation between REALM-SF and WRAT-R was not significantly different when REALM-SF was administered first or second (r = 0.88 vs. 0.78, P = 0.27). Finally, the correlation between REALM-SF and WRAT-R was consistent with the result for the combined sample when considered only for African American subjects (n = 36, r = 0.87, P < 0.0001).
Patients in the field testing validation sample were also queried about their subjective impressions of the REALM-SF. We found that 96% of subjects reported that the REALM-SF was not offensive, 98% reported that the test did not take too long, 96% were willing to take the REALM-SF if recommended by their personal doctor, and 90% were willing to share their results with their doctor.

TABLE 4. The Stepwise Regression Model and Bootstrap Validity Estimates
Item | Training Analysis: β | Training Analysis: Partial R² | Bootstrap Validity Analysis: β (95% CI) | Bootstrap Validity Analysis: Partial R² (95% CI)
Menopause | 0.214 | 0.606 | 0.217 (0.215–0.219) | 0.611 (0.608–0.613)
Antibiotics | 0.190 | 0.133 | 0.188 (0.187–0.190) | 0.131 (0.129–0.132)
Exercise | 0.179 | 0.075 | 0.178 (0.177–0.180) | 0.074 (0.073–0.075)
Jaundice | 0.174 | 0.048 | 0.173 (0.172–0.175) | 0.048 (0.047–0.049)
Rectal | 0.176 | 0.026 | 0.175 (0.174–0.176) | 0.026 (0.025–0.027)
Anemia | 0.174 | 0.018 | 0.176 (0.174–0.177) | 0.018 (0.018–0.019)
Behavior | 0.173 | 0.014 | 0.172 (0.170–0.173) | 0.014 (0.013–0.015)
Model training and bootstrap validity analyses (1000 iterations, 50% resample) were conducted using data from all patients in cohorts 1 through 6 (n = 1336). REALM instrument items were normatively standardized into z-score form before analysis to eliminate the intercept (zero) and simplify bootstrap analysis. Items are presented in the order selected by the analysis algorithm: for all items, P < 0.0001. β is the regression model β coefficient for the item, and 95% CI are Tchebysheff (nonparametric) 95% CI based on bootstrap estimates of mean and SD.

TABLE 5. Reading Grade Level Assignment Using REALM Instrument Scores Versus REALM-SF Scores
Reading grade level | Development Sample (N = 1336): REALM Instrument, N (%) | Development Sample (N = 1336): REALM-SF, N (%) | Validation Sample (N = 164): REALM Instrument, N (%) | Validation Sample (N = 164): REALM-SF, N (%)
≤3rd grade | 58 (4.3) | 51 (3.8) | 12 (7.3) | 10 (6.1)
4–6th grade | 137 (10.3) | 121 (9.0) | 16 (9.8) | 16 (9.8)
7–8th grade | 376 (28.1) | 431 (32.3) | 77 (46.9) | 65 (39.6)
≥9th grade | 765 (57.3) | 733 (54.9) | 59 (36.0) | 73 (44.5)
REALM instrument scores were categorized as third grade or less (0–18), fourth to sixth grade (19–44), seventh to eighth grade (45–60), and ninth grade or higher (61–66) (Davis et al., 1993).[3] Based on distributional similarities, REALM-SF scores were categorized as third grade or less (0), fourth to sixth grade (1–3), seventh to eighth grade (4–6), and ninth grade or higher (7). Agreement between REALM instrument and REALM-SF reading grade level assignment was 84% (κ = 0.73, P < 0.0001) in the development sample and 77% (κ = 0.64, P < 0.0001) in the validation sample. When dichotomized at the sixth grade level, agreement was 97% (κ = 0.88, P < 0.001) in the development and 88% (κ = 0.75, P < 0.001) in the validation sample. When dichotomized at the eighth grade level, agreement was 94% (κ = 0.78, P < 0.001) in the development and 84% (κ = 0.67, P < 0.001) in the validation sample.

DISCUSSION
The availability of shorter literacy assessments may increase the likelihood that researchers will include literacy assessment in their studies, even when literacy is not the primary research focus. Furthermore, there is a growing need for shorter, validated literacy assessment instruments to facilitate interventional research that requires applicability to real-life clinical contexts. The challenge is that these instruments must simultaneously be scientifically valid and practical for use in clinical contexts. Although the REALM instrument takes 2–3 minutes to administer,[4] this time may be prohibitively long for interventions designed for busy outpatient clinics.
Using interviewer-administered data collected from multiple study sites, a 7-word instrument, the REALM-SF, was developed and validated. REALM-SF scores were highly correlated with REALM instrument scores in both the development and validation samples (r = 0.95 and r = 0.94, respectively) and were highly correlated with WRAT-R scores in field validation testing (r = 0.83). There was no evidence of paradoxical confounding by study cohort, gender, or race/ethnicity. Furthermore, there was excellent agreement between REALM instrument and REALM-SF reading grade level assignments, even when dichotomized at the 6th or 8th grade reading level.
One prior study described the development and pilot testing of a shortened version of the REALM instrument, the REALM-R.[19] In that study, 8 items having an item-whole correlation greater than 0.40 and close to 50% correct pronunciation rates were selected from the REALM instrument results of 50 patients.[19] This initial instrument was further refined by removing words with a sexual overtone and was reviewed by 3 primary care physicians for face validity and applicability.[19] The final version of the REALM-R included 8 words along with 3 nonscored filler words that were added to enhance patient confidence and relieve potential anxiety. In pilot testing on 157 patients, the REALM-R had a Spearman rank correlation of 0.64 with the WRAT-R and took less than 2 minutes to administer.[19] Although these preliminary results supported the feasibility of developing a shortened version of the REALM instrument, there were significant limitations to the REALM-R. First, the convenience sample used in the study was relatively well educated and included few minority and elderly patients. Therefore, the validity of using the REALM-R in populations with high rates of limited literacy is not known. Second, the selection of items for the REALM-R used expert judgment that may not be easily reproduced and may not be relevant in other practice settings or in other geographical regions.
One major advantage of the REALM-SF is that it had significantly higher correlation with WRAT-R scores in validation field testing compared with the REALM-R (r = 0.83 vs. 0.64, P < 0.001).[19] The REALM-SF was developed using a patient sample in which 43% of patients had less than 9th grade literacy, 45% were 65 years or older, and two-thirds (67%) were from racial/ethnic minorities. There was no evidence of paradoxical confounding of REALM-SF scores by gender or race/ethnicity. Furthermore, the REALM-SF was validated in an independent patient sample in which 98% of patients were African American and 64% had less than 9th grade literacy. A second advantage of the REALM-SF is that patients were interviewed in a variety of practice settings from 2 major geographical areas: the Midwest and the South. A third advantage is that item selection for the REALM-SF was performed independent of expert judgment by methods that can be easily reproduced.
Another prior study demonstrated that 19 different strategies for shortening the REALM instrument resulted in stable reliability coefficients (Cronbach’s alpha >0.80), suggesting that the REALM instrument could be shortened while maintaining internal consistency.[18] However, correlation coefficients (r) for these shortened REALMs were not reported, so that the validity of using these shortening strategies could not be evaluated.[18] This study also reported racial variation in REALM instrument item scoring, independent of education status. The REALM-SF builds upon these previous findings in 2 important ways. First, we developed a reduced REALM, the REALM-SF, that was highly correlated with REALM instrument scores in development and validation cohorts (r = 0.95 and 0.94, respectively) while maintaining excellent internal consistency. Second, our findings support the future use of the REALM-SF in populations that include African Americans. REALM-SF items did not demonstrate significant differences in performance by race and only 1 item (jaundice) was previously reported as having differential performance by race.[18] Furthermore, we found no significant difference in the correlation between REALM-SF and REALM instrument scores by race (r = 0.92 for whites, r = 0.95 for African Americans).
Some limitations of this study should be considered.
Our study cohorts did not include younger patients from
nonuniversity-affiliated institutions. Patient factors such as
limited eyesight, hearing impairment, acute illness, and
cognitive function may affect the accuracy of, and emotional
responses to, literacy assessment.³¹ Patients might also be
highly literate in their native language, but literacy assess-
ment in English only may misclassify them as having limited
literacy skills. Our samples included few non-African Amer-
ican minorities, especially Latinos, limiting our ability to
determine the magnitude of this potential misclassification.
Word recognition tests such as the REALM-SF may be
limited in their ability to be translated into Spanish. Prior
efforts to develop a Spanish version of the REALM instrument
failed because of the close phoneme–grapheme correspondence
of the spelling system in Spanish,³² although some
of these difficulties may be overcome by adding a
comprehension component.³³ The generalizability of our findings
may also be limited because our samples included relatively
few women with low literacy skills. The REALM-SF will
need further validation among women with lower literacy
skills, because gender may differentially affect recognition of
words such as “menopause” and “anemia.”
The REALM-SF, similar to other word recognition
tests, does not formally assess writing, numeracy, or compre-
hension skills. Although word recognition, writing, nu-
meracy, and comprehension skills may be highly correlated,
the measurement error of using word recognition tests alone
is unknown with regard to health literacy applications. Re-
searchers should also be aware that literacy levels may affect
the quality of information elicited as well as patient
confidence in communicating their concerns.³⁴ Formal training
about these patient-level factors and potential adverse patient
reactions should be completed by research personnel before
initiating literacy assessment.
Literacy assessment can have negative effects. One
prior study reported that 40% of patients with poor reading
skills felt shame.³⁵ Nearly two-thirds of these patients never
told their spouses and over half never told their children about
their reading difficulties.³⁵ We formally field tested the
REALM-SF as an instrument independent of the REALM
instrument, to assess the acceptability of pronouncing 7
words for patients with limited literacy skills. We found that
nearly all subjects (96%) did not find any words offensive and
were willing to complete a similar test if requested by their
personal doctor. We attempted to minimize the potential
shame associated with literacy assessment by adding “fat”
and “flu” to the beginning of the REALM-SF.¹⁹ The presence
of these words seemed to enhance confidence and reduce
shame and anxiety for patients with limited literacy skills
without significantly increasing administration time.
Furthermore, 92% of subjects felt comfortable having the results
of their literacy assessment shared with their doctor. Appendix
1 provides suggested directions for administering the
REALM-SF along with the scoring guide and word ordering
used in field testing.
In conclusion, the REALM-SF offers researchers a
simplified, validated, and efficient instrument for assessing
patient literacy in diverse clinical and public health set-
tings. The REALM-SF may be used to investigate the rela-
tionship between literacy and specific health outcomes such
as medication compliance and health care utilization. The
REALM-SF may be particularly useful for researchers de-
signing interventions for clinical contexts.
REFERENCES
1. Nielsen-Bohlman L, Panzer A, Kindig D. Health Literacy: A Prescrip-
tion to End Confusion. Washington, DC: Institute of Medicine of the
National Academies; 2003.
2. Davis TC, Crouch MA, Long SW, et al. Rapid assessment of literacy
levels of adult primary care patients. Fam Med. 1991;23:433–435.
3. Davis TC, Long S, Jackson R, et al. Rapid estimate of adult literacy in
medicine: a shortened screening instrument. Fam Med. 1993;25:391–
395.
4. Davis T, Kennen E, Gazmararian J. Literacy testing in health care
research. In: Schwartzberg J, VanGeest J, Wang C, eds. Understanding
Health Literacy: Implications for Medicine and Public Health. Chicago,
IL: AMA Press; 2005:157–179.
5. DeWalt D, Berkman N, Sheridan D, et al. Literacy and health outcomes:
a systematic review of the literature. J Gen Intern Med. 2004;19:1228–
1239.
6. Bennett C, Ferreira M, Davis T, et al. Relation between literacy, race,
and stage of presentation among low-income patients with prostate
cancer. J Clin Oncol. 1998;16:3101–3104.
7. Lindau S, Tomori C, Lyons T, et al. The association of health literacy
with cervical cancer prevention knowledge and health behaviors in a
multiethnic cohort of women. Am J Obstet Gynecol. 2002;186:938–943.
8. Davis TC, Arnold C, Berkel H, et al. Knowledge and attitude on
screening mammography among low-literate, low-income women. Can-
cer. 1996;78:1912–1920.
9. Ferreira M, Dolan N, Fitzgibbon M, et al. Health care provider-directed
intervention to increase colorectal cancer screening among veterans:
results of a randomized controlled trial. J Clin Oncol. 2005;23:1548–
1554.
10. Arnold C, Davis T, Berkel H, et al. Smoking status, reading level, and
knowledge of tobacco effects among low-income pregnant women. Prev
Med. 2001;32:313–320.
11. Kaufman H, Skipper B, Small L, et al. Effect of literacy on breast-
feeding outcomes. South Med J. 2001;94:293–296.
12. Wolf M, Davis T, Arozullah A, et al. Relation between literacy and HIV
treatment knowledge among patients on HAART regimens. AIDS Care.
2005;17:863–873.
13. Williams MV, Baker DW, Honig EG, et al. Inadequate literacy is a
barrier to asthma knowledge and self-care. Chest. 1998;114:1008–1015.
14. Conlin K, Schumann L. Literacy in the health care system: a study on
open heart surgery patients. J Am Acad Nurse Pract. 2002;14:38–42.
15. Fortenberry J, McFarlane M, Hennessy M, et al. Relation of health
literacy to gonorrhoea related care. Sex Transm Infect. 2001;77:206–
211.
16. Gordon M, Hampson R, Capell H, et al. Illiteracy in rheumatoid arthritis
patients as determined by the Rapid Estimate of Adult Literacy in
Medicine (REALM) score. Rheumatology. 2002;41:750–754.
17. Arozullah A, Lee S, Khan T, et al. The roles of low literacy and social
support in predicting the preventability of hospital admission. J Gen
Intern Med. 2006;21:140–145.
18. Shea J, Beers B, McDonald V, et al. Assessing health literacy in
African American and Caucasian adults: disparities in Rapid Esti-
mate of Adult Literacy in Medicine (REALM) scores. Fam Med.
2004;36:575–581.
19. Bass PF, Wilson J, Griffith CH. A shortened instrument for literacy
screening. J Gen Intern Med. 2003;18:1036–1038.
20. Kleinbaum D, Kupper L, Muller K. Applied Regression Analysis and
Other Multivariable Methods. Boston, MA: PWS-Kent Publishing Com-
pany; 1988.
21. Joreskog K, Sorbom D. LISREL 8: User’s Reference Guide. Chicago,
IL: Scientific Software International; 1996.
22. Joreskog K, Sorbom D. PRELIS 2: User’s Reference Guide. Chicago,
IL: Scientific Software International; 1996.
23. Kline R. Principles and Practice of Structural Equation Modeling. New
York, NY: Guilford; 2005.
24. Yarnold PR. Characterizing and circumventing Simpson’s Paradox for
ordered bivariate data. Educ Psychol Meas. 1996;56:430–442.
25. Jastak S, Wilkinson G. Wide Range Achievement Test—Revised 3.
Wilmington, DE: Jastak Associates; 1993.
26. Parker R, Baker D, Williams M, et al. The test of functional health
literacy in adults (TOFHLA): a new instrument for measuring patients’
literacy skills. J Gen Intern Med. 1995;10:537–545.
27. Satorra A, Bentler P. Corrections to test statistics and standard errors in
covariance structure analysis. In: von Eye A, Clogg C, eds. Latent
Variables Analysis: Applications for Developmental Research. Thou-
sand Oaks, CA: Sage Publications; 1994:399–419.
28. Steiger J. Structural model evaluation and modification: an interval
estimation approach. Multivariate Behav Res. 1990;25:173–180.
29. Bentler PM. Comparative fit indexes in structural models. Psychol Bull.
1990;107:238–246.
30. Tucker L, Lewis C. A reliability coefficient for maximum likelihood
factor analysis. Psychometrika. 1973;38:1–10.
31. Baker DW, Gazmararian JA, Sudano J, et al. Health literacy and
performance on the Mini-Mental State Examination. Aging Ment Health.
2002;6:22–29.
32. Nurss J, Baker D, Davis T, et al. Difficulties in functional health literacy
screening in Spanish-speaking adults. J Reading. 1995;38:632–637.
33. Lee SY, Bender DE, Ruiz RE, et al. Development of an easy-to-use
Spanish Health Literacy test. Health Serv Res. 2006;41:1392–1412.
34. Doak C, Doak L, Root J. Teaching Patients With Low-Literacy Skills.
Philadelphia, PA: JB Lippincott; 1996.
35. Parikh N, Parker R, Nurss J, et al. Shame and health literacy: the
unspoken connection. Patient Educ Couns. 1996;27:33–39.

APPENDIX 1
Rapid Estimate of Adult Literacy in Medicine—Short Form
(REALM-SF)
Suggested Introduction: “We are studying medical word reading in order to
improve communication between healthcare providers and patients. Here is a list
of medical words that may be difficult to read.”
Interviewer: Show the participant the Word List.
Then say, “Starting at the top of the list, please read each word aloud to me. If
you don’t recognize a word, you can say ‘pass’ and move on to the next word.
Your results will be kept strictly confidential and will not be included in your
official medical records.”
Interviewer: If the participant takes more than 5 seconds on a word, say “pass” and
point to the next word. Hold this scoring sheet so that it is not visible to the participant.
Fat               Not scored
Flu               Not scored
1. Behavior       1 Correct    2 Mispronounced    3 Not attempted
2. Exercise       1 Correct    2 Mispronounced    3 Not attempted
3. Menopause      1 Correct    2 Mispronounced    3 Not attempted
4. Rectal         1 Correct    2 Mispronounced    3 Not attempted
5. Antibiotics    1 Correct    2 Mispronounced    3 Not attempted
6. Anemia         1 Correct    2 Mispronounced    3 Not attempted
7. Jaundice       1 Correct    2 Mispronounced    3 Not attempted
REALM-SF Scoring
Total Correct (0-7)    Grade Level
0                      < 3rd grade
1-3                    4th - 6th grade
4-6                    7th - 8th grade
7                      > 9th grade
Fat
Flu
Behavior
Exercise
Menopause
Rectal
Antibiotics
Anemia
Jaundice
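For data entry, the scoring guide above can be expressed as a short routine. The following is a minimal sketch, assuming responses are recorded with the codes on the scoring sheet (1 = correct, 2 = mispronounced, 3 = not attempted); the function and variable names are hypothetical and are not part of the published instrument. The grade-level labels are copied from the scoring table as printed.

```python
# The 7 scored words, in administration order. "Fat" and "Flu" are
# shown to the participant first but are not scored.
SCORED_WORDS = (
    "Behavior", "Exercise", "Menopause", "Rectal",
    "Antibiotics", "Anemia", "Jaundice",
)

CORRECT = 1  # response code for a correctly pronounced word


def grade_level(total_correct):
    """Map the number of correct words (0-7) to the grade-level band."""
    if not 0 <= total_correct <= 7:
        raise ValueError("total_correct must be between 0 and 7")
    if total_correct == 0:
        return "< 3rd grade"
    if total_correct <= 3:
        return "4th - 6th grade"
    if total_correct <= 6:
        return "7th - 8th grade"
    return "> 9th grade"


def score_realm_sf(responses):
    """Return (total correct, grade-level band).

    responses: dict mapping each scored word to its response code.
    A word missing from the dict is treated as not correct.
    """
    total = sum(1 for word in SCORED_WORDS if responses.get(word) == CORRECT)
    return total, grade_level(total)
```

For example, a hypothetical participant who reads "Behavior," "Exercise," and "Rectal" correctly and mispronounces or passes on the remaining words would receive a total of 3, which falls in the 4th - 6th grade band.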