This work was supported by the Agency for Healthcare Research and Quality (AHRQ) (Grant R01HS018533 to SHD). Kristin Palmsten is supported by Training Grant T32HD060454 in Reproductive, Perinatal and Pediatric Epidemiology from the National Institute of Child Health and Human Development (NICHD), National Institutes of Health. The Pharmacoepidemiology Program at the Harvard School of Public Health receives funding from Pfizer and Asisa. SHD has consulted for GSK and Novartis. The sources of funding and potential conflicts of interest do not alter the authors' adherence to all the PLOS ONE policies on sharing data and materials.
Conceived and designed the experiments: KP KFH HM MKK SS SHD. Performed the experiments: KP KFH HM MKK. Analyzed the data: KP KFH HM. Contributed reagents/materials/analysis tools: KFH SS SHD. Wrote the paper: KP. Interpretation of data: KP KFH HM MKK PLW KM SHD. Critically revised manuscript: KFH HM MKK PLW KM SS SHD.
In the absence of clinical trial data, large post-marketing observational studies are essential to evaluate the safety and effectiveness of medications during pregnancy. We identified a cohort of pregnancies ending in live birth within the 2000–2007 Medicaid Analytic eXtract (MAX). Herein, we provide a blueprint to guide investigators who wish to create similar cohorts from healthcare utilization data and we describe the limitations in detail.
Among females ages 12–55, we identified pregnancies using delivery-related codes from healthcare utilization claims. We linked women with pregnancies to their offspring by state, Medicaid Case Number (family identifier) and delivery/birth dates. Then we removed inaccurate linkages and duplicate records and implemented cohort eligibility criteria (i.e., continuous and appropriate enrollment type, no private insurance, no restricted benefits) for claim information completeness.
From 13,460,273 deliveries and 22,408,810 child observations, 6,107,572 pregnancies ending in live birth were available after linkage, cleaning, and removal of duplicate records. The percentage of linked deliveries varied greatly by state, from 0 to 96%. The cohort size was reduced to 1,248,875 pregnancies after requiring maternal eligibility criteria throughout pregnancy and to 1,173,280 pregnancies after further applying infant eligibility criteria. Ninety-one percent of women were dispensed at least one medication during pregnancy.
Mother-infant linkage is feasible and yields a large pregnancy cohort, although the size decreases with increasing eligibility requirements. MAX is a useful resource for studying medications in pregnancy and a spectrum of maternal and infant outcomes within the indigent population of women and their infants enrolled in Medicaid. It may also be used to study maternal characteristics, the impact of Medicaid policy, and healthcare utilization during pregnancy. However, careful attention to the limitations of these data is necessary to reduce biases.
In the United States (US), 50–70% of pregnant women use at least one prescription drug during their pregnancy
Medicaid is the joint state and federal health insurance program in the US for low-income individuals. States provide Medicaid claims to the Centers for Medicare and Medicaid Services (CMS) through the Medicaid Statistical Information System (MSIS), and Medicaid Analytic eXtract (MAX) data are extracted from the MSIS to support research and policy analysis
Medicaid covers the medical expenses of over 40% of births in the US
Statewide Medicaid, Canadian province-wide, and health maintenance organization healthcare utilization data have been used to identify woman-infant linked pregnancy cohorts
Below we describe the methods used to identify pregnancies in MAX, link women to their live born infants, select a cohort of woman-infant pairs, and reduce the limitations of MAX for studies of medications in pregnancy and other healthcare factors. We also present the characteristics of women in the cohort and the frequency of several pregnancy outcomes.
We obtained MAX data for all states and Washington, DC, except Arizona, which had inaccurate personal identifiers
Financial criteria alone do not qualify individuals for Medicaid; rather, individuals must also belong to an appropriate eligibility group to qualify, namely children under age 21, adults with dependent children, pregnant women, individuals with disabilities, and seniors
To identify inpatient deliveries from the source population, we utilized the MAX delivery code variable, which is only available in the IP file and identifies hospitalizations with a delivery-related
To identify outpatient (i.e., physician, clinic, or outpatient hospital) delivery-related claims, we utilized the delivery procedure codes from the OT file. A large proportion of the outpatient delivery-related procedures were for post-partum care, which could occur several days after delivery. We defined the outpatient delivery date range as the five days before and after the delivery-related procedure. If the date of an outpatient delivery-related procedure overlapped with an inpatient delivery date range for the same woman, then the outpatient delivery-related claim was removed.
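The outpatient date-range rule can be sketched in a few lines. This is a minimal illustration rather than the study code: the record layout (tuples of a woman identifier and a procedure date) and the function names are assumptions, and the inpatient delivery date ranges are taken as already-computed inclusive (start, end) pairs.

```python
from datetime import date, timedelta

OUTPATIENT_WINDOW_DAYS = 5  # from the text: five days before and after the procedure


def outpatient_date_range(procedure_date):
    """Return the (start, end) delivery date range around an outpatient procedure."""
    delta = timedelta(days=OUTPATIENT_WINDOW_DAYS)
    return procedure_date - delta, procedure_date + delta


def drop_overlapping_outpatient(outpatient_claims, inpatient_ranges):
    """Keep outpatient claims whose procedure date lies outside every
    inpatient delivery date range recorded for the same woman.

    outpatient_claims: list of (woman_id, procedure_date)   -- hypothetical layout
    inpatient_ranges:  dict woman_id -> list of (start, end), both inclusive
    """
    kept = []
    for woman_id, proc_date in outpatient_claims:
        ranges = inpatient_ranges.get(woman_id, [])
        if any(start <= proc_date <= end for start, end in ranges):
            continue  # same delivery already captured by an inpatient claim
        kept.append((woman_id, proc_date))
    return kept
```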
A woman could have more than one delivery identified either because she had more than one pregnancy during the study period or because she had the same delivery identified more than once with unique delivery date ranges. Instead of selecting one delivery per woman during a certain time period
We identified children with a date of birth (DOB) between 2000 and 2007 and a Case Number that was not missing or zero and did not end in 8 zeros. Some MSIS_IDs are associated with more than one DOB, e.g., a correct DOB, a mistyped DOB, and a DOB incorrectly assigned as the first date of Medicaid eligibility. We consolidated infants with the same Case Number and DOBs less than three days apart after, rather than before, linkage. There were 22,408,810 different MSIS_ID-DOB combinations available for linkage to deliveries.
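As an illustration, the child selection rule could be expressed as a single predicate. The sketch assumes that the Case Number is stored as a string and interprets "missing or zero" as blank or consisting only of zeros; both are assumptions about the file layout, and the function name is hypothetical.

```python
from datetime import date

STUDY_START = date(2000, 1, 1)
STUDY_END = date(2007, 12, 31)


def is_linkable_child(case_number, dob):
    """Return True if a child record qualifies for the linkage pool."""
    if dob is None or not (STUDY_START <= dob <= STUDY_END):
        return False
    if case_number is None:
        return False
    cn = case_number.strip()
    if cn == "" or cn.strip("0") == "":  # missing, or made up only of zeros
        return False
    if cn.endswith("0" * 8):  # placeholder-like Case Numbers ending in 8 zeros
        return False
    return True
```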
Successful linkage requires the mother to be enrolled in Medicaid on the child’s DOB, accurate recording and consistent use of the Case Number within families, and accurate dates of delivery and birth. Within each state, we linked women to infants by the Case Number.
CA, California.
First, within each state, we linked inpatient deliveries with the pool of child MSIS_ID-DOB combinations, and then we linked outpatient delivery-related claims with child MSIS_ID-DOBs that had not been linked. The delivery linkage percentage was defined as the proportion of delivery date ranges (there could be more than one for the same delivery) that linked to an MSIS_ID-DOB combination (there could be more than one per infant). The child linkage percentage was defined as the proportion of child MSIS_ID-DOB combinations that linked to a delivery. While generally informative, these linkage percentages should be interpreted with caution. The same delivery could be counted in the denominator of the delivery linkage percentage more than once if more than one delivery date range was identified, and the same child could be counted in the denominator of the child linkage percentage more than once if the child had more than one MSIS_ID or DOB recorded. Likewise, linked deliveries and children could be counted in the numerators of both percentages more than once if the linked child had more than one MSIS_ID. Finally, the child linkage percentages are low because not all mothers of children from the pool were enrolled in Medicaid on their child’s DOB, making their deliveries unavailable for linkage. Besides matching women with their offspring, the linkage procedure contributed to the de-duplication of infants and the establishment of delivery date, as explained below.
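The two-stage linkage and the two linkage percentages can be sketched as follows. This is an illustrative reading of the procedure, not the original implementation: the dictionary field names are hypothetical, and a match is assumed to mean that the child's DOB falls inside the delivery date range for a delivery with the same state and Case Number.

```python
from datetime import date


def link(deliveries, children):
    """Return (delivery_index, child_index) pairs that share state and
    Case Number and whose child DOB falls inside the delivery date range."""
    by_key = {}
    for j, child in enumerate(children):
        by_key.setdefault((child["state"], child["case_number"]), []).append(j)
    pairs = []
    for i, d in enumerate(deliveries):
        for j in by_key.get((d["state"], d["case_number"]), []):
            if d["start"] <= children[j]["dob"] <= d["end"]:
                pairs.append((i, j))
    return pairs


def two_stage_link(inpatient, outpatient, children):
    """Link inpatient deliveries first, then offer only the still-unlinked
    child MSIS_ID-DOB combinations to the outpatient claims."""
    ip_pairs = link(inpatient, children)
    linked = {j for _, j in ip_pairs}
    remaining_idx = [j for j in range(len(children)) if j not in linked]
    remaining = [children[j] for j in remaining_idx]
    op_pairs = [(i, remaining_idx[k]) for i, k in link(outpatient, remaining)]
    return ip_pairs, op_pairs


def linkage_percentage(pairs, n_records, side):
    """Percentage of records with at least one link.
    side=0 counts deliveries, side=1 counts child MSIS_ID-DOB combinations."""
    if n_records == 0:
        return 0.0
    return 100.0 * len({p[side] for p in pairs}) / n_records
```

Because the denominators count delivery date ranges and MSIS_ID-DOB combinations rather than unique deliveries and infants, the percentages returned by such a function carry the same caveats described above.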
To produce a cohort of unique pregnancies from the linked deliveries, we implemented several data cleaning steps to remove deliveries that were incorrectly linked or duplicated. To remove incorrectly linked deliveries, we excluded all infants that were linked to more than one woman’s MSIS_ID. Then we removed all deliveries that were linked to infant MSIS_IDs with DOBs more than two days apart (more than three days apart for outpatient deliveries) and less than 24 weeks apart. This step preserved multifetal deliveries, but it removed deliveries that were unreasonably close in time. To remove duplicate deliveries, we combined linked deliveries from the same woman into one delivery if the infants’ DOBs were less than three days apart (less than four days for outpatient deliveries). For these deliveries, the earliest DOB was selected as the infants’ DOB and the woman’s delivery date.
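A compact sketch of these cleaning rules is given below, using the inpatient thresholds as defaults (the outpatient thresholds of three and four days could be passed instead). The input layout, a list of (woman_id, infant_id, DOB) links, and the function name are assumptions made for illustration; the rules are applied per woman in the order described above.

```python
from collections import defaultdict
from datetime import date


def clean_links(links, close_days=2, merge_days=3, max_gap_days=24 * 7):
    """Return {woman_id: [delivery dates]} after post-linkage cleaning.

    links: list of (woman_id, infant_id, dob)   -- hypothetical layout
    """
    # 1) Drop infants linked to more than one woman.
    women_per_infant = defaultdict(set)
    for woman, infant, _ in links:
        women_per_infant[infant].add(woman)
    links = [l for l in links if len(women_per_infant[l[1]]) == 1]

    dobs_by_woman = defaultdict(set)
    for woman, _, dob in links:
        dobs_by_woman[woman].add(dob)

    pregnancies = {}
    for woman, dobs in dobs_by_woman.items():
        dobs = sorted(dobs)
        # 2) Remove DOBs more than `close_days` but less than 24 weeks apart.
        bad = set()
        for a in dobs:
            for b in dobs:
                if close_days < (b - a).days < max_gap_days:
                    bad.update((a, b))
        kept = [d for d in dobs if d not in bad]
        # 3) Merge DOBs less than `merge_days` apart, keeping the earliest.
        merged = []
        for d in kept:
            if merged and (d - merged[-1]).days < merge_days:
                continue
            merged.append(d)
        if merged:
            pregnancies[woman] = merged
    return pregnancies
```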
The date of the LMP was estimated because neither gestational length nor the LMP is available in healthcare utilization data. It was assigned to be 245 days before the infant’s DOB for pregnancies that had maternal or infant ICD-9 codes indicative of preterm delivery (644.0, 644.2, and 765.x) and to be 270 days before the infant’s DOB for all other pregnancies
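The LMP assignment rule amounts to a date subtraction, sketched below. The sketch assumes that ICD-9 codes are stored as strings with a decimal point and that fifth-digit subclassifications (e.g., 644.21) should count as preterm codes; the function name is hypothetical.

```python
from datetime import date, timedelta

# Preterm indicators from the text: 644.0, 644.2, and 765.x
PRETERM_PREFIXES = ("644.0", "644.2", "765.")


def estimate_lmp(infant_dob, icd9_codes):
    """Estimate the LMP date as 245 days before the DOB when any maternal or
    infant code indicates preterm delivery, and 270 days before otherwise."""
    preterm = any(code.startswith(PRETERM_PREFIXES) for code in icd9_codes)
    days_before = 245 if preterm else 270
    return infant_dob - timedelta(days=days_before)
```

For example, under these assumptions `estimate_lmp(date(2005, 10, 1), ["765.1"])` would be assigned from the preterm rule, and the same DOB with no preterm code from the 270-day rule.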
MAX may contain an incomplete record of healthcare claims for the linked women when they are not enrolled in Medicaid, have supplemental private insurance, have restricted benefits, such as pregnancy-related services or prescription drug benefits only, or are enrolled in certain managed care plans
Women were excluded if any of the following four eligibility criteria were not met, according to the PS file, in at least one month during the eligibility period of interest: 1) Continuous enrollment throughout the eligibility period, defined as at least 28 days of enrollment per calendar month. 2) No private insurance. 3) No restricted benefits. 4) Appropriate enrollment type (i.e., capitated managed care, FFS PCCM managed care, or no managed care) depending on state (
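The four maternal criteria can be checked month by month, as in the sketch below. The monthly record fields and the enrollment type labels are hypothetical stand-ins for the corresponding PS file variables, and the set of allowed enrollment types is assumed to be supplied per state.

```python
def woman_is_eligible(monthly_records, required_months, allowed_enrollment_types):
    """Return True only if every required month satisfies all four criteria.

    monthly_records: list of dicts with hypothetical keys
        month, days_enrolled, private_insurance, restricted_benefits, enrollment_type
    required_months: iterable of month labels spanning the eligibility period
    allowed_enrollment_types: set of enrollment types accepted for the woman's state
    """
    by_month = {r["month"]: r for r in monthly_records}
    for month in required_months:
        r = by_month.get(month)
        if r is None:
            return False  # no enrollment record for this month
        if r["days_enrolled"] < 28:
            return False  # 1) continuous enrollment
        if r["private_insurance"]:
            return False  # 2) no private insurance
        if r["restricted_benefits"]:
            return False  # 3) no restricted benefits
        if r["enrollment_type"] not in allowed_enrollment_types:
            return False  # 4) appropriate enrollment type for the state
    return True
```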
MAX anomalies tables indicate that individuals may receive more than one MSIS_ID within the same state
We required additional infant eligibility criteria for studies of infant outcomes. Because we could not rule out multiple infant MSIS_IDs per pregnancy as an indication of poor linkage quality, we required more stringent eligibility criteria in those pregnancies: if infant zip code was different from maternal zip code, then the infant was excluded. Prior to applying infant eligibility criteria, zip codes did not match in 5.5% of pregnancies linked to one infant MSIS_ID, 25.7% of pregnancies linked to two MSIS_IDs, and 48.7% of pregnancies linked to three MSIS_IDs.
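The zip code rule might be sketched as follows, assuming it applies only to pregnancies linked to more than one infant MSIS_ID and that zip codes are compared as exact strings. The input layout and function name are illustrative assumptions.

```python
def infants_passing_zip_rule(maternal_zip, infants):
    """Return the infant MSIS_IDs retained for a pregnancy.

    infants: list of (msis_id, infant_zip)   -- hypothetical layout
    With a single infant MSIS_ID no zip match is required; with several,
    infants whose zip code differs from the maternal zip code are excluded.
    """
    if len(infants) <= 1:
        return [msis_id for msis_id, _ in infants]
    return [msis_id for msis_id, infant_zip in infants if infant_zip == maternal_zip]
```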
Because multiple infant MSIS_IDs that linked to the same pregnancy may represent the same infant, we pooled eligibility information from all infant MSIS_IDs associated with a pregnancy and required that at least one infant MSIS_ID had either Medicaid enrollment in the month after the birth month or a claim in the month after birth. Pregnancies with neonatal death remained eligible even if they did not meet the enrollment criteria.
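The pooled infant criterion reduces to an "any" test across the infant MSIS_IDs of a pregnancy. In this sketch the two boolean fields are hypothetical flags assumed to have been derived beforehand from the enrollment and claims files.

```python
def pregnancy_meets_infant_criteria(infant_records, neonatal_death=False):
    """Return True if the pregnancy stays in the cohort for infant outcome studies.

    infant_records: list of dicts with hypothetical boolean keys
        enrolled_month_after_birth, claim_month_after_birth
    Pregnancies with neonatal death remain eligible regardless of enrollment.
    """
    if neonatal_death:
        return True
    return any(
        r["enrolled_month_after_birth"] or r["claim_month_after_birth"]
        for r in infant_records
    )
```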
Pharmacy claim dates during pregnancy were used to identify pregnant women who were dispensed a medication. Both inpatient and outpatient ICD-9 diagnostic codes were used to identify pregnancy outcomes.
Overall, of the 10,058,005 identified inpatient deliveries, 55.6% linked to at least one infant, and of the 3,402,268 outpatient (i.e., physician, clinic or outpatient hospital) delivery claims, 23.8% linked to at least one infant. The delivery linkage percentages varied greatly by state (
State | Inpatient Linkage: N Deliveries Before Linkage | Inpatient Linkage: % of Deliveries Linked | Outpatient Linkage: N Deliveries Before Linkage | Outpatient Linkage: % of Deliveries Linked | Inpatient Linkage: N Child MSIS_ID-DOBs Before Linkage | Inpatient Linkage: % of Child MSIS_ID-DOBs Linked | Outpatient Linkage: N Child MSIS_ID-DOBs Before Linkage | Outpatient Linkage: % of Child MSIS_ID-DOBs Linked |
AK | 31553 | 83.4 | 7383 | 41.7 | 57433 | 45.9 | 30541 | 8.0 |
AL | 335584 | 83.9 | 3177 | 54.0 | 357467 | 51.7 | 164090 | 1.0 |
AR | 133629 | 21.7 | 111902 | 2.0 | 272989 | 10.6 | 243491 | 0.7 |
CA | 905520 | 78.9 | 417779 | 38.3 | 3384571 | 21.1 | 2592491 | 5.4 |
CO | 116192 | 79.3 | 12198 | 47.0 | 296181 | 31.1 | 202696 | 1.7 |
CT | 20782 | 0 | 1970 | 0 | 164612 | 0 | 164612 | 0 |
DC | 3927 | 80.4 | 915 | 26.4 | 51082 | 6.2 | 47841 | 0.3 |
DE | 10149 | 93.2 | 23841 | 70.6 | 58891 | 16.1 | 49230 | 31.1 |
FL | 543810 | 71.6 | 443337 | 7.1 | 1483965 | 26.3 | 1054754 | 1.6 |
GA | 415916 | 35.9 | 100357 | 19.1 | 972752 | 15.4 | 812358 | 1.9 |
HI | 27633 | 77.6 | 11758 | 65.1 | 74707 | 28.7 | 52894 | 12.9 |
IA | 86326 | 73.0 | 21308 | 55.8 | 179092 | 35.2 | 115181 | 9.3 |
ID | 61086 | 74.6 | 8050 | 46.8 | 92037 | 49.5 | 45823 | 5.8 |
IL | 458715 | 72.3 | 156915 | 13.5 | 931766 | 35.0 | 600100 | 2.4 |
IN | 222255 | 91.7 | 178780 | 18.0 | 433979 | 47.0 | 226903 | 10.1 |
KS | 80230 | 90.9 | 12787 | 65.1 | 180418 | 40.4 | 105982 | 6.4 |
KY | 175466 | 81.1 | 49526 | 40.8 | 336217 | 42.4 | 173547 | 9.0 |
LA | 270481 | 88.2 | 138652 | 12.6 | 547583 | 43.6 | 208107 | 5.5 |
MA | 81289 | 89.4 | 17975 | 33.4 | 319082 | 22.8 | 243566 | 1.3 |
MD | 1470560 | 9.3 | 13016 | 10.8 | 321470 | 42.5 | 182008 | 0.6 |
ME | 15566 | 93.0 | 3909 | 25.6 | 68627 | 21.1 | 53919 | 1.4 |
MI | 177475 | 70.7 | 71736 | 26.6 | 626024 | 20.1 | 497896 | 2.9 |
MN | 91875 | 93.8 | 70072 | 67.2 | 316557 | 27.2 | 214314 | 19.2 |
MO | 253912 | 54.8 | 39871 | 31.2 | 402150 | 34.3 | 261520 | 3.8 |
MS | 184337 | 90.2 | 93213 | 11.6 | 353758 | 47.3 | 138740 | 4.8 |
MT | 27084 | 0 | 4093 | 0 | 52429 | 0 | 52429 | 0 |
NC | 484133 | 17.3 | 44800 | 9.3 | 679462 | 12.3 | 594079 | 0.5 |
ND | 16679 | 96.0 | 3334 | 36.2 | 33598 | 47.7 | 17335 | 4.4 |
NE | 25261 | 79.7 | 8158 | 54.6 | 125239 | 16.1 | 104822 | 3.9 |
NH | 22699 | 94.0 | 3519 | 46.6 | 55521 | 38.5 | 33832 | 3.4 |
NJ | 89215 | 84.4 | 88993 | 53.4 | 521386 | 14.5 | 428403 | 9.8 |
NM | 93901 | 85.4 | 28986 | 72.7 | 222010 | 36.1 | 129329 | 13.8 |
NV | 31258 | 89.5 | 11527 | 75.4 | 180635 | 15.5 | 147048 | 5.0 |
NY | 642194 | 52.1 | 107573 | 34.3 | 1396520 | 23.8 | 1056220 | 2.8 |
OH | 252528 | 94.2 | 154680 | 17.1 | 725433 | 32.8 | 481332 | 2.6 |
OK | 156994 | 87.9 | 37478 | 55.5 | 392679 | 35.2 | 243912 | 7.2 |
OR | 98716 | 88.0 | 23427 | 60.2 | 235288 | 36.9 | 146705 | 7.8 |
PA | 111772 | 93.0 | 17485 | 31.2 | 634621 | 16.3 | 529191 | 0.8 |
RI | 46047 | 91.1 | 8280 | 31.4 | 63926 | 48.0 | 32622 | 4.8 |
SC | 199463 | 16.1 | 116129 | 1.7 | 341982 | 9.4 | 309065 | 0.5 |
SD | 30083 | 93.8 | 3170 | 50.1 | 59627 | 47.4 | 30916 | 4.0 |
TN | 168714 | 80.0 | 71995 | 69.3 | 447117 | 30.2 | 309663 | 14.1 |
TX | 831729 | 9.1 | 486918 | 1.0 | 2349734 | 3.2 | 2272382 | 0.1 |
UT | 51991 | 95.7 | 9264 | 81.6 | 203126 | 24.5 | 152506 | 2.3 |
VA | 151876 | 87.1 | 62204 | 58.6 | 391259 | 33.9 | 255155 | 12.2 |
VT | 18386 | 91.9 | 2807 | 56.8 | 38560 | 43.9 | 20994 | 6.1 |
WA | 117179 | 84.4 | 52704 | 63.9 | 427660 | 23.1 | 327281 | 8.8 |
WI | 154097 | 94.0 | 33966 | 36.9 | 354202 | 40.9 | 174700 | 5.2 |
WV | 40739 | 85.8 | 6237 | 47.5 | 149386 | 23.5 | 101601 | 2.1 |
WY | 20999 | 93.9 | 4114 | 39.4 | 44000 | 45.0 | 23201 | 5.2 |
From the 6,401,483 linked deliveries, 293,911 (4.6%) were removed in the post-linkage cleaning phase, resulting in the identification of 6,107,572 unique pregnancies ending in live birth. Of deliveries that were removed, 99.2% were combined with other deliveries because they were duplicates and the rest appeared to be incorrect linkages.
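The counts in this paragraph can be checked with simple arithmetic; the snippet below only restates the reported figures, and the variable names are illustrative.

```python
linked_deliveries = 6_401_483   # linked deliveries before cleaning
removed = 293_911               # removed in the post-linkage cleaning phase

unique_pregnancies = linked_deliveries - removed
percent_removed = 100 * removed / linked_deliveries

print(unique_pregnancies)           # 6107572
print(round(percent_removed, 1))    # 4.6
```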
Of the eligibility criteria, the requirement of enrollment throughout pregnancy had the greatest impact on the cohort size (
The number of pregnancies in the base cohort (eligible from at least 1 month before the LMP month until the month after the delivery month) is represented in red and the number of pregnancies when additional or fewer months of eligibility are required is represented in blue. The lengths of the eligibility periods decrease when moving away from the vertical axis along the horizontal axis. – indicates the number of months before the LMP and + indicates the number of months after the LMP at which the eligibility period begins, and all eligibility periods continue until the month after the delivery month. LMP, last menstrual period; M, months.
A) Age, B) Race, C) Medicaid Eligibility Group, and D) Delivery Year.
Overall, 91% of women had at least one pharmacy dispensing during pregnancy (
We developed the methodology to utilize a promising and previously untapped resource for studies of medication exposure during pregnancy and a broad range of maternal and infant outcomes. Healthcare utilization data offer a number of advantages for studies of medications in pregnancy over pregnancy registry and case-control studies including the availability of large, population-based cohorts in which the study of rare outcomes and important subgroups is feasible, the availability of exposed and reference groups from the same population, prospectively collected information on a range of prescription drugs, information on many maternal and neonatal outcomes, and low study cost compared to de novo data collection
Several characteristics of the cohort have face validity. We found that woman-infant pairs with zip codes that did not match were uncommon in most states, which supports accurate linkage. Zip codes may not match for reasons other than poor linkage; infants may not live with their mothers, or maternal zip code may not be current on the delivery date due to changes of residence. Therefore, we did not require zip codes to match in pregnancies with one infant MSIS_ID. Furthermore, the frequency of several pregnancy outcomes was similar to expectations
There were several state-specific Medicaid program and data quality factors that contributed to the large variation in the number of pregnancies in the cohort from each state, including the number of pregnant women covered by Medicaid, availability of MAX for all study years, Medicaid benefit restriction and eligibility policies for enrollees, the completeness of claims, quality of Case Numbers, and reuse of Case Numbers within families. The same Case Number is not necessarily shared by all family members
Investigators planning to work with the cohort should be aware of a number of limitations inherent in the data and strategies to address at least some of these issues. To begin with, we restricted the cohort to live births only, because of the infant-linkage step, so it cannot be used to study spontaneous abortion and stillbirths. There is potential for selection bias if the exposure of interest is associated with spontaneous abortion or stillbirth and there are unmeasured common causes of these outcomes and the outcome of interest
The method we used to estimate the LMP accurately classified gestational age within 2 weeks for nearly all term and 75% of preterm pregnancies and was superior to other algorithms in one healthcare utilization database
Furthermore, incorrect linkage, i.e., false woman-infant pairs, could occur if the same Case Number was shared among different families. To our knowledge, there is no CMS validation report regarding the use of the Case Number within families. Incorrect linkage would likely result in non-differential misclassification of infant outcomes, which would tend to bias associations towards the null and is therefore problematic for drug safety studies. Social security numbers could be used to validate the linkage method in states where they are available in birth records. It is advisable to conduct sensitivity analyses restricted to a subcohort with potentially higher linkage quality when using the cohort to study infant outcomes.
We prioritized internal validity over external validity and statistical power when we implemented maternal eligibility criteria. To ensure complete follow-up through pregnancy, 80% of the originally identified pregnancies were removed. DeVoe et al. reported that healthcare services were under-recorded in Medicaid claims data from Oregon compared to electronic health records
Because date of death is under-recorded in MAX
Given the decentralized nature of Medicaid data and the sheer number of enrollees, it is unlikely that all cohort members’ claims will be captured in MAX even after applying strict eligibility criteria. Sensitivity analyses should be performed that exclude individuals who are least likely to have complete claim information such as women enrolled in capitated managed care plans
Although medical record validation studies have been described for Medicaid data among Medicaid and Medicare dually eligible enrollees
Some limitations of the MAX cohort are common to other pregnancy cohorts assembled from healthcare utilization data, such as the exclusion of pregnancies ending in miscarriage
MAX contains a very large nationwide cohort of pregnant women and prospectively collected data, which permits the study of rare medication exposures and outcomes in an otherwise understudied population. However, linkage of women to their infants is not straightforward, enrollment time is limited for many pregnant women in Medicaid because pregnancy qualifies some women for Medicaid
Cohort set-up is resource intensive, from requesting and receiving the data from CMS to linkage and implementation of eligibility criteria. However, once the cohort is assembled, it offers a valuable opportunity to efficiently evaluate medication safety during pregnancy as well as maternal characteristics, the impact of Medicaid policy, and regional differences in healthcare utilization during pregnancy.