Relevance bias model for chemical risk assessment

June 27, 2011 at 8:02 pm | Posted in Feature Articles | 1 Comment

Over the previous two months we have been looking at the risk assessment (RA) process, to see whether it has features which help explain how RA can give a chemical such as BPA a relatively clean bill of health while its conclusions continue to be rejected by many of the researchers studying the toxicity of BPA.

First we looked at the mismatch between the needs of scientists involved in exploratory research and those of risk assessors trying to draw firm conclusions about a chemical’s safety (H&E #37). The mismatch helps explain why so few studies by academic researchers are included in risk assessments, even though they investigate the safety of the substances in question.

The research and regulation mismatch presents regulators with a dilemma: either change the way risk assessment is done so the findings of exploratory research can be included; or implement a programme to deliberately follow up exploratory studies with the large-scale, carefully documented studies risk assessors demand.

Second we looked at whether the risk assessment methodology itself might introduce biases into reviews of the safety of a substance like BPA (H&E #38). We explored the concept of “relevance for risk assessment”, central to how studies are selected for inclusion in a chemical RA.

We speculated that the demand for relevance might generate systematic bias in RA. If a relevance criterion leads to a consistent set of studies being excluded from RA, then consensus from different RAs about the safety of a substance may have less to do with the findings of studies than it has to do with how the studies are selected.

This month, we present an illustrative model of this potential bias.

Modelling the influence of relevance bias in risk assessment

For simplicity, we imagine that an existing tolerable daily intake for a substance (TDI: the amount of the substance to which someone can regularly be exposed without appreciable risk to health) may be too high and is therefore under review.

We will assume there are 56 studies in the published literature about the toxicity of substance X (figure 1). The majority (n=48, grey in colour) are ordinary peer-reviewed studies of sufficient quality to have been published in an academic journal.

Figure 1. 56 studies, 8 GLP (orange), 21 finding toxicity (circled in blue)

The remaining 8 (marked in orange) have been carried out according to OECD/GLP guidelines. The studies which indicate that substance X is harmful at the existing TDI, and that the TDI should therefore be lowered, are circled in blue. At this stage, we do not know which of the total of 56 studies are sound.

Method A, Figures 2 & 3. In risk assessment, studies are evaluated for their relevance to calculating a TDI for substance X. Few ordinary peer-reviewed studies meet the data requirements risk assessors have for this purpose (as we described in H&E #37). The GLP studies are designed to meet the data requirements and necessarily do so. Requiring relevance leaves us with 10 studies for evaluation of a TDI, circled in green (figure 2).

Figure 2. Studies circled in green meet relevance criteria of risk assessment.

Normally, however, risk assessors apply a second filter, reliability, to the studies. In this instance we will assume, in line with EFSA’s guidance on safety assessment of pesticides (EFSA 2011) and the reported review methodology of the EU SCENIHR committee (Health and Consumer DG 2011), that adherence to OECD/GLP guidelines is not in itself a strong indicator of a study’s reliability.

In this case, the reliability assessment concludes that six of the studies relevant for risk assessment (circled in red) are reliable enough for calculating an accurate TDI (figure 3).

Figure 3. Method A. Reliability assessment leaves 6 studies for weight-of-evidence assessment.

A weight-of-evidence analysis is then carried out to determine whether or not the TDI should be changed. For the sake of simplicity of the model, each study carries equal weight of 1, with a +1 score for showing safety and -1 for showing harm.

Three studies show the TDI should be lowered and three show the TDI is fine as it is, giving a total weight-of-evidence of zero on this assessment. This means there is no clear case for changing the TDI, so the TDI stays the same – or as RA committees would likely put it: there is no reason for concluding that substance X is harmful, so long as exposure does not exceed the existing TDI.
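The scoring rule used here can be written out as a short Python sketch. The function name and the list representation are illustrative assumptions, not part of any RA committee's methodology; only the +1/-1 weights and the three-versus-three split come from the model as described.

```python
def weight_of_evidence(findings):
    """Sum equal-weight scores: +1 for a study showing safety, -1 for one showing harm."""
    scores = {"safe": 1, "harm": -1}
    return sum(scores[f] for f in findings)

# Method A: six reliable studies, three showing harm and three showing safety.
method_a = ["harm"] * 3 + ["safe"] * 3
total_a = weight_of_evidence(method_a)
print(total_a)  # 0
```

A total of zero means the retained studies cancel out, which is why the model treats it as giving no case for changing the TDI.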

Method B, Figures 2 & 4. If following GLP guidelines is taken as an indicator of reliability, as has been argued to be the case in EFSA and FDA risk assessments of BPA (Myers et al. 2009), then GLP studies are more likely to be included in the assessment. In this model, we assume the perceived reliability of GLP increases the total number of studies judged reliable to eight (circled in red, figure 4).

Figure 4. Method B. GLP is taken as an indicator of reliability, increasing the number of GLP studies in the RA by 2.

Of these studies, five show the TDI is acceptable and three suggest it should be changed. This leaves a total weight-of-evidence of +2, which counts in favour of the existing TDI being safe.

Method C, Figure 5. It is possible to assess the evidence with none of the RA filters in place, a model which most closely resembles the approach of the Chapel Hill statement about the safety of BPA (vom Saal et al. 2007).

Under this method of assessing potential harm from substance X, usefulness for calculating a TDI is not a factor in determining relevance for a weight-of-evidence assessment; the only factor which matters is whether the study is of sufficient quality that its findings are highly likely to be true.

Here we assume that 16 of the 56 peer-reviewed studies are found to be of high quality (circled in purple). Eleven show toxicity and five do not, providing a weighting of -6 in favour of toxicity, which is strong evidence of potential harm (figure 5).

Figure 5. Method C. Discounting relevance to RA as a selection criterion could lead to a very different picture of the evidence.


This simplified model illustrates how the process of selecting studies for review can alter conclusions about the safety of a substance. It has the appearance of a variety of selection bias.
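As an illustrative sketch, the three tallies can be set side by side in Python. The counts are those given for Methods A, B and C in this model; the variable names, the `net_score` helper and the output layout are assumptions made for the example.

```python
TOTAL_STUDIES = 56  # size of the hypothetical literature on substance X

# (studies showing safety, studies showing harm) among those each method retains
methods = {
    "A": (3, 3),   # relevance filter; GLP not treated as a reliability indicator
    "B": (5, 3),   # relevance filter; GLP treated as a reliability indicator
    "C": (5, 11),  # no RA filters; study quality only
}

def net_score(safe, harm):
    """Equal-weight tally: +1 per study showing safety, -1 per study showing harm."""
    return safe - harm

results = {name: net_score(safe, harm) for name, (safe, harm) in methods.items()}

for name, (safe, harm) in methods.items():
    used = safe + harm
    print(f"Method {name}: {used:2d} of {TOTAL_STUDIES} studies, net score {results[name]:+d}")
```

The same 56-study literature yields a net score of 0, +2 or -6 depending only on which studies survive selection.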

Although Method C provides strong evidence of potential harm, it does not translate into a change in TDI, because the studies do not provide data from which a TDI can be calculated.

Method A and Method B corroborate each other’s findings, to generate a consensus view that substance X is safe. However, that consensus is driven by the methods both applying similar exclusion criteria prior to assessing the validity of the remaining studies, thus leading to similar results in the weight-of-evidence assessment.

Ironically, in Methods A and B it is the need for data permitting calculation of a TDI that leads to overestimation of the safety of substance X. This is the opposite of the intention of risk assessment, which is supposed to be conservative and err on the side of caution.

Obviously, weight-of-evidence analysis is more complex than portrayed, and this model cannot prove the existence of relevance bias, only anticipate (in broad brushstrokes) its potential effect. It serves as a reminder that consensus between reviews can be generated as much by application of similarly biased review methodology, leading to similar skewing of the results, as it can by objective analysis of the evidence.

Given the potential for relevance and reliability to alter which studies are selected for weight-of-evidence assessment, the possibility that the risk assessment process by its nature might produce biased results should be more closely examined.

There is also the question of what ought to be done if there is strong, consistent evidence of harm from studies which do not permit calculation of a TDI. Currently, bodies such as the European Food Safety Authority conclude in such circumstances that there is no reason to change the TDI. It might, however, be more prudent to conclude that a TDI cannot be calculated, or that there is reason for thinking the existing TDI is unreliable.

This has been noted in the minority opinion of EFSA’s assessment of the safety of BPA: “Due to methodological shortcomings, none of the new studies can be used to derive a more stringent NOAEL that could lead to a newly established numerical TDI value. However, due to the overall weight of evidence, the current TDI of 50 µg/kg body weight may not be confirmed as a full TDI and should be considered as temporary.” (EFSA 2010)

If it is accepted that studies relevant for RA can be contradicted by a consensus view generated by research not intended for calculation of a TDI, then regulators are faced with the following dilemma: either

  1. regulate on the basis of consistent evidence of possible harm, even if this evidence does not permit calculation of a safe exposure level, or
  2. set up independent laboratories able to determine, according to regulatory data requirements and with the latest scientific techniques, what the actual safe level is – as opposed to waiting for a researcher outside the regulatory system to take it upon themselves to produce the data which the regulators need.

1 Comment »


  1. This problem would disappear if academic researchers were to follow the basic requirements laid out in the OECD GLP guidelines. These need not be onerous; remember that Good Laboratory Practice is what it says: simply the appropriate practice that any good scientist should be following if they wish to generate robust results. There is no necessity to become a formally accredited facility, which would only be necessary in order to submit data to a regulatory authority.
