TOM S JUZEK'S BLOG







The TOST – An example of setting delta

The TOST (two one-sided tests) is one of the most common equivalence/similarity tests (cf. Richter and Richter, 2002) and is generally attributed to Westlake (1976) and Schuirmann (1981). In the TOST, we test for similarity between two samples by performing two one-sided t-tests: in the first, delta is added to the difference in means; in the second, delta is subtracted from it. If both t-tests come out positive (i.e. significant), the TOST comes out positive, indicating similarity. However, a question that frequently comes up when using the TOST is "how do I best set delta?" (cf. Clark, 2009). (Delta is the TOST's critical parameter: it is the largest difference between the means, in absolute terms, that still counts as similar; theta expresses the same margin as a relative difference.)
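
To make the procedure concrete, here is a minimal sketch of a TOST for two independent samples in R. It assumes equal variances (a pooled standard error) and a significance level of 0.05; the function name tost_sketch, its arguments, and the default alpha are illustrative choices and not taken from the paper.

```r
# Sketch of a TOST for two independent samples, assuming equal variances.
# tost_sketch, x, y, delta and alpha are illustrative names (assumptions).
tost_sketch <- function(x, y, delta, alpha = 0.05) {
  n_x <- length(x)
  n_y <- length(y)
  df <- n_x + n_y - 2
  # Pooled variance and standard error of the difference in means
  var_pooled <- ((n_x - 1) * var(x) + (n_y - 1) * var(y)) / df
  se <- sqrt(var_pooled * (1 / n_x + 1 / n_y))
  diff <- mean(x) - mean(y)
  # First one-sided test: is the difference significantly above -delta?
  t_lower <- (diff + delta) / se
  p_lower <- pt(t_lower, df, lower.tail = FALSE)
  # Second one-sided test: is the difference significantly below +delta?
  t_upper <- (diff - delta) / se
  p_upper <- pt(t_upper, df, lower.tail = TRUE)
  # The TOST is positive only if both one-sided tests are significant
  list(t_lower = t_lower, p_lower = p_lower,
       t_upper = t_upper, p_upper = p_upper,
       similar = max(p_lower, p_upper) < alpha)
}

# Toy usage: identical samples, once with a wide and once with a tiny delta
wide <- tost_sketch(c(1, 2, 3, 4, 5), c(1, 2, 3, 4, 5), delta = 10)
tiny <- tost_sketch(c(1, 2, 3, 4, 5), c(1, 2, 3, 4, 5), delta = 0.01)
wide$similar
tiny$similar
```

With a wide delta, the identical toy samples count as similar; with a tiny delta they do not, which is why the choice of delta matters so much.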

With great help from Greg Kochanski of Google, Johannes Kizach and I tried to give guidelines on how to set delta objectively. Based on real data sets from various fields, we simulated a great deal of data, which we then used to determine delta. You can find a manuscript of our paper here [email me]. Admittedly, the paper is a bit technical, and not all of it is relevant to you if you just want to run a TOST and move on with your research. So I decided to write this blog post to provide a shortcut for the impatient by giving an example.


Assume that we wish to compare the heights of residents of Anstruther with those of residents of Crail. We collected the following measurements*:

Anstruther:
173, 168, 150, 171, 166, 161, 170, 163, 164, 169, 178, 178, 158, 164, 165, 165, 180, 193, 174, 165, 170, 180, 171, 157, 169, 178, 163, 167, 180, 168
n = 30;   mean = 169.27;   stdev = 8.52

Crail:
164, 177, 176, 154, 176, 145, 170, 172, 171, 171, 174, 165, 169, 179, 163, 166, 173, 181, 169, 171, 170, 153, 169, 175, 178, 165, 159, 163, 152, 168
n = 30;   mean = 167.93;   stdev = 8.59

*(These numbers are randomly generated, based on average heights for UK males and females, plus some semi-random standard deviation.)


Without any statistical test, the Anstruther Mail might run a silly headline like "Study shows: Anstrutherers taller than Crailers". Ouch, someone's jumping to conclusions. Just looking at the numbers, it comes as no surprise that a test for differences, here a two-sided t-test, comes out negative, i.e. non-significant (t = 0.604, df = 58, p-value = 0.548). But how about the TOST? First, one can make a case that running a TOST in this scenario is sensible, because we are looking at two samples from what could be seen as the same population (residents of rural Scotland). So, let's determine delta. Our formula for delta is as follows (p. 23 in the manuscript):

f_4:   delta = 4.58 * (sd_p / sqrt(n_p))

In our case: sd_p = (8.52 + 8.59)/2 ≈ 8.55;   n_p = (30 + 30)/2 = 30;   sqrt(n_p) = 5.48
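
The arithmetic can be reproduced in a few lines of R. This is only a sketch: the variable names (delta, grand_mean, theta) are my own, and theta is computed here as delta relative to the mean height across both samples.

```r
height_anstruther <- c(173, 168, 150, 171, 166, 161, 170, 163, 164, 169,
                       178, 178, 158, 164, 165, 165, 180, 193, 174, 165,
                       170, 180, 171, 157, 169, 178, 163, 167, 180, 168)
height_crail <- c(164, 177, 176, 154, 176, 145, 170, 172, 171, 171,
                  174, 165, 169, 179, 163, 166, 173, 181, 169, 171,
                  170, 153, 169, 175, 178, 165, 159, 163, 152, 168)

# Averaged standard deviation and averaged sample size
sd_p <- (sd(height_anstruther) + sd(height_crail)) / 2
n_p <- (length(height_anstruther) + length(height_crail)) / 2

# Formula f_4: delta = 4.58 * (sd_p / sqrt(n_p))
delta <- 4.58 * (sd_p / sqrt(n_p))

# theta: delta relative to the mean height across both samples
grand_mean <- (mean(height_anstruther) + mean(height_crail)) / 2
theta <- delta / grand_mean

round(delta, 2)  # 7.15
round(theta, 4)  # 0.0424
```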

So, for our example, delta is 7.15 (and theta is about 0.0424, as 7.15 is 4.24% of 168.6, which is the mean height across both samples). Plugging this into a TOST gives us the following:
First one-sided t-test (m_1 - m_2 + delta in the numerator): t = 3.843, df = 58, p-value < 0.001.
Second one-sided t-test (m_1 - m_2 - delta in the numerator): t = -2.630, df = 58, p-value = 0.007.

Both one-sided t-tests of the TOST come out positive, so the TOST comes out positive, indicating similarity within what we see as the standard range for statistical similarity. Thus, it's fair to say that Anstrutherers and Crailers are similarly tall.
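
For readers who want to rerun the two one-sided tests, here is a sketch using base R's t.test, shifting the null hypothesis by delta via the mu argument. It assumes equal variances (matching df = 58) and a significance level of 0.05, which is my assumption; the names test_lower, test_upper and similar are illustrative. The statistics may differ from the values reported above in the last digits, depending on rounding.

```r
height_anstruther <- c(173, 168, 150, 171, 166, 161, 170, 163, 164, 169,
                       178, 178, 158, 164, 165, 165, 180, 193, 174, 165,
                       170, 180, 171, 157, 169, 178, 163, 167, 180, 168)
height_crail <- c(164, 177, 176, 154, 176, 145, 170, 172, 171, 171,
                  174, 165, 169, 179, 163, 166, 173, 181, 169, 171,
                  170, 153, 169, 175, 178, 165, 159, 163, 152, 168)

# delta from formula f_4
sd_p <- (sd(height_anstruther) + sd(height_crail)) / 2
n_p <- (length(height_anstruther) + length(height_crail)) / 2
delta <- 4.58 * (sd_p / sqrt(n_p))

# First one-sided test: numerator is m_1 - m_2 + delta
test_lower <- t.test(height_anstruther, height_crail, mu = -delta,
                     alternative = "greater", var.equal = TRUE)

# Second one-sided test: numerator is m_1 - m_2 - delta
test_upper <- t.test(height_anstruther, height_crail, mu = delta,
                     alternative = "less", var.equal = TRUE)

# The TOST is positive if both one-sided tests are significant
alpha <- 0.05  # assumed significance level
similar <- max(test_lower$p.value, test_upper$p.value) < alpha

test_lower$statistic
test_upper$statistic
similar
```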


N.B.: I calculated the two one-sided t-tests of the TOST with a Python script that I wrote. But I will soon put up a site that does that for you: [html link].



REFERENCES

Clark, M., 2009. Equivalence testing. Retrieved 16 Dec 2013 from: www.unt.edu/rss/class/mike/5700/Equivalence%20testing.ppt

Richter, S. J., Richter, C., 2002. A method for determining equivalence in industrial applications. Quality Engineering 14 (3): 375-380.

Schuirmann, D. J., 1981. On hypothesis testing to determine if the mean of a normal distribution is contained in a known interval. Biometrics 37: 617.

Westlake, W. J., 1976. Symmetric confidence intervals for bioequivalence trials. Biometrics 32: 741-744.



R CODE

height_anstruther <- c(173, 168, 150, 171, 166, 161, 170, 163, 164, 169, 178, 178, 158, 164, 165, 165, 180, 193, 174, 165, 170, 180, 171, 157, 169, 178, 163, 167, 180, 168)
height_crail <- c(164, 177, 176, 154, 176, 145, 170, 172, 171, 171, 174, 165, 169, 179, 163, 166, 173, 181, 169, 171, 170, 153, 169, 175, 178, 165, 159, 163, 152, 168)
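# Descriptive statistics for each village
mean(height_anstruther)
mean(height_crail)
sd(height_anstruther)
sd(height_crail)

# Averaged standard deviation and sample size, as used in the delta formula
sd_p <- (sd(height_anstruther) + sd(height_crail)) / 2
sd_p
n_p <- (length(height_anstruther) + length(height_crail)) / 2
sqrt_n_p <- sqrt(n_p)
sqrt_n_p

# Test for a difference: two-sided, unpaired t-test with pooled variance
t.test(height_anstruther, height_crail, paired = FALSE, var.equal = TRUE)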



tsj; originally posted on 9 Dec 2015
last modified: 20 June 2016