TOM S JUZEK'S BLOG

The TOST – An example of setting delta

The TOST is one of the most common equivalence/similarity tests (cf. Richter and Richter, 2002) and is generally attributed to Westlake (1976) and Schuirmann (1981). In the TOST, we test for similarity between two samples by performing two one-sided t-tests. In the first test, we check the difference in means plus delta; in the second test, the difference in means minus delta. If both t-tests come out positive (that is, significant), the TOST comes out positive, indicating similarity.

However, a question that frequently comes up when using the TOST is "how do I best set delta?" (cf. Clark, 2009). (delta is the TOST's critical parameter; delta is the absolute difference, theta is the relative difference.) With great help from Greg Kochanski of Google, Johannes Kizach and I tried to give guidelines on how to set delta objectively. Based on real data sets from various fields, we simulated a great deal of data, which we then used to determine delta. You can find a ms of our paper here [email me].

Admittedly, the paper is a bit technical, and not all of it is relevant to you if you just want to run a TOST and move on with your research. So, I decided to write this blog post to provide a shortcut for the impatient, by giving an example.

Assume that we wish to compare the height of residents of Anstruther to that of residents of Crail. We collected the following measurements*:

Anstruther: 173, 168, 150, 171, 166, 161, 170, 163, 164, 169, 178, 178, 158, 164, 165, 165, 180, 193, 174, 165, 170, 180, 171, 157, 169, 178, 163, 167, 180, 168
n = 30; mean = 169.27; stdev = 8.52

Crail: 164, 177, 176, 154, 176, 145, 170, 172, 171, 171, 174, 165, 169, 179, 163, 166, 173, 181, 169, 171, 170, 153, 169, 175, 178, 165, 159, 163, 152, 168
n = 30; mean = 167.93; stdev = 8.59

*(These numbers are randomly generated, based on average heights for UK males and females, plus some semi-random standard deviation.)
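The summary figures under each sample can be rechecked in a few lines of Python. This is only a minimal sketch using the standard library; the variable names (anstruther, crail, summary) are illustrative assumptions, and the standard deviation is assumed to be the sample standard deviation (n - 1 in the denominator), which is also what R's sd() computes.

```python
import statistics

# Height measurements copied from the lists above.
anstruther = [173, 168, 150, 171, 166, 161, 170, 163, 164, 169,
              178, 178, 158, 164, 165, 165, 180, 193, 174, 165,
              170, 180, 171, 157, 169, 178, 163, 167, 180, 168]
crail = [164, 177, 176, 154, 176, 145, 170, 172, 171, 171,
         174, 165, 169, 179, 163, 166, 173, 181, 169, 171,
         170, 153, 169, 175, 178, 165, 159, 163, 152, 168]

summary = {}
for name, sample in (("Anstruther", anstruther), ("Crail", crail)):
    n = len(sample)
    mean = statistics.mean(sample)
    # Assumption: sample standard deviation (n - 1 denominator).
    stdev = statistics.stdev(sample)
    summary[name] = (n, round(mean, 2), round(stdev, 2))
    print(f"{name}: n = {n}; mean = {mean:.2f}; stdev = {stdev:.2f}")
```

If the population standard deviation were intended instead, statistics.pstdev would be the function to use, and the values would come out slightly smaller.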
Without any statistical test, the Anstruther Mail might run a silly headline like "Study shows: Anstrutherers taller than Crailers". Ouch, someone's jumping to conclusions. Just by looking at the numbers, it doesn't come as a surprise that a test for differences, here a two-sided t-test, comes out negative (t = 0.604, df = 58, p-value = 0.548).

But how about the TOST? First, one can make a case that running a TOST in this scenario is sensible, because we are looking at two samples from what could be seen as the same population (residents of rural Scotland). So, let's determine delta. Our formula for delta is as follows (p.23 in the ms):
f_4: delta = 4.58 * (sd_p / sqrt(n_p))
In our case: sd_p = 8.55, i.e. (8.52 + 8.59)/2; n_p = 30, i.e. (30 + 30)/2; sqrt(n_p) = 5.48. So, for our example, delta is 7.15 (and theta is about 0.0424, as 7.15 is 4.24% of 168.6, which is the mean height of both samples).

Plugging this into a TOST gives us the following:
First one-sided t-test (m_1 - m_2 + delta in the numerator): t = 3.843, df = 58, p-value < 0.001.
Second one-sided t-test (m_1 - m_2 - delta in the numerator): t = -2.630, df = 58, p-value = 0.007.

Both one-sided t-tests of the TOST come out positive, so the TOST comes out positive, indicating similarity within what we see as the standard range for statistical similarity. Thus, it's fair to say that Anstrutherers and Crailers are similarly tall.

N.B.: I calculated the two one-sided t-tests of the TOST with a Python script that I wrote. But I will soon put up a site that does that for you: [html link].

REFERENCES
Clark, M., 2009. Equivalence testing. Retrieved 16 Dec 2013 from: www.unt.edu/rss/class/mike/5700/Equivalence%20testing.ppt
Richter, S. J., Richter, C., 2002. A method for determining equivalence in industrial applications. Quality Engineering 14 (3): 375-380.
Schuirmann, D. J., 1981. On hypothesis testing to determine if the mean of a normal distribution is contained in a known interval. Biometrics 37: 617.
Westlake, W. J., 1976. Symmetric confidence intervals for bioequivalence trials. Biometrics 32: 741-744.
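The worked example can also be retraced end to end in Python. What follows is a minimal sketch under stated assumptions, not the script mentioned in the N.B.: it assumes a pooled-variance standard error (in line with df = 58 and with var.equal=TRUE in the R code below), an alpha of 0.05 for calling a one-sided test positive, and a hypothetical helper, student_t_sf, that approximates the upper tail of Student's t by numerical integration so that only the standard library is needed.

```python
import math
import statistics

anstruther = [173, 168, 150, 171, 166, 161, 170, 163, 164, 169,
              178, 178, 158, 164, 165, 165, 180, 193, 174, 165,
              170, 180, 171, 157, 169, 178, 163, 167, 180, 168]
crail = [164, 177, 176, 154, 176, 145, 170, 172, 171, 171,
         174, 165, 169, 179, 163, 166, 173, 181, 169, 171,
         170, 153, 169, 175, 178, 165, 159, 163, 152, 168]


def student_t_sf(t, df, steps=2000):
    """Hypothetical helper: P(T > t) for Student's t, via Simpson's rule."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    # Integrate the density from 0 to |t| (steps must be even).
    b = abs(t)
    h = b / steps
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    upper = 0.5 - total * h / 3          # P(T > |t|), by symmetry
    return upper if t >= 0 else 1.0 - upper


n1, n2 = len(anstruther), len(crail)
m1, m2 = statistics.mean(anstruther), statistics.mean(crail)
s1, s2 = statistics.stdev(anstruther), statistics.stdev(crail)

# Formula f_4 from the post: delta = 4.58 * (sd_p / sqrt(n_p)).
sd_p = (s1 + s2) / 2
n_p = (n1 + n2) / 2
delta = 4.58 * sd_p / math.sqrt(n_p)

# Assumption: pooled-variance standard error, df = n1 + n2 - 2.
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Two-sided test for a difference.
t_diff = (m1 - m2) / se
p_diff = 2 * student_t_sf(abs(t_diff), df)

# TOST: two one-sided tests against -delta and +delta.
t_lower = (m1 - m2 + delta) / se         # large positive t speaks against diff <= -delta
p_lower = student_t_sf(t_lower, df)
t_upper = (m1 - m2 - delta) / se         # large negative t speaks against diff >= +delta
p_upper = student_t_sf(-t_upper, df)

alpha = 0.05                             # assumption, not stated in the post
similar = max(p_lower, p_upper) < alpha

print(f"delta = {delta:.2f}")
print(f"two-sided: t = {t_diff:.3f}, df = {df}, p = {p_diff:.3f}")
print(f"TOST lower: t = {t_lower:.3f}, p = {p_lower:.4f}")
print(f"TOST upper: t = {t_upper:.3f}, p = {p_upper:.4f}")
print("similar" if similar else "not shown to be similar")
```

The output should land close to the figures quoted in the example. Small differences in the last digits are possible, depending on where values are rounded and on how the p-values are computed.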
R CODE
# height measurements for both towns
height_anstruther <- c(173, 168, 150, 171, 166, 161, 170, 163, 164, 169, 178, 178, 158, 164, 165, 165, 180, 193, 174, 165, 170, 180, 171, 157, 169, 178, 163, 167, 180, 168)
height_crail <- c(164, 177, 176, 154, 176, 145, 170, 172, 171, 171, 174, 165, 169, 179, 163, 166, 173, 181, 169, 171, 170, 153, 169, 175, 178, 165, 159, 163, 152, 168)

# means and sample standard deviations
mean(height_anstruther)
mean(height_crail)
sd(height_anstruther)
sd(height_crail)

# sd_p: average of the two standard deviations
sd_p <- (sd(height_anstruther) + sd(height_crail)) / 2
sd_p

# sqrt(n_p), with n_p = (30 + 30) / 2 = 30
sqrt_n_p <- sqrt(30)
sqrt_n_p

# two-sided t-test for a difference in means
t.test(height_anstruther, height_crail, paired=FALSE, var.equal=TRUE)

tsj; originally posted on 9 Dec 2015
last modified: 20 June 2016
