Better Call Paul | Are Your Statistics Even Significant, Bro?
In this episode, the second of five special Better Call Paul episodes, Kris and Paul discuss statistics and their significance with data analyst and fellow Power Company coach, Dale Wilson.
They start by discussing what statistics really are and why we use them. They consider statistical significance and the issues surrounding it, recognizing some flaws of current models. Dale also explains his views on statistical models, p-values, and more — and breaks down how we should be looking at all this data as climbers.
*Additional studies/resources mentioned in this episode:
Scientists rise up against statistical significance by Valentin Amrhein, Sander Greenland, and Blake McShane; published in Nature 567, 305–307 (2019).
800 scientists say it’s time to abandon “statistical significance” by Brian Resnick; published on Vox; March 22, 2019.
Systematic review of the use of “magnitude-based inference” in sports science and medicine by Keith R. Lohse, Kristin L. Sainani, J. Andrew Taylor, Michael L. Butson, Emma J. Knight, and Andrew J. Vickers; published in PLOS ONE; June 26, 2020.
New episodes of Better Call Paul drop on Wednesdays. Make sure you’re subscribed, leave us a review, and share!
And please tell all of your friends who are confused and overwhelmed by the amount of jumbled and conflicting training info out there, that you have the perfect podcast for them.
Got a question? Comments? Want to suggest a paper to be discussed? Get in touch and let us know!
Better Call Paul | Breaking Beta is brought to you by Power Company Climbing and Crux Conditioning, and is a proud member of the Plug Tone Audio Collective. Find full episode transcripts, citations, and more at our website.
FULL EPISODE TRANSCRIPT:
Breaking Bad Audio Clip 00:05
A strong force of attraction. Covalent bonds, ionic bonds, the coming together of atoms and molecules to form compounds. Chemical bonds are what make matter matter. Bonds are what hold the physical world together. What hold us together.
Yeah, yeah, no, I got it. Bonds.
Your test score says otherwise. It tells me you don't get it, at all.
Yeah. I mean, 58... I was close.
What is "close"? There is no close in science, Barry. There are right answers and wrong answers. "Close" didn't put men on the moon.
Yeah, but I'm just saying, Mr. White, two points. Look, if I don't pass chemistry, I have to go to summer school. And I mean, I really studied like, really, all night hard. And I mean, I'm so into chemistry for like, the concepts. I just think I might have, you know, the attention deficit....can you please just let this slide?
Don't bullshit a bullshitter. The answer is no. Next time apply yourself.
Kris Hampton 01:26
No, we cannot let this one slide, Paul.
Paul Corsaro 01:31
Hahaha
Kris Hampton 01:31
We're gonna have to apply ourselves. And actually, I play this clip partly because I think that the idea of science telling us, you know, right and wrong answers, and there not being room for gray area, is a bit of a misunderstanding. Particularly in sport science, we're often dealing with theories and hypotheses, tiny parts of a much more complex organism, and minuscule sample sizes. And this doesn't even take into account the significant role that humans and human error and human bias play in the understanding of science and research.
Paul Corsaro 02:17
It kind of just goes into the theme of why we're doing a lot of this, right? A lot of this stuff has been extrapolated as binary, right/wrong, this is the way to do it, this works/this doesn't work, when really, there's a lot of nuance, and the binary kind of model doesn't work a lot of times.
Kris Hampton 02:34
Yeah, absolutely. And, you know, there's also the social media component that is driving us to do this a little bit, because one of the things we found in Season One is that people got really bent out of shape about the statistics. You know, "Well, your P value is this, so therefore, it means this. I don't even need to look at the study, I can just look at that number and tell you everything it already says" and I'm like, "No, you need to apply yourself."
Paul Corsaro 03:05
Haha
Kris Hampton 03:06
So so I think it's a good a good thing to talk about. We are joined by our first esteemed guest here on the Breaking Beta Podcast. Today, we've got our friend and fellow Power Company coach and data analyst Dale Wilson with us to talk statistics, because Dale fucking loves to talk about statistics and I don't know the first thing about statistics, so
Paul Corsaro 03:32
I'll say frankly, Dale can talk better about that than we can, all day.
Kris Hampton 03:37
For sure.
Dale Wilson 03:38
I'll try not to talk all day, but glad to be here guys.
Kris Hampton 03:43
How you doing, man?
Dale Wilson 03:44
I'm good. Yeah, life's good. I'm good. Statistics are good. Sometimes.
Kris Hampton 03:48
Hahaha. Is there a specific statistical model that's better than the others? In general, in all of life?
Dale Wilson 03:59
No, none of them are better than the others, but I certainly have favorites I guess.
Kris Hampton 04:04
All right, well, let's uh, let's get this thing rolling.
Breaking Bad Audio Clip 04:08
You clearly don't know who you're talking to, so let me clue you in.
Paul Corsaro 04:13
I'm Paul Corsaro
Kris Hampton 04:14
I'm Kris Hampton.
Breaking Bad Audio Clip 04:15
Lucky two guys, but just guys, okay?
Paul Corsaro 04:19
And you're listening to Breaking Beta,
Kris Hampton 04:22
Where we explore and explain the science of climbing
Breaking Bad Audio Clip 04:25
With our skills, you'll earn more than you ever would on your own.
Breaking Bad Audio Clip 04:30
We've got work to do. Are you ready?
Kris Hampton 04:36
Dale, I hope that you're ready because we've got a million questions for you. And they're they're complicated questions in my mind, because when I look at statistics, that's how it feels to me, complicated. Will you start us off with a short explanation of why it is we even have statistics in these research papers? Why not just look directly at the results? You know, a lot of these papers have a chart that shows what the results looked like for the participants. Why not just look at that chart and take that directly? Why do we need these statistical models at all?
Dale Wilson 05:19
I would say statistics are incredibly useful because they help you understand larger patterns in things, even if you don't necessarily have all the pieces of the puzzle at that time. So in a case of like, a small sports science study, or something, you can't obviously sample like an entire population.
Kris Hampton 05:41
Sure
Dale Wilson 05:41
It's a, you're looking at a relationship between two numbers that can be expounded, or is frequently expounded to mean for like, larger groups as well. So I mean, it's a, it's an extension of what math is used for in anything, which is to help us wrap our little human brains around phenomena that we see and try to put order to them.
Kris Hampton 05:41
Right
Dale Wilson 05:41
But you can design an experiment, have a relatively small sample size for it, but from that sample size, interpret what would maybe happen in a larger population, which is usually how they're used. In terms of why not just like looking at graphs or numbers, I mean, even if you look at like the results of a paper, and it gives you like a percent, if you're not looking at like 100, like a percent is a is an extrapolation of itself. Like it's a
Kris Hampton 06:27
Mm hmm. I imagine it is probably particularly useful then for, especially for these like climbing and sports science studies that are often very small. Sports science in general often has small sample sizes and climbing studies, you know, Paul and I looked at 10 papers, 11 papers over Season One and most of those sample sizes are pretty damn small, you know, 10 people, or fewer.
Dale Wilson 06:57
Absolutely, yeah, that's, I mean, that's a that's one of the bigger challenges not just in sport science, in social science, life sciences. How do you get adequate sample size to say that something means what you think it means, and that you're actually able to control for all the variables that you're looking into? I mean, that's kind of the biggest, biggest challenge for all these things. When you look at very simple like, hard physical science or engineering studies, it's a lot easier, because you're like, "Here are our variables and we have much finer control over them". It's much easier, but when you get into social science, sports science, psychology, all these things, it's it's much harder, and you have to really understand what you're working with, in order to understand if your experiment is designed well, and how, how to properly analyze it.
Paul Corsaro 07:51
Do you think these statistical models are helpful for kind of getting some of that signal from the noise for these small sample studies? Is it a little bit easier maybe for someone to intuit what like, say, you had you know, a study that had like, you know, an amazing sample size, would you be able to be able to just see, or just look at the data, and maybe grasp a little bit more what the meaning of that study is, without statistics, compared to super smaller, super small sample size? Or do things still get blurry there?
Dale Wilson 08:23
Definitely not.
Paul Corsaro 08:23
No? Okay.
Dale Wilson 08:24
Like, you mean, like, would you be able to interpret like, large data from a, like, say that you have like a super large study, and it's got all these people but it doesn't have like statistics on it? Like, I think you'd have a really, really hard time pulling anything out of that.
Paul Corsaro 08:38
So the statistical models for these small studies, it doesn't necessarily make it as clear as a larger study with the same statistical models, right?
Dale Wilson 08:52
So I think where you are getting with this is like the confidence interval kind of mindset as well as to like, can you make conclusions with like, the same confidence from like a large study versus a small study? Heading in that direction and I mean
Paul Corsaro 09:06
Right
Dale Wilson 09:06
When you look at like the confidence intervals inside of small studies, they're much larger because when you think about it, it's like, how can you determine to a high degree of certainty, where like a true mean for something is if you only have like five numbers, but if you were to have like, 100 numbers in that same range, like you would be able to narrow that range down for where that true mean might be.
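As a quick illustration of the point Dale is making, here is a minimal sketch in Python using invented numbers (nothing from the papers): it draws a small sample and a large sample from the same hypothetical population and computes a 95% confidence interval for the mean of each, showing how much wider the interval is when the sample is tiny.

```python
# Minimal sketch with invented data: confidence interval width vs. sample size.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def ci_95(sample):
    """95% t-based confidence interval for the mean of `sample`."""
    n = len(sample)
    mean = sample.mean()
    sem = stats.sem(sample)                        # standard error of the mean
    half_width = stats.t.ppf(0.975, df=n - 1) * sem
    return mean - half_width, mean + half_width

# Same hypothetical "population" (mean 40, SD 8), two very different sample sizes.
small = rng.normal(loc=40, scale=8, size=5)
large = rng.normal(loc=40, scale=8, size=100)

print("n = 5  :", ci_95(small))   # wide interval: little certainty about the true mean
print("n = 100:", ci_95(large))   # much narrower interval around the same true mean
```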
Paul Corsaro 09:26
Thank you for saying what I was trying to ask way better than the way I asked it. Haha
Kris Hampton 09:30
Hahaha. I was enjoying you struggling through that question, because I struggled through these questions in my head all of yesterday and half of today.
Dale Wilson 09:39
Haha. I was like, "Where's he going? I think, I think I see it"
Kris Hampton 09:42
Haha. Can you, Dale, so I think if if people have looked at research, they may have heard us talk about it a little bit in Season One of Breaking Beta. There's a value assigned in a lot of these studies, the p value. And this comes from statistics. And it's my understanding and I may not have this exactly right, but I think it came from psychology, which you just mentioned, being one of those areas of research that are a little tougher to pin down. And I think that's why this p value came about, is that correct?
Dale Wilson 10:22
I think we just pissed off all the climber psychologists that are listening to this podcast.
Kris Hampton 10:26
Haha
Dale Wilson 10:29
P values, I'm not sure if they came from psychology or where specifically they came from, but it's essentially a way of stating like, "What is the probability that the results that you're seeing inside of a study would have been generated if if your null hypothesis were true?". So that'd be like saying, like, whatever you're studying, like, and you're looking at, say, a difference between two groups, "What is the probability that whatever you are like trying to study, like has zero effect on that thing?" and you would have seen those results in that setting. So like, the standard value that's become kind of, not even industry standard, it's like, across across fields really, is like .05 and it's like arbitrarily determined, there's no like, really solid reason for that. Probably easiest to view it as just a probability, the probability of again, seeing those things and seeing those results in that situation.
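To make that definition concrete, here is a minimal sketch with made-up numbers (not data from either paper): a two-sample t-test returns a p-value, the probability of seeing a group difference at least this large if the true difference were zero. The 0.05 cutoff in the code is the same arbitrary convention Dale mentions.

```python
# Minimal sketch with invented numbers: what a p-value is reporting.
# It answers: "if the null hypothesis (no true difference between groups) were
# correct, how probable is a difference at least as extreme as the one observed?"
from scipy import stats

group_a = [42.0, 45.5, 39.0, 47.0, 44.0]   # hypothetical post-test values
group_b = [40.0, 41.5, 38.5, 42.0, 39.5]

t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

# p < 0.05 is a convention, not a law of nature.
alpha = 0.05
print("below the usual 0.05 cutoff" if p_value < alpha else "not below the usual 0.05 cutoff")
```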
Kris Hampton 11:26
Yeah, when I, when I've been looking at, you know, trying to understand p values a little better, there's quite a bit of complicated language around it. And I think you just described it in a really fairly simple way. I think there's a misunderstanding about the p value that it's saying, "It's this percentage. That chance is what caused this result". That's what I hear a lot of people say, and that's not quite true. It goes back to the null hypothesis, the way that you just described it.
Dale Wilson 11:58
Yeah, it has to it has to be in relation to assumptions about the experiment, like you always have to go back to that. Because it's, you can't ever really know like, what....that's an estimation. That's the other thing is it's like, it's an arbitrary percentage and it's an estimation. Like, obviously, you're making this conclusion because you don't have more data to draw upon to, like, say confidently for it.
Kris Hampton 12:23
Yeah. And if the p value comes out to lower than .05, that is determined to be statistically significant and if it's higher than .05, it's statistically insignificant, correct?
Dale Wilson 12:36
Right. Again, that's the that's the summation of it as it exists right now. But normally, when you read read a paper and you're into the the results portions of it, they'll give their like, actual p values as to whether it's .1 or something like this. And they can say, like, "Oh the results are statistically insignificant", but and then they'll go into more details to kind of define other statistics about what they're seeing inside of this study, which I think is really beneficial usually.
Kris Hampton 12:37
Okay.
Dale Wilson 12:41
Because like, say, in an easy example, if you were to read one paper, and it has like, like one portion is like a .1 p value, and you're like, "Eh, that's kind of close to .05", and then if you see like in another portion, it's like, like .06, you are like, "Oh, wow, that's like, very close, it's very, very close". And there might actually be more...if something is like very borderline, it's like .049 or something like this and then you see other results that are .001, or something like that, like the magnitude there is huge. So like, it's used as a shorthand for determining statistical significance, but you still have to go through and read the results to understand it and what, what everything being examined in the study means.
Kris Hampton 13:52
Got it. So it's not just a threshold. It's not, you know, "Above this number absolutely means one thing, below this number absolutely means the other."
Dale Wilson 14:01
I mean, it's statistics, I don't know about absolutely for any of it.
Kris Hampton 14:06
Hahaha
Dale Wilson 14:06
They're meant to be mathematical approximations of large groups that can't be measured, like that's what it's for haha.
Paul Corsaro 14:11
What's the saying? "All models are wrong, but some are useful"? Something like that.
Dale Wilson 14:15
Some are useful, yeah.
Kris Hampton 14:16
Haha. That that quote is gonna go on the Instagram post for this episode for sure. All right. Unless you think there's more to explain here, Paul, maybe we jump into a couple of the papers we looked at and you know, look at their statistical models and what the P values were and the significance and kind of break that down a little bit and the two that we chose are the two Eva Lopez papers that bookended our Season One. So looking at the maximum added weight/minimum edge depth and then the second paper also looked at intermittent hangs. And I know you've looked over these. Dale, is there a specific place you want to start here?
Dale Wilson 15:00
If we want to go through their whole statistical methods, that seems kind of boring, but I mean, I'd be happy to go through that. And if there's anything in there that you're, like unclear on or that you think would be interesting to understand, we can go through that.
Paul Corsaro 15:12
I think it'd be cool to go over the, if you can simply explain the ANOVA processes, I think that'd be cool. ANOVA and the ICCs, yeah.
Dale Wilson 15:14
Okay. Yeah. Um, so ANOVA is an acronym for Analysis Of Variance. Essentially, it's a statistical way of analyzing a sample from different groups. And that can be different groups over different times with the same individuals, that can be like multiple, more than two groups at the same time. Some ANOVA variations have to be like three over time. But essentially, you're analyzing different groups, and you're trying to determine where their central tendency actually exists and if that central tendency is different between groups. So if you were looking at like a group before, like a certain training intervention, and then after a training intervention, then after detraining, like kind of what they're looking at, in these Eva Lopez studies, like you could determine like a range of where the true mean might be inside of that group and compare that to like, where the range of the true mean, might be for the next group, and so on in time. So that's, that's a very high level overview of what analysis of variance is. And there's a bunch of different types for it and there's a bunch of different, like modifications that you can make to it to make it more robust also.
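For readers who want to see the simplest version of this in code: the sketch below runs a basic one-way ANOVA across three hypothetical sets of scores. The Lopez papers use repeated-measures designs on the same climbers over time, which need more specialized tooling, so treat this only as an illustration of the core question ANOVA asks: could these group means plausibly have come from the same underlying population?

```python
# Illustrative sketch: one-way ANOVA across three hypothetical sets of scores.
# (The Lopez papers use repeated-measures ANOVA on the same subjects over time;
# this simpler between-groups version just shows the basic idea of the test.)
from scipy import stats

pre_test   = [32, 35, 30, 33, 31]   # invented strength scores
post_test  = [38, 41, 36, 40, 37]
detraining = [36, 38, 34, 37, 35]

f_stat, p_value = stats.f_oneway(pre_test, post_test, detraining)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```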
Kris Hampton 15:14
We see that in a lot of papers.
Kris Hampton 16:38
Sure.
Paul Corsaro 16:39
Is that one of the more common ways people look at like, between or within group differences in studies?
Dale Wilson 16:46
Absolutely. Yeah. That's probably THE most. That's your that's your bread and butter for like, any sort of experiment analysis, I'd say. And what the....while p-values are still fresh in everybody's head, one of the things that stood up, it's kind of kind of standard, but I thought it was worth noting was, one of the modifications they make for the ANOVA in both of these studies is to use what's called Bonferroni Adjustment, which is a, there's a phenomenon where if you repeat, if you repeat testing over and over on like the same groups, you have like an increased probability, or an increased type one error rate inside of the study. And Bonferroni Adjustment like raises the standard to reach significance with successive like measurements. So as you increase your number of like, measurements, it takes it from like, "Oh, it has to be .05" to like, "Oh, it has to be .03", then even more, "It has to be .01". So it's, again, like a way of trying to fight that, like, noise out of the design.
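The arithmetic behind the Bonferroni adjustment Dale describes is simple enough to show directly. The comparison counts and p-values below are arbitrary examples, not figures from the papers.

```python
# Bonferroni adjustment: with m comparisons, each individual test must clear a
# stricter threshold (alpha / m) so the family-wise error rate stays near alpha.
alpha = 0.05

for m in (1, 2, 5, 10):
    print(f"{m:>2} comparisons -> per-test threshold = {alpha / m:.4f}")

# Equivalently, multiply each raw p-value by m (capped at 1.0) and keep comparing
# the adjusted values against the original 0.05.
raw_p = [0.012, 0.030, 0.200]   # hypothetical raw p-values
adjusted = [min(p * len(raw_p), 1.0) for p in raw_p]
print(adjusted)                 # roughly [0.036, 0.09, 0.6]
```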
Paul Corsaro 17:48
So without that, it could be potentially easier to reach that, you know, p value of significance without doing that.
Dale Wilson 17:57
Exactly. Yeah.
Paul Corsaro 17:58
Okay. Cool
Dale Wilson 17:58
If you were to like, do successive measurements of the same people, you're more likely to see things that aren't actually there.
Paul Corsaro 18:04
Cool. Sweet. So that was a good call on their end to do that as well?
Dale Wilson 18:08
Yeah. For their stuff, I mean, I don't think the values they're talking about they would, they had to deal with it. But it's just something that they like, comment on it, where it's like, you see it in the statistical methods, you're like, okay, like this, this is an easy, easy way to go about things and they are trying to be robust. So
Kris Hampton 18:24
Question for you about this study, in particular and I mean, any of the really small sample sizes that we saw. I think this study ended up with like, nine people, is that right?
Dale Wilson 18:35
Yeah.
Kris Hampton 18:36
In two groups,
Dale Wilson 18:36
Nine people in two groups. I think a five and a four.
Dale Wilson 18:39
It is small
Kris Hampton 18:39
Yeah, so a tiny sample size. We would see a lot of comments from people, "Oh, only 10 people? Throw this one out the window, you know, it doesn't matter at all". We'd see a lot of that. And I'm curious, if we are looking at
Kris Hampton 18:55
Yeah. It's very small. If we're looking at these small sample sizes, does that somehow affect and is there an easy explanation to how it affects, how easily or how difficult it will be to reach statistical significance?
Dale Wilson 19:14
It will be way harder in a small sample size, I would say, um, because the confidence intervals for the means are going to be massive.
Kris Hampton 19:23
Got it.
Dale Wilson 19:24
Like, you're much more likely to have like, massive groups. So if you were trying to prove your alternate hypothesis that like this has definitely has like an effect and this group is different, then you're gonna have a really hard time if you're automatically like stretching those confidence intervals out to show that they're, like not overlapping, essentially.
Kris Hampton 19:44
Okay. Yeah, I wondered about that. I had read that it's harder to reach statistical significance in a smaller group, but I wasn't exactly sure why. So that makes sense
Dale Wilson 19:55
Absolutely. Yeah, if you look at the like formula for confidence intervals is based off like the mean. And then the like z score, which is essentially like a standard deviation, like amount of variance measurement. So, but the way that it's calculated, like the more variation there is like at or the smaller the sample sizes like that, those groups start to grow like very quickly. Like or the the range starts to grow. Like the your confidence interval top and bottom become much, much larger with high variation or very small sample size.
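For anyone who wants the formula Dale is describing in words: the textbook large-sample confidence interval for a mean is the sample mean plus or minus z * s / sqrt(n), where s is the sample standard deviation, n is the sample size, and z is roughly 1.96 for 95% confidence (small samples typically swap in a t value, as in the sketch earlier). Because n sits under a square root in the denominator, shrinking the sample or increasing the variability widens the interval quickly, which is exactly the effect he is pointing at.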
Kris Hampton 20:33
Okay, if we are looking at the results of the first paper, you know, comparing maximum added weight to minimum edge depth hangs, and in different sequences, the charts that we we saw with this paper showed pretty significant gains. That's, you know, they looked significant. And I don't even know if I'm allowed to use the word "significant" in a different context when I'm talking about statistics here, but I'm going to
Paul Corsaro 21:05
Someone out there just sensed it and started typing a comment on the post.
Kris Hampton 21:08
Haha. And, but they did not, I believe, see their P value get to statistical significance, is that correct in this first paper?
Dale Wilson 21:23
I want to say yes. I think.... okay
Paul Corsaro 21:26
It was .06 and then when they controlled for...... when they controlled for body weight, they ended up having a .016 I believe
Dale Wilson 21:36
Yeah, that's for the correlation between the strength test and the endurance test.
Kris Hampton 21:42
Right
Paul Corsaro 21:43
Right.
Dale Wilson 21:43
Um, so. So when we're looking at their like, all the conclusions, or the results that they have before that, they're looking at purely change in external load, I'm pretty sure.
Kris Hampton 21:55
Yeah
Dale Wilson 21:55
Like, that's not controlling for body weight at all. And then they wanted to do this correlation between their strength test and endurance test values, and they get like a .06. So again, not statistically significant by the .05 value, but then they break it down, where it's like, if you control for bodyweight inside of this, it becomes statistically significant very, very quickly. And they see that same phenomenon throughout the rest of the.... they performed the same analysis for the other strength and endurance tests as well at the different phases in the study.
Paul Corsaro 22:30
And from just digging into it, just from what I've done, kind of prepping for this episode, kind of reviewing some p value stuff and all that, would you say this almost like it's it makes it even harder to discount it, even though it comes out at .06, because they added some of those additional statistical tests to kind of control for that? So they could have, you know, I think the term that gets thrown around for that is "p hacking", where people will, you know, manipulate things they do in their study, so they can get below that .05 value. It seems like they tried to fight that a good amount. So
Dale Wilson 23:04
You could make an argument that it's like, "Oh, they're trying to like justify how they would modify this to get below that threshold", but I think they are applying like, good, like critical reasoning here. Where it's..... we intuitively know as climbers, that's it is like, "Oh, that should include your body weight as well" and like, that's what you're curious about. Like, I was surprised when I first read this, I was like, "Oh, they're looking at, like external load for this. And it's like, this should really be like a total, like total load, operating, as opposed to like just external." So It completely makes sense for me to or it make sense to me for them to do that. Like, I think that's a very rational extension. And then they go through to show it from like test to test to test, or point in time to point in time to point in time to show that like that the significance of this is high and it's repeatable over and over. Like, throughout the experiment, we see that relationship stay true. It's not a fluke, it doesn't like exist at one time then disappear. Like I think it's pretty, pretty robust way that they go about investigating that portion.
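As an aside on what "controlling for body weight" can look like in practice: one common approach is a partial correlation, where you remove the body-weight trend from each measure and correlate what is left over. Whether that is exactly the procedure used in the paper isn't spelled out here, so the sketch below, with fabricated numbers, is only meant to illustrate the general idea.

```python
# Illustrative sketch (fabricated data) of one way to "control for" a third
# variable: a partial correlation computed from regression residuals.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
body_weight = rng.normal(70, 8, size=20)                      # kg, hypothetical
strength    = 0.8 * body_weight + rng.normal(0, 4, size=20)   # fabricated relationships
endurance   = 0.5 * body_weight + rng.normal(0, 6, size=20)

def residuals(y, z):
    """Residuals of y after a simple linear regression on z."""
    slope, intercept, *_ = stats.linregress(z, y)
    return y - (slope * z + intercept)

# Correlate what's left of each measure after removing the body-weight trend.
r_partial, p_partial = stats.pearsonr(residuals(strength, body_weight),
                                      residuals(endurance, body_weight))
print(f"partial r = {r_partial:.2f}, p = {p_partial:.3f}")
```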
Kris Hampton 24:09
Okay. Was there more that you saw that was interesting in this paper, Dale?
Dale Wilson 24:16
This is in both of them. It's one of the things that I actually really like about this, and you see it in sports science all the time, I know it is pushed in the CSCS like text and education is the concept of like, effect size. And, man, for a lot of people that work inside of like sports science or something, if you're working as a coach and you don't have access to like statistical software or something, like using effect size is great. And it's a real easy way to like look at some variation in the group and, but also just be able to, like, make quick conclusions about like, what worked, what didn't. I think that's yeah, I think that's a good thing to be included in there. Like analysis of variance is something that you're probably going to need. I think you can do it in the...yeah, you can do it in the like data analysis tool pack inside of Excel, but usually you need some sort of like, beyond that you would need something like SAS or using R, something like that, to do like robust statistical analysis for it. But like effect size is something you can knock out in Excel in just a couple minutes and as a coach, that's pretty cool.
Kris Hampton 25:20
Can you define effect size for me? Like, what are what are we talking about exactly when we're when we're saying effect size is different than, you know, the, the significance of the results?
Dale Wilson 25:34
So effect size, like for a group, if you're doing it in groups, would just be taking like the mean before and the mean after and then taking the difference between those two and dividing it by the pooled standard deviation, so everybody in everybody in those groups and that's it. So you end up with a with a value that should be like, one standard deviation would be one.
Kris Hampton 25:57
Mmm hmm
Dale Wilson 25:58
Like, would be like, so that would be like a really large effect. They even give like quantified ranges inside of the paper for like, "What's a small effect?", like "What's a large effect?", "What's a medium?" and those are, those are standard also. So it's really, again, it's just like a quick and dirty way to see, to quantify, like, how much of a change you're seeing. So I like seeing it inside of sports science, because it's something that's like a useful tool to give people.
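Dale's description of effect size translates almost directly into code. This is a generic Cohen's-d-style calculation with invented before/after numbers, not the exact computation from the papers; the rough benchmarks Paul mentions a little later (about 0.2 small, 0.5 moderate, 0.8 large) are the usual way to read the result.

```python
# Effect size sketch (Cohen's d style): difference in means divided by a pooled
# standard deviation. All numbers are invented for illustration.
import math

def sample_var(x):
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)

def pooled_sd(a, b):
    """Pooled standard deviation of two samples."""
    na, nb = len(a), len(b)
    return math.sqrt(((na - 1) * sample_var(a) + (nb - 1) * sample_var(b)) / (na + nb - 2))

def effect_size(before, after):
    mean_diff = sum(after) / len(after) - sum(before) / len(before)
    return mean_diff / pooled_sd(before, after)

before = [30, 34, 28, 33, 31]   # hypothetical pre-training scores
after  = [33, 37, 30, 36, 34]   # hypothetical post-training scores
print(f"effect size d = {effect_size(before, after):.2f}")   # about 1.1: roughly one pooled SD of change
```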
Kris Hampton 26:27
Yeah, I think it's important. I think it's important to have that in there. Most practitioners of you know, who are working with athletes, are not statisticians. They, they don't fully understand all of this, myself included.
Paul Corsaro 26:45
I was gonna say, I don't fully understand all of this.
Kris Hampton 26:47
Haha. Yeah. So. So having something that, like you said, is a little quick and dirty and we can just go in and look at it and see what the effect size looked like, that, frankly, is a little more important for me than really paying attention to what all the statistics say.
Dale Wilson 27:09
Yeah,
Paul Corsaro 27:10
It's, it's interesting too, just kind of going back on just the caption in the data table, how they used a slightly modified scale for what was small, moderate, and large, because they're highly trained individuals. I'm just going back...so when we started talking about an effect size, I dug up a textbook that I have, that kind of talked about the effect size and all that, and I guess the standard was .2 or less is small, .5 is moderate, and .8 or more is large, not considering trained individuals. That changes a little bit with how they use that here.
Dale Wilson 27:43
I think that makes theirs like even more, like more difficult to
Kris Hampton 27:46
More stringent,
Dale Wilson 27:47
Yeah, more stringent to get like a large result. Because yeah, yeah, I just think effect size is a great, simple, simple thing you can do in a couple of minutes and understand something in a slightly statistical setting. But it's not overly complicated.
Paul Corsaro 28:03
I like that. It's a good, applicable thing.
Kris Hampton 28:06
Anything else in here or in, you know, that like sort of shows up in both papers that, that you find interesting that Paul and I are likely overlooking?
Dale Wilson 28:19
I'm looking at the table from the second paper, which is the 2019 I think it was, the endurance test based paper. That was one of the things where like effect size, I thought like really, really stood out. I thought that was great. For the... I think it's the endurance test, the effect size for going from endurance test two to endurance test three is a full standard deviation.
Kris Hampton 28:47
Mmm hmm
Dale Wilson 28:48
And it's a yeah, score of one. Like, that's, that's huge. So I thought that was like, instead of looking at, like, the confidence interval or the, like the mean plus or minus the standard deviation for those, like, if you're looking at that, you kind of have to think about it. But like, when you look at the effect sizes, it's a really quick way to be like, "Wow, that's a lot." Like it's a really significant improvement. I think one of the things I thought was kind of funny was that, at the endurance test one point, in that same paper, if you were to look at just like the mean plus or minus standard deviation values from those groups, and they say like, "Oh, there's no, there's no difference between the groups.", but if you were to just look at that table, you'd be like, "Oh, the max hangs group and the intermittent hangs group have like the same mean" and you'd be like the max hang to intermittent hang group in the middle like you're like, "That seems way higher." But the standard deviation is much larger, so it kind of makes sense, just as like a quick gut check, as to why they say that. Originally when I looked at it, I was like, that mean is massive, but the standard deviations are massive also. So yeah.
Paul Corsaro 29:58
So if you are looking at those mean numbers, you've definitely got to take into account what comes after that plus or minus?
Dale Wilson 30:04
Yeah, it's hard to turn to look at one number, look at another number and make conclusions for populations. But if you include standard deviation, you're like, "Okay, there's a little bit of understanding about variance that's seen inside of the group". I think those are those are kind of the big things from the paper. I love that they use the same, like the same exact statistical analysis between them. I thought that was great.
Kris Hampton 30:25
Well, I think it's a you know, it's partially a byproduct of it, maybe not a byproduct, maybe that's not the right word, but Paul and I have talked about this quite a bit throughout Season One, that one of the things we really like, that we see in science is someone asking a question, and then asking the next logical question, and then the next logical question and continuing down that rabbit hole, rather than throwing everything out and starting over with a new question.
Dale Wilson 30:56
Yeah
Kris Hampton 30:58
And that's something I really appreciate about what Eva is doing here. And I think the, you know, continuing the same statistical model, the same stringent stringency that she applied to the first study she's also applying to the second study, I think, is really, really smart science as I understand science.
Dale Wilson 31:18
Absolutely. Um, I did want to jump back...hahaha...I do want to jump back to one thing that I thought was really cool, in the same world of like, good things that science does, like, obviously, results need to be repeatable. Like, you need to be able to replicate like the same results, and someone else performing the study would need to get something similar. And one of the things I thought was really cool was her comparison between the strength test and the endurance test and just running a Pearson's correlation coefficient, so the correlation between those two values. We have that same data from our own assessments, using the like max hang, strength to weight ratio, and the continuous hang measurement that we do and those are also collinear in our data. We see really, really, really good correlation between those and we use it for like, different diagnostic points inside of the assessment. But it was just cool to be like, "Oh, yeah, we we see the exact same thing".
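A Pearson correlation is a one-liner in most statistical tools. The paired values below are invented stand-ins for a strength-to-weight measure and a continuous hang time; they are not data from the papers or from Power Company's assessments, just a shape for the calculation Dale is describing.

```python
# Pearson correlation sketch: how strongly two measures move together.
# These paired values are invented stand-ins, not real assessment data.
from scipy import stats

strength_to_weight = [1.4, 1.6, 1.3, 1.8, 1.5, 1.7]   # hypothetical max-hang ratios
continuous_hang_s = [28, 35, 25, 42, 31, 38]           # hypothetical hang times (seconds)

r, p_value = stats.pearsonr(strength_to_weight, continuous_hang_s)
print(f"r = {r:.2f}, p = {p_value:.3f}")   # r near +1 means the measures track each other closely
```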
Kris Hampton 32:16
Yeah,
Dale Wilson 32:17
It's like, awesome. I'm not the only one seeing this.
Breaking Bad Audio Clip 32:19
Yeah science!
Kris Hampton 32:20
I really like it when, when the things that we're doing are validated.
Dale Wilson 32:26
Yeah, and it's one thing to say that like, "Oh, that does make sense", like, you can logically reason in your head to be like, "Okay, if you're stronger, you can be able to hold on to this edge for longer. That makes sense to me". It's like, but you got to watch yourself, and you got to actually, like, prove it out with some data. You don't just get to run around with hypotheses and be like, "I thought this, so yeah, it's probably true."
Paul Corsaro 32:47
This seems like it would make sense, so I'm gonna go this route.
Dale Wilson 32:49
Yeah, like, you have to... you gotta prove yourself a little bit, or prove it to yourself.
Kris Hampton 32:54
Yeah. I've got a quote here, Dale, that I would like to read. It's from an article on the SimpliFaster website about statistical significance. And I'm just curious to get your take on it and yours as well, Paul. "An effect can be statistically significant due to a large sample size, but have no real world effect. Conversely, an intervention can have no significant difference in terms of statistics, usually due to a small sample size, but have a large real world effect. More pertinently, when comparing two different interventions, the difference in p values between them doesn't really tell us anything about the magnitude of these effects, which is more important."
Dale Wilson 33:38
No I think so I think it's going back to kind of the, like, P value gaming aspect, and how much do we rely on the... something reaching that statistical significance threshold and how much does that mean? I think the biggest challenge with it or problems with it would be like what it pushes people to in science, which is getting away from, like designing good experiments and embracing their results. It does kind of like, force that kind of mindset shift where it's like, "Oh, it has to be significant, otherwise, it's not worth publishing". Or, like, if you anticipate that, like your sample size, so your sample size is gonna be smaller, like "It's not gonna be significant. We have to do an entirely different experiment" or something like this. Um,
Kris Hampton 34:30
You know, it occurs to me that almost every system...you can build the best, most perfect system in the world and it's likely going to fail due to human input into the system. You know, many of the most successful hackers look first at just contacting a human who will let them into the system rather than trying to break in via code. You know, it's one of the easiest ways to hack a major company is just talking to a human. And that happens in science as well. We have our own human biases, we have human error. Research is expensive and researchers want to show a result, so that they can get published, so they can get more money, so they can do more research, you know. It just makes complete sense that they would want that. And I think as soon as you put human needs and desires into the equation, it's always going to have bias. And no matter whether science is right and wrong answers only, you add humans into the equation, and there's gonna be some gray area and some iffy-ness.
Paul Corsaro 35:43
And I think you know, there's actually a sentence in the second paper that kind of explains.... that kind of is an example of that quote. It was...let's see, so the difference is, they're talking about the significant gains in the grip endurance and how they said, I'm sorry, not significant.
Paul Corsaro 36:02
But the largest gains in one of the groups for grip endurance are probably not significant due to small sample size. But at the same time, we could still get some real world information out of that.
Kris Hampton 36:02
Haha
Kris Hampton 36:14
Right.
Paul Corsaro 36:15
I think that's an example of kind of, yeah, large, small sample size, not significant, statistically, but that doesn't mean there's not information in there and that doesn't mean we can't use these to improve either our performance or someone we're helping improve their performance.
Kris Hampton 36:29
Yeah, well, it's not, it's not lost on me at all that, you know, some of the same people who are complaining about small sample sizes, and P values that don't show significance are also the same people who will say, "Well, I did this and it worked, so therefore, that must mean this is correct".
Paul Corsaro 36:48
Haha
Kris Hampton 36:49
You know, I personally, if I see a result that someone has gotten, even if it's one person in a study, if I see that this worked really well, for them, I'm going to look deeper at that. I want to understand why it worked well for them. I may not come to the right conclusion, but it's still something interesting for me to look at, and maybe take forward to help the athletes that I work with.
Paul Corsaro 37:15
Yeah, why not explore it, you know? Like, there's there could be something there.
Dale Wilson 37:20
I think like, that's a really good point. I think it kind of plays into when you're reading, like when you're reading the whole paper for these two studies, for example, or any study, like, you don't get to, like, just hop to the results of something and be like, "Title said, 'Max hangs' and this says that these results are like they saw the biggest group, so or biggest improvement so like, thus, max hangs are the best for something ". You don't get to do that. You still have to go through the work of like, going through the paper. And when you do, that's where like the important parts come out to you. And they don't like make themselves, you're not going to get like anywhere near the value out of it by just like skimming for like, "Was it this or was it that?". Like, you can't do the like binary thinking portion of it. You got to look at like, "How was the how was the study built?", "How would you build it differently?", "If you were to repeat it, like, what would you change about it?", "What did you think about their methods for evaluating and what did you think about like the edge they were using for this?", "How would you change that?", like, "Is there another obvious group that you wish had been included in the study that you that wasn't moot?", and like going through, like, piece by piece, and when you're doing that you're actually going to come up with way more satisfying and interesting answers and problems than just like a "Yes/No", like, "This one works. This one doesn't". Like, I mean, I don't know. There was like a real dark period in climbing probably like four or five years ago, where it was like, maybe even longer than that, where it was just like, "Which is better repeaters or max hangs?", and you couldn't get away from that question.
Kris Hampton 38:50
Haha
Dale Wilson 38:50
You couldn't get away from it. It was everywhere.
Kris Hampton 38:53
Dale, I'll tell you what, Nate and I went to a presentation by Eva. And after the presentation, after she had done an amazing job of explaining that there is no best protocol, she took questions, and I bet you the first seven questions were "Well, but how much weight do you suggest hanging with?" and "Are, you know, would..should we hang three times or five times? Which do you think is better, six seconds or eight seconds?" And you know, I was just mind boggled because these were really intelligent people asking those questions,
Dale Wilson 39:31
And I'm sure she handled it like a professional because
Kris Hampton 39:34
She was a champ
Dale Wilson 39:35
The foremost expert on this stuff, but like, yeah, like she knows that this is not like it's not black or white. It's not cut and dried. It's not simple. Like she's built entire studies around like, what climbers to choose for each of these groups, who would like....she's just the expert on it, and isn't looking for like short answers like that. And I think you have to like set that kind of as your goal when you're thinking about these things and like how do the best do it? And it's like Eva Lopez does this as her job.
Kris Hampton 40:05
Yeah
Dale Wilson 40:05
And thinks about everything that goes into this. What's a robust like study design? What are the different groups that you'd want to do in it? And you have to, like, apply that same kind of critical thinking to your own training. And for your clients, the same thing where it's like, you can't just yeah, jump to results and it says, like, "Oh, yeah, in this one paper, like by Eva Lopez, it says, they did this, and it was better. So yeah, max hangs. Always max hangs."
Kris Hampton 40:30
Haha
Dale Wilson 40:30
"What if I just finished doing max hangs?" Probably more max hangs.
Paul Corsaro 40:33
Haha
Kris Hampton 40:34
I mean, it would be great if you could, but really, that isn't reality. And it can get frustrating. You know, we, we want to look at research that took a long time, that cost a lot to do and we want to derive from it the answer. You know, I'm sure the researchers want the answer too, but that just isn't how it works. You you're not going to get the answer immediately. It's... you're going to have to go down a rabbit hole for quite a long time.
Paul Corsaro 41:05
And if you do that, you'll be better off for it, for sure. It's just not the easy path.
Dale Wilson 41:09
Absolutely.
Kris Hampton 41:11
And you can look at sports science in general, there's there's been this controversy, big argument going on in sports science the last few years, quite a number of years actually, about statistical models, which is the best to use. There's this MBI (Magnitude-Based Inference) that has been pitched and used by some researchers and, and other researchers are saying it's a completely useless thing. It's it's made up, it doesn't hold up mathematically. And there's a lot of argument about that. And I think if you know, the the general sports science world, and the general medicine science world can still be arguing about the statistical models that are best to use, then thinking that you're going to get an answer from, you know, the answer from one paper is a little bit insane.
Paul Corsaro 42:12
You know, in the widely massive, largest sport in the world, that is rock climbing....like
Kris Hampton 42:16
Hahaha. I've got a,
Dale Wilson 42:21
It's incredibly well funded.
Paul Corsaro 42:22
Haha. Yeah, well funded. Most researched out of all the sports out there.
Kris Hampton 42:25
Exactly. Haha. I've got a quote from an article in the journal PLOS ONE by Keith Lohse, in a paper called "Systematic Review of the Use of Magnitude-Based Inference in Sports Science and Medicine", that I thought was really great. He looked at all the papers he could find that use this MBI model to try and understand it better. And he says, "Amidst debates over the role of p values and significance testing in science, MBI also provides an important natural experiment. We find no evidence that moving researchers away from p values or null hypothesis significance testing makes them less prone to dichotomization, or over interpretation of findings."
Paul Corsaro 43:12
Hahaha. I like that.
Dale Wilson 43:14
Publish or perish, this is how it goes.
Kris Hampton 43:16
Hahaha
Dale Wilson 43:17
Pretty on the nose.
Kris Hampton 43:18
Anything we're missing here, Dale?
Dale Wilson 43:21
I think, if people are curious about learning more about statistics, there's a bunch of great books out there and there's a bunch of free resources online, none of which I can think of off the top of my head right now. But I could make a list and we can throw it at the end of the episode, something like that.
Kris Hampton 43:36
Yeah, let's do that. I'll have a list of things in the show notes for folks and on the blog post at Powercompanyclimbing.com/breaking-beta.
Dale Wilson 43:46
Okay, yeah, I think, I think having a basic understanding of statistics, even if it's not something that you use every day, or anything like that, I think it's really powerful. It'll change the way that you think about problems and how you think about results. Yeah, and hopefully make you ask better questions in your work, in your life, in rock climbing. I mean, it's pretty powerful stuff and they can yeah, definitely change how you approach problems. So
Kris Hampton 44:12
Awesome. Well, thank you for taking the time to sit down and chat with us. I'm so happy that you're our first guest. This won't be your last appearance on Breaking Beta. I'm fairly sure of that. I'm glad that you're on this team, and we have you to lean on when it comes to this stuff.
Dale Wilson 44:31
Glad to be here. Thanks for having me.
Kris Hampton 44:33
You can find both Paul and I all over the internets by following the links in your show notes. You can find Paul at his gym Crux Conditioning in Chattanooga, Tennessee, though he's not there as often these days because he's, he's moving up. We've got more people working for him.
Paul Corsaro 44:49
We got a great team at the gym. It's great.
Kris Hampton 44:51
I love it. You can also work with Dale through powercompanyclimbing.com Or you can buy a Mini Assessment and let him crunch the numbers for you, which he is quite good at. And if you have questions, comments or papers you'd like for us to take a look at hit us up at Community.powercompanyclimbing.com. Don't forget to subscribe to the show. Leave us a review. Please tell all of your friends who believe that p values and statistical significance are the only things that matter that you have the perfect podcast for them. And we will see you next week when Paul and I discuss how we choose papers, how we read them and whether or not you need to go beyond the abstract to get the gist.
Paul Corsaro 45:34
Thanks, y'all. We'll see you guys next time.
Breaking Bad Audio Clip 45:37
It's done.
Breaking Bad Audio Clip 45:38
You keep saying that and it's bullshit every time. Always. You know what? I'm done.
Breaking Bad Audio Clip 45:45
Okay, you and I? We're done.
Kris Hampton 45:49
Breaking Beta is brought to you by Power Company Climbing and Crux Conditioning, and is a proud member of the Plug Tone Audio Collective. For transcripts, citations and more, visit Powercompanyclimbing.com/breakingbeta.
Breaking Bad Audio Clip 46:03
Let's not get lost in the who what and whens. The point is we did our due diligence.
Kris Hampton 46:09
Our music, including our theme song Tumbleweed, is from legendary South Dakota band Riff Lord,
Breaking Bad Audio Clip 46:15
This is it. This is how it ends.