This is the write-up from our 9th live chat, and it’s a bit of a special one. We invited epidemiologists Rob and Suzi, whom we spoke to a couple of weeks ago, to come back and give us a few more tips on how to design experiments, now that we are knee-deep in this sort of business.
For those who haven’t been following our mission closely, the Nappy Science Gang is in the process of designing its own experiments to find scientific answers to these three questions: 1) What is the optimum temperature to wash nappies at? 2) Which detergent type gives the best results? 3) Should we strip-wash our nappies and, if so, how?
In this chat, a few of the people who are looking at each of these questions had a chance to ask Rob and Suzi for some help and tips. Read on for what came out of it!
Q: I’m in the strip-washing group and we want to establish what the best method for strip-washing is. If we have a few different methods to test, will we need to get people to try all of them to get some replication?
Suzi: You can do ‘repeated measures’, where each person tests every condition (though ideally in a different order). This method is good if you only have a limited number of people. If you’ve got lots of people, you can randomly assign each of them to one particular condition, and hopefully this will mean that the differences in washing machine, material etc. will be randomly split between the groups too.
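The two options Suzi describes can be sketched in a few lines of Python. This is only an illustration: the method names and volunteer names are made up, and the function names `assign_between` and `assign_within` are hypothetical, not something the group has agreed on.

```python
import random

# Hypothetical strip-wash methods, named here only for illustration.
METHODS = ["method A", "method B", "method C"]


def assign_between(volunteers, conditions, seed=None):
    """Randomly give each volunteer exactly one condition, in near-equal groups."""
    rng = random.Random(seed)
    shuffled = list(volunteers)
    rng.shuffle(shuffled)
    # Deal the shuffled volunteers out round-robin, so group sizes
    # differ by at most one.
    return {
        person: conditions[i % len(conditions)]
        for i, person in enumerate(shuffled)
    }


def assign_within(volunteers, conditions, seed=None):
    """Repeated measures: everyone tests every condition, each in a random order."""
    rng = random.Random(seed)
    orders = {}
    for person in volunteers:
        order = list(conditions)
        rng.shuffle(order)
        orders[person] = order
    return orders


if __name__ == "__main__":
    volunteers = [f"volunteer {i}" for i in range(1, 10)]
    print(assign_between(volunteers, METHODS, seed=1))
    print(assign_within(volunteers, METHODS, seed=1))
```

Passing a `seed` makes the allocation reproducible, which is handy if the group wants to publish exactly how volunteers were assigned.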
Q: What kind of numbers would we need to be able to do random assignment?
Suzi: It’s hard to say what kind of numbers, because that really depends on the (statistical) size of the difference you’re expecting to see between groups. I think repeated measures would work well, and give you more power.
Rob: I agree with Suzi, but repeated measures may also complicate the analysis, because the repeated results would need to be accounted for. To explain the complication: results from the same household are likely to be similar to each other because they share, for example, the same washing machine and the same water. You would need to take account of this similarity to be able to really understand the effect of ‘strip washing’.
Q: A problem we are having with strip-washing is the need to make a subjective judgement such as smell. Do you have any recommendations regarding this?
Suzi: If you’re going to measure subjective things like smell, you really need blinding. So, whoever washes the sample shouldn’t be the one doing the smell judgement. That’s really important for subjective measures.
Rob: One thing you could do is also smell it yourselves at the end and see if your ‘unblinded’ score was the same as the blinded score.
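One way to set up the blinding is for the person who washes to give each sample a random code and keep the key to themselves until the smelling is done. The sketch below assumes codes of the form S01, S02 and so on; the code format and the function names `make_blinding_key` and `unblind` are invented for illustration.

```python
import random


def make_blinding_key(sample_descriptions, seed=None):
    """Return (key, smelling_order).

    key maps a neutral code such as 'S03' to the true description of the
    sample. Only the person who did the washing should see it.
    """
    rng = random.Random(seed)
    codes = [f"S{n:02d}" for n in range(1, len(sample_descriptions) + 1)]
    rng.shuffle(codes)  # break any link between code number and sample order
    key = dict(zip(codes, sample_descriptions))
    smelling_order = sorted(key)  # the smeller simply works through the codes
    return key, smelling_order


def unblind(key, scores_by_code):
    """After scoring, pair each smell score with what the sample really was."""
    return [(key[code], score) for code, score in sorted(scores_by_code.items())]


if __name__ == "__main__":
    samples = ["strip-washed", "not strip-washed", "strip-washed", "not strip-washed"]
    key, order = make_blinding_key(samples, seed=7)
    print(order)  # this list is all the smeller gets to see
```

The same key can be used for Rob's suggestion: record your own unblinded scores against the codes too, then compare them with the blinded scores once everything has been unblinded.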
Q: Could we ask someone else to smell a couple of nappies, one that we think is in need of strip-washing and one that isn’t, to see if they pick out the same one? Then we could do the same test afterwards and ask them to smell a selection of nappies, some that didn’t need strip-washing and some that had been strip-washed, and see if they could tell them apart?
Rob: Yes, you could do that, but it would require a fairly complicated analysis.
The group then agreed that defining what “in need of strip-washing” means, and how you decide whether it’s necessary, is going to be one of the most interesting things to come out of this test. Suzi added that we could do some really nice qualitative work on this topic, using interviews, questionnaires etc.
Q: I’m from the detergents group and we’re trying to work out sample sizes to see which detergent/washing agent gets nappies cleanest. Will we need a very large dataset?
Rob: In general, large datasets are required when what you are measuring doesn’t change much between the start and the finish. So if you are measuring the relative effectiveness of each detergent and there is only a slight difference between the results given by different products, that’s when you need lots of samples of data. The same goes for the strip-washing question: if you are testing different methods for strip-washing, but the end results vary only a little, you will need more experiments to really detect a significant difference.
Suzi: Yes, I agree. Since what’s important is the relative effectiveness of each detergent, if there’s not much difference between the detergents tested, that’s when you’ll need lots of samples of data.
Q: So potentially we are going to need quite big samples!
Suzi: Potentially yes, although if there’s little difference between detergents, then that’s interesting, and good to know. One less thing for parents to worry about!
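To get a feel for how quickly the numbers grow as the difference shrinks, here is a rough sketch using a standard normal-approximation formula for comparing two group averages. Nothing in it comes from the chat itself: the 5% significance level, the 80% power and the function name `n_per_group` are conventional or made-up choices, and the ‘effect size’ is the expected difference divided by the typical spread of the scores.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate number of samples needed in each of two groups.

    effect_size is the expected difference between the group averages
    divided by the standard deviation of the scores. This uses the normal
    approximation, so it slightly underestimates what is needed when
    groups are small.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = standard_normal.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    # Conventional 'small', 'medium' and 'large' effect sizes.
    for d in (0.2, 0.5, 0.8):
        print(f"effect size {d}: about {n_per_group(d)} per group")
```

Because the effect size is squared in the formula, halving the difference you hope to detect roughly quadruples the number of samples needed, which is the point Rob and Suzi are making.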
Q: Say that we will be testing 3 different strip-washing techniques, over two fabric types (eg cotton and microfibre), what is the best way of designing our experiment?
Suzi: That would be a 3 x 2 design, because you have 3 different washing methods and 2 types of fabric. You need to apply each washing method to each fabric, meaning 3 x 2 = 6 conditions. If it was all within subjects, then each person would need to do 6 washes to cover all the conditions, but you might want to analyse each fabric separately, to simplify things a bit. If (for example) you also wanted to look at 2 different washing machine brands, this would become 3 x 2 x 2 = 12 (though then it starts getting really complicated!).
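Listing the conditions out makes the multiplication concrete. In this small sketch the method and machine names are placeholders, and `all_conditions` is just a hypothetical helper name.

```python
from itertools import product


def all_conditions(*factors):
    """Every combination of one level from each factor (a full factorial design)."""
    return list(product(*factors))


# Placeholder names: the chat only says 3 methods, 2 fabrics, 2 machine brands.
methods = ["method 1", "method 2", "method 3"]
fabrics = ["cotton", "microfibre"]
machines = ["brand X", "brand Y"]

if __name__ == "__main__":
    for condition in all_conditions(methods, fabrics):
        print(condition)
    print(len(all_conditions(methods, fabrics)), "conditions for 3 x 2")
    print(len(all_conditions(methods, fabrics, machines)), "conditions for 3 x 2 x 2")
```

Each extra factor multiplies the number of washes every person has to do in a within-subjects design, which is why it gets complicated so quickly.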
Rob: So to be able to answer the question, you first need to decide how you’re going to measure smell and absorbency. Then you need to decide what an important difference in smell and absorbency would be. Only once you know these two things does it get a bit easier to answer the question of how many tests you would need to do.
Q: Our plan is to send a number of detergent samples, blinded as much as possible and labelled A, B, C etc., to a number of volunteers. I guess we have a similar question to the strip-washing one: how many different volunteers would we need?
Suzi: The same points that Rob made also apply to your question about the number of people. There’s no ‘magic number’ answer, I’m afraid!
Q: I think some of the difficulty we’ve got is that we don’t know what difference things might make, because nobody’s ever tested them before. So we want to see what difference repeated washing at different temperatures makes, but we don’t know how much difference there’ll be (1) between the start and end of a test at any temperature or (2) between the results at different temperatures. It could be nothing or it could be really clear, we just don’t know.
Rob: Yes, that’s a fairly common problem. In that case, what you might want to do before the randomised trial is a pilot study. In the pilot study you can do things like test your outcome measures of smell and absorbency, wear and tear or anything else you might want to test for. For example, if you’re measuring smell on a scale of 0 to 10, in the pilot you might decide that what you want to find is detergents that improve smell by a score of 2. And you can, very reasonably, do that part unblinded.
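A pilot also tells you how much the scores vary from nappy to nappy, which is the other thing you need before you can think about numbers. The sketch below uses made-up pilot scores and a hypothetical helper, `standardised_effect`, to express Rob's example difference of 2 points relative to that spread.

```python
from statistics import mean, stdev


def standardised_effect(pilot_scores, important_difference):
    """Express an 'important difference' in units of the spread seen in the pilot."""
    if len(pilot_scores) < 2:
        raise ValueError("need at least two pilot scores to estimate the spread")
    spread = stdev(pilot_scores)
    if spread == 0:
        raise ValueError("pilot scores are all identical, so there is no spread to use")
    return important_difference / spread


if __name__ == "__main__":
    # Made-up smell scores (0 to 10) from an imaginary pilot, for illustration only.
    pilot_scores = [3, 5, 7, 4, 6]
    print("average score:", round(mean(pilot_scores), 2))
    print("spread (standard deviation):", round(stdev(pilot_scores), 2))
    print("standardised effect:", round(standardised_effect(pilot_scores, 2), 2))
```

The bigger this standardised effect, the fewer tests are needed to detect it, so a pilot with very variable scores is a warning that the main experiment will need more volunteers.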
Thank you so much to Rob and Suzi for coming back a second time for a live chat with us. We really appreciate it!
The chat has definitely got the group talking some more about experimental design, which is great as we only have a few days left to come up with a draft…