Questions

Luisa Andreuccioli -

Regarding the PREBO application:

- For how many years is it usually advisable to keep the data? (Asking since this needs to be specified as part of the PREBO application.)

Regarding the ethical training:

- I am slightly confused about the best practice for anonymising participants' identifiable information. In my research I will need to keep information about participants' sex and date of birth. What are advisable practices for doing this without violating any confidentiality principles?

- Somewhere in the discussion of disseminating research findings, it was mentioned that findings should be published (or disseminated) even when they do not support the initial research hypotheses. I am aware that, in recent years, there have been attempts to encourage this practice in our field: more people have been talking about its importance, and I believe the push to pre-register studies was born with this goal in mind. However, aside from some failed replications of certain (mostly well-known) studies, it is very rare to encounter publications of results that do not conform to the initially stated research hypotheses, and I believe there is still a strong bias towards publishing only positive results, both among researchers and in how often such papers are considered worth disseminating by research journals. I strongly believe that publishing papers in which the results do not confirm the hypotheses is an important part of 'doing science'. Why might it be that, despite attempts to encourage this practice, it still rarely happens?

In reply to Luisa Andreuccioli

Re: Questions

Christophe Heintz -
The tradition has been to keep the data for ten years. Erno is not sure about the exact rule.
Best practice involves putting the anonymised data set on OSF or another publicly available repository, for reproducibility (obtaining the same results from the same data set; this is different from replicability, which means that the same results are obtained with a different data set).

What type of data can be stored: the general idea is that you do not store data that allows the participants to be identified. So you can store age and sex.
Now, there might be some trade-offs to make. For instance, you might need to keep the exact date of birth, from which identification is easier. In those cases, you need to be especially careful to use a code to separate the data about the participants (who they are) from the data about what the participants did during the experiment. More generally: if you can't make it impossible to associate data with individuals, make it stupidly difficult. The standard means is to have two data sets that are stored in different secure places and that need to be assembled to de-anonymise.
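Here is a minimal sketch of that two-table scheme in Python (the file names, column names, and use of pandas are my illustration, not a prescribed format):

```python
# Minimal sketch of the two-table anonymisation scheme.
# File names and column names are hypothetical.
import secrets

import pandas as pd

# Raw data as collected: identity and responses in one table.
raw = pd.DataFrame({
    "name": ["Alice Example", "Bob Example"],
    "date_of_birth": ["1990-03-14", "1988-11-02"],
    "sex": ["F", "M"],
    "response": [0.73, 0.58],
})

# Assign an unguessable participant code (not a row index, which
# would leak the order in which people were tested).
raw["code"] = [secrets.token_hex(8) for _ in range(len(raw))]

# Key file: links codes to identities. Store it in a secure place,
# separate from the experimental data.
raw[["code", "name", "date_of_birth"]].to_csv("key_locked_away.csv", index=False)

# Anonymised file: experimental data only, shareable (e.g. on OSF).
raw[["code", "sex", "response"]].to_csv("data_for_osf.csv", index=False)
```

Neither file on its own reveals what a given person did; only someone with access to both can re-link responses to identities.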

PUBLISHING NULL RESULTS
There are several reasons for publishing null results. The main one could be the file drawer effect: imagine the very same experiment being done ten times. One of them gave a significant result; the others are not published. There is then a much higher probability that the published result is a false positive than what the paper indicates (with, e.g., a p value). Note that the authors may do this in all honesty if they do not have the data from the other studies. The general consequence is that the literature becomes crowded with false positives, and the meta-analysis of the data, which could solve the problem, is not possible.
It also seems inefficient, because one fancy experiment might be run many times by different labs without anyone realising that, well, it gives null results.
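To get a feel for how strong the file drawer effect is, here is a quick simulation (a sketch: the sample sizes and the 0.05 threshold are illustrative, and a true null effect is assumed):

```python
# Sketch: false positives produced by the file drawer effect.
# Assumes no real effect exists; numbers are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_labs, n_per_group, n_runs = 10, 30, 10_000
runs_with_false_positive = 0

for _ in range(n_runs):
    # Ten labs run the same two-group experiment on pure noise.
    p_values = [
        stats.ttest_ind(rng.normal(size=n_per_group),
                        rng.normal(size=n_per_group)).pvalue
        for _ in range(n_labs)
    ]
    # The file drawer: only a 'significant' result gets written up.
    if min(p_values) < 0.05:
        runs_with_false_positive += 1

# Each lab's false-positive rate is 5%, but the chance that at least
# one lab gets p < 0.05 is 1 - 0.95**10, roughly 40%.
print(f"Some lab found 'significance' in {runs_with_false_positive / n_runs:.0%} of runs")
```

So with ten honest labs and no real effect, roughly 40% of the time some 'significant' result exists to be published, and every such publication is a false positive.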
There are genuine efforts to solve the file drawer problem. One of them is the preregistered report (often called a registered report): the researchers first submit the introduction and methods, which are reviewed by the journal without knowledge of the results. The journal then decides whether to 'commit' to publishing the paper once the results section and conclusion are added (there is a second round of review of these sections, though). With this method, you can gather and analyse the data either after submitting or before.
Another effort is simply to agree to publish good papers and put little weight on whether the results are significant or not. There are even journals devoted to null results.
My own opinion on the matter: in spite of the arguments above, I think it is justified that journals give preference to papers that report significant results and tend to reject papers with null results. The main reason is that no serious conclusion can be derived from a null result. Remember (it is quite important) that absence of evidence is not evidence of absence, and a null result is just absence of evidence. And there are already so many papers to read! To me, the solution comes with preregistration and making the data available on platforms such as OSF. This allows for meta-analyses without crowding the scientific journals with studies that were just not able to reduce the noise.
In reply to Christophe Heintz

Re: Questions

Shubhamkar Ayare -
> The main reason is that no serious conclusion can be derived from a null result. Remember (it is quite important) that absence of evidence is not evidence of absence, and a null result is just absence of evidence.
Would Bayesian analysis be useful to distinguish between the two? Or am I overestimating what Bayesian statistics enables us to do?

> To me, the solution comes with preregistration and making the data available on platform such as OSF.
That makes me wonder whether the null results of studies are available to search engines, either Google Scholar or something else. Is there a way to search for them while performing literature reviews?
In reply to Shubhamkar Ayare

Re: Questions

Christophe Heintz -
Bayesian analyses: please do ask Jozsef. I would love to know what he says.
My current opinion is that, while Bayesian tools do have real added value in revealing patterns and can contribute substantially to interpreting the data, there will always be questions about how best to use the tools, whether Bayesian or frequentist.
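For what it is worth, here is a minimal sketch of the idea behind Shubhamkar's question: unlike a p value, a Bayes factor can quantify evidence for the null. The example assumes a binomial design with a uniform prior on the alternative; the numbers are made up for illustration:

```python
# Sketch: a Bayes factor can express evidence FOR the null,
# which a p value cannot. Binomial example, illustrative numbers.
from scipy import stats

n, k = 200, 104        # 104 'successes' out of 200 trials
p_null = 0.5           # H0: chance performance

# Likelihood of the data under H0.
like_null = stats.binom.pmf(k, n, p_null)

# Marginal likelihood under H1 with a uniform prior on p:
# the integral of C(n,k) * p**k * (1-p)**(n-k) over [0, 1] is 1/(n+1).
like_alt = 1.0 / (n + 1)

bf01 = like_null / like_alt
print(f"BF01 = {bf01:.1f}")   # > 1: the data favour H0 over H1
print(f"p = {stats.binomtest(k, n, p_null).pvalue:.2f}")
```

Here the p value (about 0.62) only says there is no evidence against chance, i.e. absence of evidence; a BF01 of roughly 10 says the data actively favour the null over this particular alternative, i.e. evidence of absence, relative to that prior. Whether this settles the matter is, as I said, a question for Jozsef.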

Searchability and accessibility of data sets: as far as I know, there are not yet good tools for that.