We are strongly devoted to the principles of reproducible research:
Every step of the analysis, from data import through transformations to publication-ready tables and figures, is fully traceable and based on R scripts.
Before any data is generated, hypotheses need to be formulated in a way that leads to a clear decision path. Based on these hypotheses, a statistical analysis plan for hypothesis testing is then developed. Unfortunately, this step is often skipped in academic research, which leads to statistical fishing expeditions and ‘p-hacking’ in attempts to find ‘significance’, and results in non-reproducible findings. To overcome these problems, funding agencies increasingly demand detailed statistical analysis plans and pre-registration. Procedures that are standard in clinical trials are finding their way into basic research.
Once the statistical analyses have been defined, the sample size for data acquisition needs to be estimated. We provide this service as an independent partner, as ethics committees often require. This includes consulting as well as computations based on either theoretical assumptions or simulation approaches.
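As a rough illustration of a sample size computation based on theoretical assumptions, the sketch below estimates the number of subjects per group for a two-sided comparison of two group means. It is written in Python purely for illustration and is not one of the R scripts mentioned above. The function name `n_per_group`, the default significance level and power, and the choice of the normal approximation are all assumptions made for this example.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample
    comparison of means (hypothetical helper, for illustration only).

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2,
    where d is the standardized effect size (difference in means / SD).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # quantile for the two-sided test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Medium standardized effect (d = 0.5), 5% significance level, 80% power
print(n_per_group(0.5))  # -> 63
```

The normal approximation slightly underestimates the sample size needed for a t-test, and more complex designs usually call for dedicated software or a simulation approach.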
Data analysis usually starts with descriptive statistics. This includes informative figures for inspecting the data and detecting errors and extreme values, and it forms the basis of later plausibility checks for statistical results tables. Publication-ready tables are another integral part of data description.
Based on the hypotheses and the statistical analysis plan, hypothesis testing with classical statistical methods or highly complex bioinformatic approaches turns raw data into actionable information. This may be, for example, a group comparison to find risk factors or treatment effects, an evaluation of the diagnostic properties of biomarkers or new technology, or an analysis that disentangles interactions in complex systems.
This part of the analysis can be prepared while data is still being acquired. This allows for customization of output formats and timely delivery of results.
For recurring analyses, e.g. in the context of quality management, the whole process can be fully automated, allowing for pre-scheduled report generation as well as group-specific customization. This has been implemented, for example, together with samedi for DeGIR, providing almost 300 radiology departments with bespoke comparative analyses against the overall pool.
Data mining / machine learning
Data analysis does not stop at hypothesis testing; usually there is a wealth of information that may guide future hypothesis generation. If learning from the data is more important than testing for significance, machine learning approaches may be a suitable alternative to more conventional statistical tests. We assist you in finding the right methods, in computing models, and in achieving explainability by increasing the transparency of AI models and translating rather obscure technical output into interpretable language.
The approaches we have applied successfully in the past focus on tree-based methods, e.g. regression trees or random forests.