Keep up to date with the current developments at LADAL!

Below you will find information on and links to the latest developments at LADAL such as updates to the LADAL website, upcoming workshops and presentations, planned events, and links to resources.


2020/12/15: Statistical excellence - Stefan Th. Gries

We are really proud and feel both honored and privileged that Stefan Th. Gries has agreed to contribute to LADAL! Stefan has played an outstanding role in promoting statistical and computational skills in the language sciences. Stefan’s textbooks Statistics for Linguistics with R – A Practical Introduction and Quantitative Corpus Linguistics with R: A Practical Introduction, as well as his research output and bootcamps, have had a tremendous influence on many linguists working with empirical data!

2020/12/02: Summer Scholar program support for LADAL

UQ’s summer research program supports LADAL: four junior academics will assist LADAL, acquire new skills, and produce new materials. A very warm welcome to the summer scholars and we hope that they get as much as possible out of the program and provide great new materials!

2020/10/09: VARIENG - LADAL becomes more Finlandish

The VARIENG at the University of Helsinki has agreed to affiliate with LADAL! VARIENG is a perfect affiliate due to the similarity in the outlook and the alignment of aims of both VARIENG and LADAL. In addition, we are super happy to have VARIENG as an affiliate institution because of their extremely high scientific merit!

2020/10/07: DDL expertism and Phylolygisms

Erich Round - director of the Ancient Languages Lab and world-renowned phylogenetics expert as well as recipient of the British Academy Global Professorship - is now officially a contributor to LADAL! His expertise in R and phylogenetics fantastically complements our skill set here and we are beyond thrilled to have him on board!

Also - and on an equally enthusiastic note - Peter Crosthwaite, the foremost proponent of Data-Driven Learning in Australia, has agreed to be a LADAL affiliate! Peter is not only a fantastic second language acquisition scholar but also has probably one of the best overviews of existing software in this domain and is amazingly well versed in finding the right applications to do awesome research!

Welcome on board!

2020/10/05: Big in Japan

Laurence Anthony - the royal highness of the AntConc empire - has agreed to be an affiliate of LADAL! We are so glad to have him on board, not only because Laurence is tech savvy like few others in Corpus Linguistics but also because he is overall wholesome and a fantastic promoter of computation in HASS research!

2020/10/02: Library Excellence!

Stephane Guillou has agreed to be a contributor and affiliate of LADAL! That is really fantastic, not only because Stephane is all-around awesome and a true R wiz but also because Stephane is directing the upskilling efforts in R, Python, and Git at the UQ library and thus brings along a fantastic skill set!

2020/09/29: Sydney Corpus Lab collab!

Monika Bednarek, who is running the Sydney Corpus Lab at the University of Sydney, has agreed to be an affiliate member of LADAL. This is perfect for LADAL given Monika’s expertise and excellent research in Corpus Linguistics as well as the close alignment of the Sydney Corpus Lab with LADAL!

2020/09/25: LADAL goes Finland!

The news that LADAL exists has reached the other side of the globe: Martin was invited by Mikko Laitinen from the University of Eastern Finland to give a guest lecture about his experiences in establishing LADAL in the context of an event about developing support infrastructures for computational social sciences and humanities research.

2020/09/22: Interaction!

We have decided to include interactive exercises in our tutorials and we are currently looking into different options for achieving this. At the moment, Binder appears to be a viable pathway forward.

2020/09/15: They multiply!

We are delighted to announce that Katy McHugh, Stephen Clark, and Restuadi Restuadi have joined the LADAL team!

Katy, Stephen, and Restuadi will be involved in restructuring, professionalizing, and revamping the LADAL webpage. We would like to extend our warmest welcome to them and express our gratitude to the School of Languages and Cultures at UQ for providing the funding for the RA positions.


The LADAL team organizes workshops and LADAL members present their research or information relevant to LADAL at conferences. Below are links to upcoming events (conferences/workshops/presentations) and presentations containing information about LADAL or research based on LADAL.


LADAL Opening Event

The opening event marks the official kick-off for LADAL. Originally, this kick-off was planned for June 2020 as a 5-day conference with an invited speaker (Stefan Gries), workshops on data science, and social events. Unfortunately, it had to be postponed due to COVID-19 and will now be held as an online event.

Here you will find updates and the current state of plans relating to the LADAL opening.


Best Practices in Corpus Linguistics – What lessons should we take from the Replication Crisis and how can we guarantee high quality in our research?

Speaker: Martin Schweinberger

Date: 20–24 May 2020

Presentation at ICAME 41 (41st Meeting of the International Computer Archive of Modern and Medieval English). Heidelberg, Germany.

Materials: slides, video

Abstract: This paper addresses issues relating to best practices in Data Management and Data Analysis in Corpus Linguistics (CL) and offers guidelines for compiling, storing, handling, and analysing data according to best practices which guarantee transparency and high quality in CL.

Open Data and Best Practices in Data Science are increasingly attracting attention as a result of the so-called Replication Crisis (RC), an ongoing methodological crisis that began in the early 2010s and primarily affects parts of the social and life sciences (Diener & Biswas-Diener 2019). The RC has contributed to the loss of trust that the Humanities and Social Sciences have experienced over the past two decades (Yong 2018). While a discussion about Best Practices in CL has recently begun (Berez-Kroeker et al. 2018), more attention has to be placed on the causes of the RC and the lessons that can be learnt from it.

CL is somewhat disjunct from current developments in Data Science due to a lack of communication and unawareness of existing resources. This talk aims to raise awareness in CL about existing resources and problematic practices that are still common in CL, and it proposes solutions that are easily implemented and can guarantee transparency, replicability, and high quality of research outputs in CL.

The solutions that this talk focuses on encompass

  • being aware of and following the FAIR principles (Findable, Accessible, Interoperable, and Reusable) in data management;

  • the recognition of corpora as research outputs, which allows corpora to be uniquely indexed (DOIs) and thereby enables corpus compilers to profit from making corpora accessible, as these can be cited like other publications, which increases citation scores and visibility;

  • the use of Git to share code and data, which is an easy way to share resources free of charge by utilizing existing research infrastructure;

  • the use of R Notebooks to document analyses and make them available to the community and reviewers to enable full replicability and reproducibility;

  • making use of documentation and policy protocols in departments, schools and institutes to ease onboarding procedures and prevent data loss and corruption.

The talk thus offers relevant information for authors as well as editors and publishers to enable replication, avoid “bad” research practices, and increase the quality of research.


Berez-Kroeker, A. L., L. Gawne, S. S. Kung, B. F. Kelly, T. Heston, G. Holton, P. Pulsifer, D. I. Beaver, S. Chelliah, S. Dubinsky, et al. (2018). Reproducible research in linguistics: A position statement on data citation and attribution in our field. Linguistics 56(1), 1–18.

Diener, Edward and Biswas-Diener, Robert (2019). The Replication Crisis in Psychology. NOBA Project.

Yong, Ed (2018). Psychology’s Replication Crisis Is Running Out of Excuses. Another big project has found that only half of studies can be repeated. And this time, the usual explanations fall flat. The Atlantic.

Implementing school-based support infrastructure for digital humanities research at UQ - The Language Technology and Data Analysis Laboratory (LADAL)

Date: 30 October 2019

Speaker: Michael Haugh & Martin Schweinberger

Presentation at the Australian Research Data Commons (ARDC): The Australian eResearch Skilled Workforce Summit. Sydney, Australia, 29-30/7/2019.

Materials: slides

Abstract: This presentation introduces the Language Technology and Data Analysis Laboratory (LADAL), and discusses the implications of our experiences to date in establishing it for broader efforts to develop researcher capacity in the digital humanities.

The LADAL is school-based support infrastructure for digital humanities researchers. It aims to assist staff and postgraduate students within the UQ School of Languages and Cultures to learn how to use data analytics, digital research tools, and other forms of technology to enhance their existing research programs, as well as offer pathways to new research possibilities. It complements the more generic resources and training in digital humanities methods offered by libraries (e.g. the Digital Scholars Hub at UQ) with the more specialised training/support in particular digital research methods and technologies that are required by researchers working on specific languages and cultures.

The LADAL consists of a specialist computing lab for language-based computational and experimental work (the Computational and Experimental Workshop) and an online virtual lab. With respect to web-based materials, the LADAL website offers self-guided study materials and hands-on tutorials on topics relating to digital tools, computational methods for data extraction and processing, data visualization, statistical analyses of language data, and provides links to further resources and short descriptions of digital tools relevant for digital HASS research. In addition, the LADAL offers face-to-face consultations and specialized workshops. UQ researchers are encouraged to contact LADAL staff for advice and guidance on matters relating to digital research tools, data visualization, various statistical procedures, and text analytics.

Staff feedback during face-to-face consultations and workshop attendance confirms there is substantial demand for the kind of digital humanities infrastructure offered by LADAL. It also suggests that support and training for researchers in the digital humanities should be conceptualized on a continuum from more generic through to more localized support.

Using R for Corpus Linguistics – an Introduction and Discussion Note on Sustainability and Replicability in Corpus Linguistics

Date: 2 April 2019

Speaker: Martin Schweinberger

Presentation at the Center of Excellence for the Dynamics of Language (CoEDL) Corpus Workshop. Melbourne, Australia, 2–3/4/2019.

Materials: slides


Below are links to additional resources, workshops, and presentations.

Workshop materials
