Introduction

This tutorial exemplifies how to create a vowel chart with Praat and R. The entire R markdown document for this tutorial can be downloaded here.

When learning or studying a language (the case in point here being English), you are likely to be confronted with different classes of sounds, e.g. consonants and vowels (Rogers 2014). Consonants differ from vowels in that they are formed with an obstruction of the air stream coming from the lungs and typically cannot form the nucleus of a syllable (Zsiga 2012). Accordingly, consonants are classified according to the manner and place of this obstruction. As vowels are produced without obstruction of the air stream, other criteria are needed to differentiate between vowel sounds. These criteria are

  • the number of tongue positions during vowel production (to differentiate between mono-, diph-, and triphthongs),

  • height of the tongue,

  • position of the tongue,

  • roundedness of the lips.

The height and the position of the tongue are the features used in vowel charts, which show where in the mouth the tongue is located during the production of monophthongal vowel phones. A vowel chart for the monophthongal vowel phones of Received Pronunciation (RP) is shown below.

Figure 1: Vowel Chart of monophthongal vowel sounds in Received Pronunciation (RP) (left); Tongue positions for the vowels /i, e, ɛ, a/ (right)


Interestingly, a very similar figure can be created by plotting the frequency (in Hertz) of the first formant (F1) of a monophthongal vowel sound against the frequency of its second formant (F2) minus the frequency of its first formant. Formants are the resonance frequencies of the vocal tract, i.e. the frequency regions in which the energy of a vowel sound is concentrated (Johnson 2011; Ladefoged 1996). To elaborate, vowels are periodic, i.e. rhythmic, compressions and decompressions of air, and a vowel sound is a complex periodic wave that consists of several simple periodic waves produced simultaneously. During acoustic analysis, the complex wave is decomposed into its component parts, i.e. the simple periodic waves that make up the sound, and the formants show up as the frequency regions in which these components are strongest. This means that we do not necessarily have to track the position of a speaker’s tongue during vowel production to create a vowel chart: acoustic analyses of audio recordings of words in which the vowels occur can be used to plot a personalized vowel chart of a speaker. Such vowel charts can then be used as corrective feedback in language learning (see Paganus et al. 2006).
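The decomposition of a complex wave into simple periodic waves can be illustrated with a small toy example in base R. This is a sketch under simplifying assumptions, not how Praat measures formants (Praat’s formant analysis is based on linear prediction): the sampling rate, component frequencies and amplitudes below are hypothetical, and the wave is built from pure sine waves so that the Fourier transform (`fft()`) recovers them exactly.

```r
# Toy illustration: a complex periodic wave as the sum of simple (sine) waves,
# decomposed again with a discrete Fourier transform (base R fft()).
fs    <- 8000                   # assumed sampling rate in Hz
times <- (0:(fs - 1)) / fs      # exactly one second of sample times
freqs <- c(100, 300, 500)       # hypothetical component frequencies (Hz)
amps  <- c(1.0, 0.5, 0.25)      # hypothetical component amplitudes

# Build each simple wave as one column and add them up sample by sample
components <- sapply(seq_along(freqs),
                     function(i) amps[i] * sin(2 * pi * freqs[i] * times))
wave <- rowSums(components)

# Magnitude spectrum scaled so that a sine of amplitude a yields a;
# with one second of signal, bin k corresponds to k Hz
n    <- length(wave)
spec <- Mod(fft(wave))[1:(n / 2)] / (n / 2)
hz   <- 0:(n / 2 - 1)

# Frequencies carrying clearly non-zero energy are the original components
peaks <- hz[spec > 0.1]
print(peaks)
print(round(spec[peaks + 1], 2))
```

The printed frequencies and amplitudes match the three hypothetical components that were added together. In a real vowel, the components are the harmonics of the fundamental frequency, and the formants are the regions of the spectrum in which these harmonics are strongest.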

To produce a personalized vowel chart, the following steps are necessary:

  1. Install Praat;

  2. Record words in which all monophthongal vowel sounds of a given variety occur;

  3. Measure and extract the first and second formant of each vowel;

  4. Visualize the vowel sounds.

The subsequent sections elaborate on these steps. Before continuing, however, a word of warning is in order. The example focuses on extracting and plotting vowel formants in an easy but also very uncontrolled way. If vowel formant extraction is part of a proper research project, additional steps are warranted. In a serious research project, for instance, one would need to control and reduce environmental noise and optimize the recording situation; randomize the test items (words with the required phonetic environment and the respective vowel sounds) and use filler items (words that are not relevant for the analysis proper) so that participants cannot guess which items are relevant for the analysis; and use TextGrids in Praat to guarantee replicability instead of the simple measurements used in this example. However, if you are only interested in an approximation of your own vowel production and how native-like it is, the example fulfills its purpose and provides a step-by-step guide to plotting your personalized vowel chart.
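For a more controlled setup, the randomization of test and filler items mentioned above could be scripted. The following is a minimal sketch assuming a hypothetical function `make_reading_list()` and hypothetical filler words; it shuffles all repetitions of the target words together with the fillers and uses a fixed seed so that the order can be reproduced.

```r
# Sketch: a randomized reading list in which every target word occurs
# `reps` times, mixed with filler words. Function and argument names are
# hypothetical; a fixed seed makes the order reproducible.
make_reading_list <- function(targets, fillers, reps = 3, seed = 1) {
  set.seed(seed)                              # note: resets the global RNG state
  items <- c(rep(targets, each = reps), fillers)
  items[sample.int(length(items))]            # random permutation of all items
}

targets <- c("heed", "hid", "head", "had")    # test items (a subset of the hVd words)
fillers <- c("ship", "boat", "lamp", "tree")  # hypothetical filler words
reading_list <- make_reading_list(targets, fillers)
print(reading_list)
```

Unlike the simple procedure used in this tutorial, the repetitions of a word no longer follow each other directly, which makes it harder for participants to spot the relevant items.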

Downloading and installing PRAAT

The first step is thus to download Praat from www.praat.org and to install it on your machine by following the instructions provided on the website and by the Praat installation script. Praat is an open-source software package for acoustic analysis that was developed by Paul Boersma and David Weenink at the University of Amsterdam.

After installing Praat, we need to record the words in which the monophthongal vowel phones occur. In this example, we will simply record the words heed, hid, head, had, hard, hod, hoard, hood, who’d, hud, and herd.

The following section describes how to record data in Praat (see ??? for a more elaborate description of how this can be done).

1 Recording words in PRAAT

To record these words, start Praat by double-clicking the Praat icon, which appears on your desktop after installation. Two windows will appear: the main object window on the left and the picture window on the right (cf. Figure 2). Close the picture window, choose New from the menu at the top of the main object window, and select Record mono sound from the menu that pops up. For the recording, a microphone obviously needs to be connected to your machine: the better the microphone, the better the recording and thus the more accurate the graphical display we are going to produce.

Figure 2: Praat's main object window (left) and Praat's picture window (right)


Selecting Record mono sound opens Praat’s SoundRecorder window (cf. Figure 4). Select Record, label the recording by entering a title, e.g. vowels, in the Name field, and read the words from the word list.

Figure 3: Praat's main object window


Figure 4: Praat's recording window


Each word should be repeated at least three times with a short break between the individual items, so that what you record is had, had, had … pause … hard, hard, hard, etc. Try to sound natural, i.e. avoid speaking too fast or too slowly, and try not to sound artificial or overly careful.

While recording, you should see green bars bouncing up and down in the vertical white stripe (cf. Figure 5); if nothing moves, your machine is not recording properly from the microphone. Once you have finished your recording, select Stop and then Save to list & close (cf. Figure 6).

Figure 5: Praat's recording window during recording


Figure 6: Praat's recording window after recording


Saving has created an object in Praat’s main object window: if you have named your recording vowels, the new object will be called 1. Sound vowels (cf. Figure 7). Before editing the data, it is advisable to save them on your machine. To do so, select Save from the menu at the top, then select Save as WAV file... and navigate to the directory in which you want to save the recorded data (cf. Figure 8).

Figure 7: Praat's main object window with saved object


Figure 8: Save the recording as a .wav file

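Before measuring, it can be worth checking that the saved file really is a mono recording at the expected sampling rate. The sketch below reads the format fields of a WAV header with base R’s `readBin()`; the function name `wav_info()` and the file name `vowels.wav` in the usage note are hypothetical, and the code assumes an uncompressed PCM file whose `fmt ` chunk directly follows the RIFF header (it stops with an error otherwise).

```r
# Read the basic format fields from the header of a PCM .wav file.
# Assumes a plain RIFF/WAVE file whose "fmt " chunk directly follows the
# 12-byte RIFF header, the usual layout for uncompressed PCM files.
wav_info <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  u16 <- function() readBin(con, "integer", size = 2, signed = FALSE, endian = "little")
  i32 <- function() readBin(con, "integer", size = 4, endian = "little")

  riff <- readChar(con, 4, useBytes = TRUE)
  i32()                                  # size of the RIFF chunk (unused)
  wave <- readChar(con, 4, useBytes = TRUE)
  fmt  <- readChar(con, 4, useBytes = TRUE)
  if (riff != "RIFF" || wave != "WAVE" || fmt != "fmt ")
    stop("not a PCM WAV file with a leading fmt chunk")
  i32()                                  # size of the fmt chunk (unused)

  format          <- u16()               # 1 = uncompressed integer PCM
  channels        <- u16()               # 1 = mono, 2 = stereo
  sample_rate     <- i32()               # samples per second (Hz)
  byte_rate       <- i32()               # bytes per second
  block_align     <- u16()               # bytes per sample frame
  bits_per_sample <- u16()               # e.g. 16
  list(format = format, channels = channels, sample_rate = sample_rate,
       byte_rate = byte_rate, block_align = block_align,
       bits_per_sample = bits_per_sample)
}
```

For example, `wav_info("vowels.wav")$channels` should be 1 for a file recorded with Record mono sound.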

Next, select View & Edit in Praat’s main object window. This opens Praat’s edit window (cf. Figure 9); for the sake of simplicity, the object shown represents a recording of the word heed repeated three times.

Figure 9: Praat's edit window with the word *heed* repeated three times


After recording and saving the data necessary for the task at hand, we continue by extracting the vowel formants.

2 Measure and extract vowel formants

Before extracting the vowel formants, some parameters need adjusting. First, go to Formant in the menu at the top of the edit window and select Formant settings.... Next, select the option Show formant and then, depending on whether the recording represents a male, a female or a child speaker, set Maximum formant (Hz) to 5000 Hz (male), 5500 Hz (female) or up to 8000 Hz (child) (cf. http://www.haskins.yale.edu/staff/gafos_downloads/AcouToyPraat(1).pdf). It may also be necessary to adjust the number of formants that Praat tries to find: the default is 5, but it may be set to any number between 3 and 7 depending on the data. If the formant tracks do not form regular horizontal lines but are unsteady, or the dots are scattered all over the place, try different numbers of formants and keep the one that gives the best result (i.e. steady horizontal lines).

Figure 10: Praat's edit window with the word *heed* repeated three times and formants shown

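The advice to keep the number of formants that yields steady horizontal lines can also be made explicit. The following sketch assumes that you have the F1 and F2 values of one token’s steady state under each candidate setting (how they are exported from Praat is left open here); the helper names and the scoring rule (sum of standard deviations, lower is steadier) are hypothetical choices, not a Praat feature.

```r
# Hypothetical heuristic for comparing "number of formants" settings:
# for each setting, take the F1 and F2 values (one per analysis frame) over the
# steady state and prefer the setting whose tracks vary least.
track_steadiness <- function(f1, f2) {
  # sum of the standard deviations of both tracks in Hz; lower = steadier
  sd(f1, na.rm = TRUE) + sd(f2, na.rm = TRUE)
}

pick_steadiest <- function(tracks) {
  # tracks: named list, one element per setting, each list(f1 = ..., f2 = ...)
  scores <- vapply(tracks, function(tr) track_steadiness(tr$f1, tr$f2), numeric(1))
  names(which.min(scores))
}

# Toy tracks (not measurements): with the "5" setting, F2 jumps between frames
tracks <- list(
  "4" = list(f1 = c(300, 305, 298, 302), f2 = c(2300, 2310, 2295, 2305)),
  "5" = list(f1 = c(300, 310, 305, 299), f2 = c(2300, 1200, 2290, 1250))
)
pick_steadiest(tracks)
```

A visual check in the edit window remains necessary: a low score only shows that the tracks are steady, not that they follow the right formants.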

After setting the parameters, listen to the recording and highlight the section that represents the vowel sound you want to extract the formants from. To highlight it, select the start and end point of the vowel sound, i.e. the beginning and end of the steady portion of the formant tracks during which the vowel is produced, within the edit window, as done for the first of the three instances of heed in Figure 11.

Figure 11: Praat's edit window with the word *heed* repeated three times and formants shown and steady state selected


The vowel formants can be extracted by going to Formant in the edit window and selecting Get first formant. This opens a window showing the mean frequency (in Hertz) of the first formant during the selected steady state (cf. Figure 12). Note that you should also note down the start and end time of the highlighted section, which are displayed in the edit window.

Figure 12: The mean Hertz frequency of first formant of the word *heed* during the steady state

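The value reported in Figure 12 is, as described above, the mean of the formant over the highlighted steady state. As a rough analogue (Praat computes its value internally, and the details may differ), the sketch below averages a formant track over a selected time window; the function name `window_mean()` and the toy track are hypothetical.

```r
# Sketch: mean of a formant track over the highlighted steady state,
# assuming one formant value per analysis frame. An analogue of the value
# Praat reports, not Praat's own computation.
window_mean <- function(times, values, start, end) {
  in_window <- times >= start & times <= end   # frames inside the selection
  mean(values[in_window], na.rm = TRUE)
}

# Toy F1 track of one token (times in s, values in Hz; not real data)
times <- c(0.10, 0.11, 0.12, 0.13, 0.14)
f1    <- c(900, 300, 310, 305, 700)            # transitions at both edges
window_mean(times, f1, start = 0.105, end = 0.135)
```

Leaving the transitions at the edges out of the selection is what keeps them from pulling the mean away from the steady-state value.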

To extract the second formant (and, if you want to use your data in other analyses, also the third formant), simply choose Get second formant (and Get third formant), note down the frequencies in Hertz in a table, and also note down the start and end time of the steady state. The final table should look like the table below (some columns have been removed for the sake of simplicity).
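If the measurements are kept in R rather than in a spreadsheet, they can be stored as a tab-separated text file and read back with `read.table()`, as in the next section. In the sketch below, the column names (file, subject, item, start, end, F1, F2, F3) are assumptions loosely based on the columns used later in this tutorial, and the values are toy numbers, not measurements.

```r
# Sketch: one row per measured vowel token, stored as a tab-separated file.
# Column names are assumptions; the values are toy numbers.
meas <- data.frame(
  file    = "vowels",
  subject = "learner",
  item    = c("heed", "who'd"),
  start   = c(0.52, 4.31),    # start of steady state (s)
  end     = c(0.61, 4.40),    # end of steady state (s)
  F1      = c(300, 320),      # Hz
  F2      = c(2300, 1500),    # Hz
  F3      = c(2900, 2400),    # Hz (optional)
  stringsAsFactors = FALSE
)

path <- tempfile(fileext = ".txt")
write.table(meas, path, sep = "\t", quote = FALSE, row.names = FALSE)

# quote = "" stops the apostrophe in who'd from being read as a quote mark
back <- read.table(path, header = TRUE, sep = "\t", quote = "",
                   stringsAsFactors = FALSE)
str(back)
```

Setting `quote = ""` matters for words such as who'd: with the default quote characters of `read.table()`, the apostrophe would be taken as the start of a quoted string.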

The next section describes how to plot the data and compare the vowels to the equivalent vowels produced by native speakers of RP.

3 Visualizing the vowel sounds

We will now process the data so that we can plot the F1 values against the F2 values by speaker and word. To do this, we load the required packages and the data of the learner (nns) and the native speakers (ns).

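# load packages
library(dplyr)    # data processing
library(vowels)   # vowel formant data tools
library(ggplot2)  # plotting
# load data: native-speaker reference data (ns) and learner data (nns)
ns <- read.table("https://slcladal.github.io/data/rpvowels.txt", header = TRUE, sep = "\t")
nns <- read.table("https://slcladal.github.io/data/vowels.txt", header = TRUE, sep = "\t") %>%
  select(-file)   # drop the file column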

The data of the native speakers, i.e. the reference data, are shown below.

The reference data are taken from Hawkins and Midgley (2005) (see here) and represent the first and second formants for the words heed, hid, head, had, hard, hod, hoard, hood, who’d, hud, and herd produced by five L1 speakers of Received Pronunciation aged 20 to 25.

We now combine the two data sets, rename the subject and item columns to Speaker and Word, add a column holding the IPA symbols of the vowel sounds that the words represent, and calculate the means of F1 (F1_mean) and F2 (F2_mean) by Word and Speaker.
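The following base-R sketch illustrates this step; it is not the tutorial’s own code (a dplyr version would use `bind_rows()`, `rename()`, `group_by()` and `mutate()`). It assumes that both data sets contain the columns subject, item, F1 and F2 and that all native speakers are pooled under the label NS; the IPA symbols are the usual RP transcriptions of the hVd words, the spelling of who’d in the real data may differ, and the toy data frames only stand in for ns and nns.

```r
# IPA symbols for the RP monophthongs in the hVd words
ipa_map <- c(heed = "iː", hid = "ɪ", head = "e", had = "æ", hard = "ɑː",
             hod = "ɒ", hoard = "ɔː", hood = "ʊ", "who'd" = "uː",
             hud = "ʌ", herd = "ɜː")

combine_vowels <- function(ns, nns) {
  ns$subject  <- "NS"          # pool all native speakers into one group
  nns$subject <- "Learner"
  d <- rbind(ns[, c("subject", "item", "F1", "F2")],
             nns[, c("subject", "item", "F1", "F2")])
  names(d)[names(d) == "subject"] <- "Speaker"
  names(d)[names(d) == "item"]    <- "Word"
  d$ipa <- unname(ipa_map[as.character(d$Word)])
  # group means repeated on every row, so each token keeps its own F1/F2 too
  d$F1_mean <- ave(d$F1, d$Speaker, d$Word, FUN = mean)
  d$F2_mean <- ave(d$F2, d$Speaker, d$Word, FUN = mean)
  d
}

# Toy stand-ins for ns and nns (not real measurements)
ns_toy  <- data.frame(subject = c("s1", "s2", "s1", "s2"),
                      item = c("heed", "heed", "had", "had"),
                      F1 = c(280, 300, 1000, 1040), F2 = c(2250, 2350, 1600, 1640))
nns_toy <- data.frame(subject = "learner", item = c("heed", "heed", "had"),
                      F1 = c(260, 270, 800), F2 = c(2400, 2420, 1700))
voweldata <- combine_vowels(ns_toy, nns_toy)
voweldata
```

Keeping the token-level F1 and F2 values next to the means is what allows the plot to show individual tokens as points and the means as IPA labels.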

We can now generate the vowel chart by plotting the F1 values against the F2 values. In addition, we differentiate between the vowel sounds as well as between the learner (Learner) and the native speakers (NS).

# split the combined data for the speaker-specific ellipses
ns <- voweldata %>% filter(Speaker == "NS")
nns <- voweldata %>% filter(Speaker == "Learner")
ggplot(voweldata, aes(F2, F1, color = Speaker, group = Word, fill = Speaker)) +
  geom_point(alpha = .1) +
  # IPA labels at the mean F1/F2 of each word and speaker
  geom_text(aes(x = F2_mean, y = F1_mean, label = ipa), fontface = "bold") +
  # 50% ellipses for the native speakers, 95% ellipses for the learner
  stat_ellipse(data = ns, level = 0.50, geom = "polygon", alpha = 0.05) +
  stat_ellipse(data = nns, level = 0.95, geom = "polygon", alpha = 0.05) +
  # reverse both axes so that high front vowels appear at the top left
  scale_x_reverse(breaks = seq(500, 3000, 500)) + scale_y_reverse() +
  scale_color_manual(breaks = c("Learner", "NS"), values = c("orange", "gray40")) +
  scale_fill_manual(breaks = c("Learner", "NS"), values = c("orange", "gray40")) +
  labs(x = "F2 (Hz)", y = "F1 (Hz)") +
  theme_bw() +
  theme(legend.position = "top",
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())
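The introduction mentions plotting F1 against F2 minus F1 rather than against F2. The sketch below adds such columns to a data frame with the columns of voweldata (F1, F2, F1_mean, F2_mean); the function name `add_f2_minus_f1()` is hypothetical, and because the difference of two means equals the mean of the differences, the label positions can be computed from F1_mean and F2_mean directly.

```r
# Sketch: add F2 minus F1 at token level and at the level of the means.
add_f2_minus_f1 <- function(d) {
  d$F2_minus_F1      <- d$F2 - d$F1              # one value per token
  d$F2_minus_F1_mean <- d$F2_mean - d$F1_mean    # equals the mean of F2 - F1
  d
}

# Toy values, not measurements
toy <- data.frame(F1 = c(280, 300), F2 = c(2250, 2350),
                  F1_mean = 290, F2_mean = 2300)
add_f2_minus_f1(toy)
```

In the `ggplot()` call above, replacing `F2` with `F2_minus_F1` in `aes()` and `F2_mean` with `F2_minus_F1_mean` in `geom_text()` (and adjusting the breaks of `scale_x_reverse()` to the smaller range) would give the F1 against F2 minus F1 version of the chart.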

The vowel chart shows that the i-sounds of the L1-German speaker are more fronted and that the o-sounds are substantially higher in the non-native speaker’s speech than in the RP reference vowel spaces. The short u-sound, however, is very similar, indicating that this L1-German speaker produces the short English u-sound in a very native-like way, while the long u-sound is higher and more fronted in the speech of the L1-German speaker. Interestingly, the vowel space of the ash (/æ/) differs quite dramatically between the native speakers and the L1-German speaker, which could be caused by the fact that German does not have an ash vowel. I hope this short tutorial helps you create your own personalized vowel charts with Praat and R.
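To put numbers on impressions such as "more fronted" or "higher", one could compare the mean formants per word. The sketch below is a hypothetical helper that works in raw Hz without speaker normalization (which matters when comparing speakers with different vocal tract lengths; the vowels package loaded above provides normalization procedures). It relies on the standard relation that a lower F1 corresponds to a higher tongue position and a higher F2 to a more fronted one, and it assumes the voweldata columns Speaker, Word, F1 and F2.

```r
# Sketch: per-word differences between the learner's and the native speakers'
# mean formants, in raw Hz.
compare_to_reference <- function(d) {
  means   <- aggregate(cbind(F1, F2) ~ Speaker + Word, data = d, FUN = mean)
  learner <- means[means$Speaker == "Learner", c("Word", "F1", "F2")]
  native  <- means[means$Speaker == "NS",      c("Word", "F1", "F2")]
  m <- merge(learner, native, by = "Word", suffixes = c("_learner", "_ns"))
  m$dF1  <- m$F1_learner - m$F1_ns     # negative: learner vowel is higher
  m$dF2  <- m$F2_learner - m$F2_ns     # positive: learner vowel is more fronted
  m$dist <- sqrt(m$dF1^2 + m$dF2^2)    # Euclidean distance in the F1/F2 plane (Hz)
  m[order(-m$dist), ]                  # largest differences first
}

# Toy values, not measurements
toy <- data.frame(Speaker = c("NS", "NS", "Learner"), Word = "heed",
                  F1 = c(280, 300, 260), F2 = c(2250, 2350, 2400))
compare_to_reference(toy)
```

Applied to voweldata, the rows at the top of the result would point to the vowels that differ most from the reference; because F2 spans a larger range in Hz than F1, the raw distance weights frontness differences more heavily.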

Citation & Session Info

Schweinberger, Martin. 2020. Creating Vowel Charts in R. Brisbane: The University of Queensland. url: https://slcladal.github.io/vc.html (Version 2020.09.29).

@manual{schweinberger2020vc,
  author = {Schweinberger, Martin},
  title = {Creating Vowel Charts in R},
  note = {https://slcladal.github.io/vc.html},
  year = {2020},
  organization = {The University of Queensland, Australia. School of Languages and Cultures},
  address = {Brisbane},
  edition = {2020/09/29}
}
sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18362)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252   
## [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
## [5] LC_TIME=German_Germany.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggplot2_3.3.2 vowels_1.2-2  dplyr_1.0.2   DT_0.15      
## 
## loaded via a namespace (and not attached):
##  [1] knitr_1.30        magrittr_1.5      MASS_7.3-51.6     munsell_0.5.0    
##  [5] tidyselect_1.1.0  colorspace_1.4-1  R6_2.4.1          rlang_0.4.7      
##  [9] stringr_1.4.0     highr_0.8         tools_4.0.2       grid_4.0.2       
## [13] gtable_0.3.0      xfun_0.16         withr_2.3.0       htmltools_0.5.0  
## [17] crosstalk_1.1.0.1 ellipsis_0.3.1    yaml_2.2.1        digest_0.6.25    
## [21] tibble_3.0.3      lifecycle_0.2.0   crayon_1.3.4      farver_2.0.3     
## [25] purrr_0.3.4       htmlwidgets_1.5.1 vctrs_0.3.4       glue_1.4.2       
## [29] evaluate_0.14     rmarkdown_2.3     labeling_0.3      stringi_1.5.3    
## [33] compiler_4.0.2    pillar_1.4.6      scales_1.1.1      generics_0.0.2   
## [37] jsonlite_1.7.1    pkgconfig_2.0.3

References

Hawkins, Sarah, and Jonathan Midgley. 2005. “Formant Frequencies of RP Monophthongs in Four Age Groups of Speakers.” Journal of the International Phonetic Association 35 (2): 183–99.

Johnson, Keith. 2011. Acoustic and Auditory Phonetics. John Wiley & Sons.

Ladefoged, Peter. 1996. Elements of Acoustic Phonetics. Chicago: University of Chicago Press.

Paganus, Annu, Vesa-Petteri Mikkonen, Tomi Mäntylä, Sami Nuuttila, Jouni Isoaho, Olli Aaltonen, and Tapio Salakoski. 2006. “The Vowel Game: Continuous Real-Time Visualization for Pronunciation Learning with Vowel Charts.” In International Conference on Natural Language Processing (in Finland), 696–703. Springer.

Rogers, Henry. 2014. The Sounds of Language: An Introduction to Phonetics. Routledge.

Zsiga, Elizabeth C. 2012. The Sounds of Language: An Introduction to Phonetics and Phonology. John Wiley & Sons.