This R Markdown file accompanies the two-hour workshop on the Reproducible Open Coding Kit (ROCK) and Epistemic Network Analysis (ENA). The script below integrates the directory structure and functionality of the R package {rock}. For more ROCK functions and materials, see: https://rock.science. To access full {rock} functionality, see: https://rock.opens.science. Below, “{rock}” refers to the R package, while “ROCK” refers to the standard, or to both the R package and the standard simultaneously.
| Resource | URL |
|---|---|
| ROCK Website | https://rock.science |
| ENA Website | https://www.epistemicnetwork.org |
| Workshop repository | https://gitlab.com/szilvia/rock_ena_workshop_2hrs |
| License | CC0 1.0 Universal |
| Rendered version of script | https://szilvia.gitlab.io/rock_ena_workshop_2hrs |
| Posit Cloud project for workshop | https://posit.cloud/content/7960633 |
During the first part of the workshop, we will use {rock} to
prepare, code, and segment our qualitative data. We will then use
the same package to generate a few analyses and visualizations, and
finally the qualitative data table. The second part of the workshop
will focus on using the ENA webtool to generate network graphs; we will
explore various model parameterizations and their effects on model
interpretation. This script only covers the ROCK portion of the
workshop; to continue with the ENA exercises, please refer to the
PowerPoint presentation in this repository, which is located within the
“ppts” subdirectory (direct link here).
To employ the script below, you need to download R and RStudio; some guidance on that is available here.
If you do not want to work locally, you can use Posit Cloud (formerly: RStudio Cloud; https://posit.cloud) by creating an account and visiting the workshop Posit Cloud project. In this case, you do not need to download anything R-related.
Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
The script below contains R commands (in the gray sections called “chunks”), which can be run individually by pressing the green “play button” in the chunk’s upper right corner. Note, you will only see this option if you open the script in RStudio; otherwise, this file merely contains plain text.
The workshop materials also integrate Git (see, e.g., the hidden .git directory), a repository and version control system. The workshop is housed on GitLab. For setting up a GitLab repository, see here. You do not need to use Git for {rock} functionality to work, but it is good practice.
Run this chunk every time you start a session!
To be sure you are up to date, run this chunk every time you wish to use
the script. This chunk also contains the specifications for persistent
identifiers (see here); if you use any other
persistent IDs apart from the ones listed, you can add those here.
Lastly, this chunk establishes the paths to the subdirectories that {rock}
commands will use as input and output destinations. Run it by clicking
the green play button in the top right corner of the chunk.
To run commands when knitting this script, change “eval = FALSE” to
“eval = TRUE” in the knitr options!
### package installs and updates
packagesToCheck <- c("rock", "here", "knitr", "writexl");
for (currentPkg in packagesToCheck) {
  if (!requireNamespace(currentPkg, quietly = TRUE)) {
    install.packages(currentPkg, repos = "https://cran.rstudio.com");
  }
}
knitr::opts_chunk$set(
  echo = TRUE,
  eval = FALSE,
  comment = ""
);
rock::opts$set(
  silent = TRUE,
  idRegexes = list(
    cid = "\\[\\[cid[=:]([a-zA-Z][a-zA-Z0-9_]*)\\]\\]",
    coderId = "\\[\\[coderid[=:]([a-zA-Z][a-zA-Z0-9_]*)\\]\\]"
  ),
  sectionRegexes = list(
    sectionBreak = "---<<([a-zA-Z][a-zA-Z0-9_]*)>>---"
  ),
  persistentIds = c("cid", "coderId")
);
### Set paths for later
basePath <- here::here();
dataPath <- file.path(basePath, "data");
scriptsPath <- file.path(basePath, "scripts");
resultsPath <- file.path(basePath, "results");
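If you are working outside the prepared repository or Posit Cloud project, some of these subdirectories may not exist yet. As a minimal sketch (using the path variables defined above), you can create any missing ones before running the later chunks:

```r
### Create the expected subdirectories if they are missing;
### dir.exists() guards against warnings from dir.create().
for (currentPath in c(dataPath, scriptsPath, resultsPath)) {
  if (!dir.exists(currentPath)) {
    dir.create(currentPath, recursive = TRUE);
  }
}
```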
Your data in raw form is in the “000---raw-data” subdirectory located within the data directory. The same data has been split into separate plain text files and placed into the “010---raw-sources” subdirectory. Your data should be in plain text files from here on; these are referred to as “sources”. For more on preparing data, see here.
There is also a list of attributes of your data providers in the data
directory. These have been specified according to the ROCK standard. For
more on this format, see here.
Qualitative data are often “messy”, but you can use this command to help clean them in several respects. To view these aspects, type “?rock::clean_sources” into the console and hit enter. Most importantly, the cleaning command places each sentence in your data on a new line. Since coding with iROCK (see below) is performed per line of data, this act of segmentation is a necessary step when working with the ROCK. However, your data does not need to be segmented by sentence: the smallest codable pieces can be anything from a paragraph to an entire transcript. The {rock} recognizes newline characters as indicators of segmentation, so if you do not want to code sentence by sentence, you can place your chosen segments on new lines manually or change the default options in the R package. Note that this act of segmentation refers to the lowest level (on which you will code); you can also add higher levels of segmentation (e.g., delimiting topics) by adding section breaks. For more on section breaks, see here. The chunk below will write the cleaned sources found in “010---raw-sources” into the subdirectory “020---cleaned-sources”.
rock::clean_sources(
  input = file.path(dataPath, "010---raw-sources"),
  output = file.path(dataPath, "020---cleaned-sources")
);
If it makes sense for your project, you may choose to add a unique identifier to each line of data (lines are referred to as “utterances” from here on). With this unique utterance identifier (uid), you will be able to locate or refer to any specific utterance in your dataset. Furthermore, if multiple coders are employing different codes (or coding schemes) to code the data, you may want to merge those different versions of the coded sources into a single source containing all codes applied by the various researchers; this merging takes place based on uids. To read more about uids, type “?rock::prepend_ids_to_sources” into the console and hit enter. The chunk below will write the sources with uids into the subdirectory “030---sources-with-uids”.
rock::prepend_ids_to_sources(
  input = file.path(dataPath, "020---cleaned-sources"),
  output = file.path(dataPath, "030---sources-with-uids")
);
If you’d like to code your data manually, you can use a rudimentary graphical user interface called iROCK (available at: https://i.rock.science). This interface allows you to upload your codes, section breaks (for higher levels of segmentation), and your sources, and then drag and drop codes/section breaks into the data. If you are using the original iROCK, please remember to download your work once you have finished coding. More information on iROCK is available here.
Note that your codes and section breaks need to be in a specific format if you wish to upload them as a list; if you generate them inductively via the interface, iROCK places them in the specified format automatically. Once you have coded and downloaded your sources, you need to rename them to include a “slug” in order for the commands below to work. You may have noticed that the commands above (cleaning and adding uids) have also appended slugs to the names of your sources. After coding, rename your sources to include the slug “_coded”; for example: “Source_1_cleaned_withUIDs_coded”. (You may also use a different slug, but then remember to change the code in the script as well!)
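If you have downloaded many coded sources, renaming them one by one is tedious. As a sketch (assuming you have placed your downloaded files in the “040---coded-sources” subdirectory and that they use the .rock extension; adjust the path and pattern if yours differ), you could append the “_coded” slug programmatically:

```r
### Append the "_coded" slug to every .rock file that does not have it yet.
codedPath <- file.path(dataPath, "040---coded-sources");
for (currentFile in list.files(codedPath, pattern = "\\.rock$", full.names = TRUE)) {
  if (!grepl("_coded", basename(currentFile))) {
    file.rename(
      currentFile,
      sub("\\.rock$", "_coded.rock", currentFile)
    );
  }
}
```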
If you’d like to employ automated coding and/or you want to use
{rock} recode functions, please refer to: https://rock.opens.science and https://rock.science.
Run this chunk every session during which you want to employ
the functionality below (e.g., inspecting fragments, code frequencies,
heatmaps)!
This command will assemble all your coded
sources (and persistent IDs and attributes, if you have specified
these) into an R object that can be used to run the analyses and other
commands below. The chunk below will use the coded sources (and, if
applicable, the YAML fragments) to parse your sources. The regex
(regular expression) specified for retrieving those sources is set to
“_coded”, the slug specified in the previous step (manual coding);
please modify this regex in the chunk below if your slug is
different.
dat <-
  rock::parse_sources(
    dataPath,
    regex = "_coded|attributes"
  );
The command below allows you to collect and inspect all coded fragments within your dataset, by code. The context is set to “2”, which means that you will see two lines of data prior to the coded line and two lines subsequent to it. Feel free to change this number if you wish to see more or fewer lines of context. If you wish to inspect only a certain code or codes, use the command under the heading “Inspect coded fragments for specific code(s)”.
rock::collect_coded_fragments(
  dat,
  context = 2
);
This command allows you to collect and inspect coded fragments for only certain codes. If you’d only like to see fragments for a single code, just delete the pipe and the second code (e.g., “|Emotional_support”). If you’d like to add codes to the list, use the pipe and add the code label(s), e.g.: “CodeA|CodeB|CodeC”. Again, you can modify the amount of context you wish to have around the coded utterance by changing “2” to any other number. The chunk below will show you coded utterances for all specified codes.
rock::inspect_coded_sources(
  path = here::here("data", "040---coded-sources"),
  fragments_args = list(
    codes = "Sign_other|Emotional_support",
    context = 2
  )
);
This command prints all attributes listed in the case-attributes.rock file in a tabular format.
rock::show_attribute_table(dat)
Based on your codes, {rock} can create a code tree, provided your codes are specified according to the ROCK standard; this can be a flat or hierarchical structure. For more on these specifications, please see here.
rock::show_fullyMergedCodeTrees(dat)
This command displays a bar chart of code frequencies within the various sources in which they were applied. The command also produces a legend at the bottom of the visual to help identify the sources by color.
rock::code_freq_hist(
  dat
);
Code co-occurrences can be visualized with a heatmap. This representation will use colors to indicate the co-occurrence frequencies. Co-occurrences are defined as two or more codes occurring on the same line of data (utterance). The console will also show you the co-occurrence matrix from which the visualization was generated.
rock::create_cooccurrence_matrix(
  dat,
  plotHeatmap = TRUE
);
This command will generate a tabular version of your dataset, which can be used to further process your data with ENA or “merely” to represent your coded data in a single file. In this table, rows correspond to utterances; columns contain attributes, codes, and the data itself. The output will be an Excel file named “mergedSourceDf.xlsx”, located in the results subdirectory.
Provided you have added attributes, those will also be included. To make sure you are complying with the ROCK standard (which is necessary for the command to work), please see here.
Beware: when re-generating the qualitative data table, the {rock} default is to prevent overwriting, so either allow overwriting within the script or delete/rename the old Excel file before you run this chunk.
rock::export_mergedSourceDf_to_xlsx(
  dat,
  file.path(
    resultsPath,
    "mergedSourceDf.xlsx"
  )
);
For more on ROCK terminology, see here.
2023 Zörgő S, Peters GJY. Using the Reproducible Open Coding Kit
& Epistemic Network Analysis to Model Qualitative Data. Health
Psychology and Behavioral Medicine 11:1.
2023 Zörgő S. Segmentation and Code Co-occurrence
Accumulation: Operationalizing Relational Context with Stanza
Windows. In: Advances in Quantitative Ethnography. Communications in
Computer and Information Science, Vol 1785. Eds. Damsa C. and Barany A.,
pp 146-162. Cham, Switzerland: Springer Nature.
2023 Zörgő S, Bohinsky J. Parsing the Continuum: Manual Segmentation of Monologic
Data. In: Advances in Quantitative Ethnography. Communications in
Computer and Information Science, Vol 1785. Eds. Damsa C. and Barany A.,
pp 163-181. Cham, Switzerland: Springer Nature.
2023 Árva D, Jeney A, Dunai D, Major D, Cseh A, Zörgő S. Approaches to Code Selection for Epistemic Networks.
In: Advances in Quantitative Ethnography. Communications in Computer and
Information Science, Vol 1895. Eds. Arastoopour Irgens G. and Knight S.,
pp 409-425. Cham, Switzerland: Springer Nature.
2024 Zörgő S, Árva D, Eagan B. Making Sense of the Model: Interpreting Epistemic
Networks and their Projection Space. Submitted to the 6th
International Conference on Quantitative Ethnography (ICQE24). Preprint.
The following workshops can be taken autonomously, based on the provided instructions and materials:
- One-hour SQAFFOLD workshop: https://sqaffold.gitlab.io/1-hour-workshop
- Two-hour ROCK workshop: https://sci-ops.gitlab.io/rock-workshop-2-hour
- QE Hub (open educational resources for unified methods)
- QE Sandbox (short video lectures on unified methodology)
The Reproducible Open Coding Kit (ROCK) standard is licensed under CC0 1.0 Universal. The {rock} R package is licensed under a GNU General Public License; for more see: https://rock.science.
ROCK citation:
Gjalt-Jorn Ygram Peters and Szilvia Zörgő (2023).
rock: Reproducible Open Coding Kit. R package version 0.7.1. https://rock.opens.science
For more on ROCK materials licensing and citation, please see here.
ENA citation:
See here for citation information on
the R package {rENA}.
Thank you for considering using the ROCK for your qualitative or unified
project. If you have any questions or suggestions on how to improve
these resources, feel free to write to: info@rock.science.