This R Markdown file accompanies the two-hour workshop on the Reproducible Open Coding Kit (ROCK) and Epistemic Network Analysis (ENA). The script below integrates the directory structure and functionality of the R package {rock}. For more ROCK functions and materials, see: https://rock.science. To access full {rock} functionality, see: https://rock.opens.science. Below, “{rock}” refers to the R package, while “ROCK” refers to the standard, or to both the R package and the standard simultaneously.
| Resource | URL |
|---|---|
| ROCK Website | https://rock.science |
| ENA Website | https://www.epistemicnetwork.org |
| Workshop repository | https://gitlab.com/szilvia/rock_ena_workshop_2hrs |
| License | CC0 1.0 Universal |
| Rendered version of script | https://szilvia.gitlab.io/rock_ena_workshop_2hrs |
| Posit Cloud project for workshop | https://posit.cloud/content/7960633 |
During the first part of the workshop, we will use {rock} to
prepare, code, and segment our qualitative data. We will then use
the same package to generate a few analyses and visualizations, and
finally the qualitative data table. The second part of the workshop
will focus on using the ENA webtool to generate network graphs; we will
explore various model parameterizations and their effects on model
interpretation. This script only covers the ROCK portion of the
workshop; to continue with the ENA exercises, please refer to the
PowerPoint presentation in this repository, which is located within the
“ppts” subdirectory (direct link here).
To employ the script below, you need to download R and RStudio; some guidance on that is available here.
If you do not want to work locally, you can use Posit Cloud (formerly: RStudio Cloud; https://posit.cloud) by creating an account and visiting the workshop Posit Cloud project. In this case, you do not need to download anything R-related.
Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
The script below contains R commands (in the gray sections called “chunks”), which can be run individually by pressing the green “play button” in the chunk’s upper right corner. Note, you will only see this option if you open the script in RStudio; otherwise, this file merely contains plain text.
The workshop materials also integrate Git (see, e.g., the hidden .git directory), a repository and version control system. The workshop is housed on GitLab. For setting up a GitLab repository, see here. You do not need to use Git for {rock} functionality to work, but it is good practice.
Run this chunk every time you start a session!
To be sure you are up to date, run this chunk every time you wish to use
the script. This chunk also contains the specifications for persistent
identifiers (see here); if you use any other
persistent IDs apart from the ones listed, you can add those here.
Lastly, this chunk establishes the paths to the subdirectories that {rock}
commands will use as input and output destinations. Run it by clicking
the green play button in the top right corner of the chunk.
To run commands when knitting this script, change “eval = FALSE” to
“eval = TRUE” in the knitr options!
### package installs and updates
packagesToCheck <- c("rock", "here", "knitr", "writexl");
for (currentPkg in packagesToCheck) {
  if (!requireNamespace(currentPkg, quietly = TRUE)) {
    install.packages(currentPkg, repos = "https://cran.rstudio.com");
  }
}
knitr::opts_chunk$set(
  echo = TRUE,
  eval = FALSE,
  comment = ""
);
rock::opts$set(
  silent = TRUE,
  idRegexes = list(
    cid = "\\[\\[cid[=:]([a-zA-Z][a-zA-Z0-9_]*)\\]\\]",
    coderId = "\\[\\[coderid[=:]([a-zA-Z][a-zA-Z0-9_]*)\\]\\]"
  ),
  sectionRegexes = list(
    sectionBreak = "---<<([a-zA-Z][a-zA-Z0-9_]*)>>---"
  ),
  persistentIds = c("cid", "coderId")
);
### Set paths for later
basePath <- here::here();
dataPath <- file.path(basePath, "data");
scriptsPath <- file.path(basePath, "scripts");
resultsPath <- file.path(basePath, "results");
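If you are working outside the prepared repository or Posit Cloud project, some of these subdirectories may not exist yet. As a minimal sketch (using the path variables defined above), you can create any missing ones before running the later chunks:

```r
### Create the expected subdirectories if they are missing;
### dir.exists() guards against warnings from dir.create().
for (currentPath in c(dataPath, scriptsPath, resultsPath)) {
  if (!dir.exists(currentPath)) {
    dir.create(currentPath, recursive = TRUE);
  }
}
```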
Your data in raw form is in the “000---raw-data” subdirectory located within the data directory. The same data has been split into separate plain text files and placed into the “010---raw-sources” subdirectory. Your data should be in plain text files from here on; these are referred to as “sources”. For more on preparing data, see here.
There is also a list of attributes of your data providers in the data
directory. These have been specified according to the ROCK standard. For
more on this format, see here.
Qualitative data are often “messy”, but you can use this command to help clean them in several respects. To view these aspects, type “?rock::clean_sources” into the console and hit enter. Most importantly, the cleaning command places each sentence in your data on a new line. Since coding with iROCK (see below) is performed per line of data, this act of segmentation is a necessary step when working with the ROCK. However, your data does not need to be segmented by sentence: the smallest codable pieces can be anything from a paragraph to an entire transcript. The {rock} recognizes newline characters as indicators of segmentation, so if you do not want to code sentence by sentence, you can place your chosen segments on new lines manually or change the default options in the R package. Note that this act of segmentation refers to the lowest level (on which you will code); you can also add higher levels of segmentation (e.g., delimiting topics) by adding section breaks. For more on section breaks, see here. The chunk below will write the cleaned sources found in “010---raw-sources” into the subdirectory “020---cleaned-sources”.
rock::clean_sources(
  input = file.path(dataPath, "010---raw-sources"),
  output = file.path(dataPath, "020---cleaned-sources")
);
If it makes sense for your project, you may choose to add a unique identifier to each line of data (lines are referred to as “utterances” from here on). With this unique utterance identifier (uid), you will be able to locate or refer to any specific utterance in your dataset. Furthermore, if multiple coders are employing different codes (or coding schemes) to code the data, you may want to merge those different versions of the coded sources into a single source containing all codes applied by the various researchers; this merging takes place based on uids. To read more about uids, type “?rock::prepend_ids_to_sources” into the console and hit enter. The chunk below will write the sources with uids into the subdirectory “030---sources-with-uids”.
rock::prepend_ids_to_sources(
  input = file.path(dataPath, "020---cleaned-sources"),
  output = file.path(dataPath, "030---sources-with-uids")
);
If you’d like to code your data manually, you can use a rudimentary graphical user interface called iROCK (available at: https://i.rock.science). This interface allows you to upload your codes, section breaks (for higher levels of segmentation), and your sources, and then drag and drop codes/section breaks into the data. If you are using the original iROCK, please remember to download your work once you have finished coding. More information on iROCK is available here.
Note that your codes and section breaks need to be in a specific format if you wish to upload them as a list; if you generate them inductively via the interface, iROCK places them in the specified format automatically. Once you have coded and downloaded your sources, you need to rename them to include a “slug” in order for the commands below to work. You may have noticed that the commands above (cleaning and adding uids) have also appended slugs to the names of your sources. After coding, rename your sources to include the slug “_coded”; for example: “Source_1_cleaned_withUIDs_coded”. (You may also use a different slug, but then remember to change the code in the script as well!)
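If you have downloaded many coded sources, renaming them one by one is tedious. As a sketch (assuming you have placed your downloaded files in the “040---coded-sources” subdirectory and that they use the .rock extension; adjust the path and pattern if yours differ), you could append the “_coded” slug programmatically:

```r
### Append the "_coded" slug to every .rock file that does not have it yet.
codedPath <- file.path(dataPath, "040---coded-sources");
for (currentFile in list.files(codedPath, pattern = "\\.rock$", full.names = TRUE)) {
  if (!grepl("_coded", basename(currentFile))) {
    file.rename(
      currentFile,
      sub("\\.rock$", "_coded.rock", currentFile)
    );
  }
}
```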
If you’d like to employ automated coding and/or you want to use
{rock} recode functions, please refer to: https://rock.opens.science and https://rock.science.
Run this chunk every session during which you want to employ
the functionality below (e.g., inspecting fragments, code frequencies,
heatmaps)!
This command will assemble all your coded
sources (and persistent IDs and attributes, if you have specified
these) into an R object that can be used to run the analyses and other
commands below. The chunk below will use the coded sources (and, if
applicable, the YAML fragments) to parse your sources. The regex
(regular expression) specified for retrieving those sources is set to
“_coded”, the slug specified in the previous step (manual coding);
please modify this regex in the chunk below if your slug is
different.
dat <-
  rock::parse_sources(
    dataPath,
    regex = "_coded|attributes"
  );
The command below allows you to collect and inspect all coded fragments within your dataset, by code. The context is set to “2”, which means that you will see two lines of data prior to the coded line and two lines subsequent to it. Feel free to change this number if you wish to see more or fewer lines of context. If you wish to inspect only a certain code or codes, use the command under the heading “Inspect coded fragments for specific code(s)”.
rock::collect_coded_fragments(
  dat,
  context = 2
);
This command allows you to collect and inspect coded fragments for only certain codes. If you’d only like to see fragments for a single code, just delete the pipe and the second code (e.g., “|Emotional_support”). If you’d like to add codes to the list, use the pipe and add the code label(s), e.g.: “CodeA|CodeB|CodeC”. Again, you can modify the amount of context you wish to have around the coded utterance by changing “2” to any other number. The chunk below will show you coded utterances for all specified codes.
rock::inspect_coded_sources(
  path = here::here("data", "040---coded-sources"),
  fragments_args = list(
    codes = "Sign_other|Emotional_support",
    context = 2
  )
);
This command prints all attributes listed in the case-attributes.rock file in a tabular format.
rock::show_attribute_table(dat)
Based on your codes, {rock} can create a code tree, provided your codes are specified according to the ROCK standard; this can be a flat or hierarchical structure. For more on these specifications, please see here.
rock::show_fullyMergedCodeTrees(dat)
This command displays a bar chart of code frequencies within the various sources in which they were applied. The command also produces a legend at the bottom of the visual to help identify the sources by color.
rock::code_freq_hist(
  dat
);
Code co-occurrences can be visualized with a heatmap. This representation will use colors to indicate the co-occurrence frequencies. Co-occurrences are defined as two or more codes occurring on the same line of data (utterance). The console will also show you the co-occurrence matrix from which the visualization was generated.
rock::create_cooccurrence_matrix(
  dat,
  plotHeatmap = TRUE
);
This command will generate a tabular version of your dataset, which can be used to further process your data with ENA or “merely” to represent your coded data in a single file. In this table, rows correspond to utterances; columns contain attributes, codes, and the data itself. The output will be an Excel file named “mergedSourceDf.xlsx”, located in the results subdirectory.
Provided you have added attributes, those will also be included. To make sure you are complying with the ROCK standard (which is necessary for the command to work), please see here.
Beware: when re-generating the qualitative data table, the {rock} default is to prevent overwriting, so either allow overwriting within the script or delete/rename the old Excel file before you run this chunk.
rock::export_mergedSourceDf_to_xlsx(
  dat,
  file.path(
    resultsPath,
    "mergedSourceDf.xlsx"
  )
);
For more on ROCK terminology, see here.
2023 Zörgő S, Peters GJY. Using the Reproducible Open Coding Kit
& Epistemic Network Analysis to Model Qualitative Data. Health
Psychology and Behavioral Medicine 11:1.
2023 Zörgő S. Segmentation and Code Co-occurrence
Accumulation: Operationalizing Relational Context with Stanza
Windows. In: Advances in Quantitative Ethnography. Communications in
Computer and Information Science, Vol 1785. Eds. Damsa C. and Barany A.,
pp 146-162. Cham, Switzerland: Springer Nature.
2023 Zörgő S, Bohinsky J. Parsing the Continuum: Manual Segmentation of Monologic
Data. In: Advances in Quantitative Ethnography. Communications in
Computer and Information Science, Vol 1785. Eds. Damsa C. and Barany A.,
pp 163-181. Cham, Switzerland: Springer Nature.
2023 Árva D, Jeney A, Dunai D, Major D, Cseh A, Zörgő S. Approaches to Code Selection for Epistemic Networks.
In: Advances in Quantitative Ethnography. Communications in Computer and
Information Science, Vol 1895. Eds. Arastoopour Irgens G. and Knight S.,
pp 409-425. Cham, Switzerland: Springer Nature.
2024 Zörgő S, Árva D, Eagan B. Making Sense of the Model: Interpreting Epistemic
Networks and their Projection Space. Submitted to the 6th
International Conference on Quantitative Ethnography (ICQE24). Preprint.
The following workshops can be taken autonomously, based on the provided instructions and materials:
- One-hour SQAFFOLD workshop: https://sqaffold.gitlab.io/1-hour-workshop
- Two-hour ROCK workshop: https://sci-ops.gitlab.io/rock-workshop-2-hour
- QE Hub (open educational resources for unified methods)
- QE Sandbox (short video lectures on unified methodology)
The Reproducible Open Coding Kit (ROCK) standard is licensed under CC0 1.0 Universal. The {rock} R package is licensed under a GNU General Public License; for more see: https://rock.science.
ROCK citation:
Gjalt-Jorn Ygram Peters and Szilvia Zörgő (2023).
rock: Reproducible Open Coding Kit. R package version 0.7.1. https://rock.opens.science
For more on ROCK materials licensing and citation, please see here.
ENA citation:
See here for citation information on
the R package {rENA}.
Thank you for considering using the ROCK for your qualitative or unified
project. If you have any questions or suggestions on how to improve
these resources, feel free to write to: info@rock.science.