steven owens
A lovely podcast by steven owens
Objective vs Subjective

Study Guide

This guide is designed to review and synthesize concepts from the provided sources, covering topics in communication, cognitive science, healthcare, and law. It includes a short-answer quiz with an answer key, a set of essay questions for deeper analysis, and a comprehensive glossary of key terms.


--------------------------------------------------------------------------------

Quiz: Short-Answer Questions

Instructions: Answer the following questions in two to three sentences, drawing only from the information provided in the source context.

  1. What are "subjective symptom expressions" (SSEs) as defined in the "Sitting on Pins and Needles" study, and how do they differ from "symptom terms" (STs)?

  2. According to Julia Galef, what is the core difference between the "soldier mindset" and the "scout mindset"?

  3. Define the "reasonable person standard" and explain its purpose in personal injury law.

  4. The "Sitting on Pins and Needles" study found that a significant percentage of SSEs were not captured by standard medical terminology or coding. What was this percentage, and what does this finding suggest about current data extraction methods in healthcare?

  5. What is the "ideological Turing test," and what is its goal as both a cognitive and emotional exercise?

  6. List three key techniques for "active listening" as outlined by the CCOHS.

  7. How does the Linguanaut article differentiate between subjective and objective language? Provide an example of each.

  8. According to the Unequal Treatment text, what are cross-cultural education programs designed to do for health professionals?

  9. The "Acktually" meme is used to mock a specific type of online behavior. Describe this behavior and the reason for the intentional misspelling of "actually."

  10. What is the concept of "rational irrationality" as described in the Julia Galef interview?


--------------------------------------------------------------------------------

Answer Key

  1. "Subjective symptom expressions" (SSEs) are defined as phrases that entirely or partially capture the patient's voice when describing symptoms, including figures of speech, idioms, or lay terms (e.g., "sitting on pins and needles"). They differ from "symptom terms" (STs), which are words and phrases reflecting common medical usage that can be mapped to a standard terminology (e.g., "vertigo").

  2. The "soldier mindset" is a mode of thinking where reasoning is like a soldier defending a position; beliefs are treated as strong fortresses to be buttressed, and opposing arguments are to be shot down. The "scout mindset" is a mode of thinking where the goal is not to attack or defend but to see what is actually there and form an accurate map of reality.

  3. The "reasonable person standard" is a fictitious legal standard used to evaluate behavior in accident cases. A person's conduct is compared to what an ordinary, prudent person would have done in the same situation to determine if their actions were negligent.

  4. The study found that nearly one-third (31%) of subjective symptom expressions were not coded with ICD-9-CM or restated in standard terminology. This highlights the limitations of current automated methods and the need to develop natural language processing (NLP) to extract clinically meaningful information that is otherwise unobtainable.

  5. The ideological Turing test is an exercise where you try to argue for an opposing viewpoint so convincingly that others cannot tell you don't actually hold that belief. Cognitively, it tests if you truly understand the other side's views; emotionally, it is an exercise in separating identity from belief by forcing you to explain opposing positions without caricature or condescension.

  6. Three key techniques for active listening are: making eye contact without staring, focusing completely on what is being said without distraction, and repeating what you heard for confirmation to reduce misunderstandings. Other techniques include allowing pauses, asking clarifying questions, and observing body language.

  7. Subjective language is woven with personal opinions, feelings, and judgments, such as "The cake is delicious." In contrast, objective language presents facts and information without the influence of personal feelings or interpretations, such as "The cake weighs 500 grams."

  8. Cross-cultural education programs are developed to enhance health professionals' awareness of how cultural and social factors influence healthcare. They provide methods to obtain, negotiate, and manage this information clinically to help reduce healthcare disparities.

  9. The "Acktually" meme is used to poke fun at someone who is being an annoying "know-it-all" by pointing out another person's error, especially a minor one. "Actually" is intentionally misspelled as "acktually" or "ackchyually" so the reader hears the phlegmy, stereotypical accent associated with nerd culture.

  10. The theory of "rational irrationality" posits that it is sometimes instrumentally rational (effective for achieving one's goals) to be epistemically irrational (to see things inaccurately). The theory claims humans evolved to hold false beliefs when those beliefs are more useful for achieving goals like social belonging or feeling good, rather than having accurate beliefs.


--------------------------------------------------------------------------------

Essay Questions

Instructions: The following questions are designed for longer-form answers that require synthesis of information across the provided sources. Do not provide answers.

  1. Analyze the challenges healthcare providers face in accurately capturing and interpreting patient experiences. Synthesize concepts from the "Sitting on Pins and Needles" study on subjective symptom expressions, the principles of "active listening," and the definitions of "subjective language" and "cross-cultural education."

  2. Discuss how the "soldier mindset," as described by Julia Galef, could contribute to the healthcare disparities outlined in the Unequal Treatment text, particularly regarding provider bias. How might adopting a "scout mindset" help mitigate these issues?

  3. Explore the role of language in shaping perceived reality across clinical, legal, and social contexts. Use examples from the discussion of symptom reporting, the "reasonable person standard," and the analysis of subjective language in journalism and everyday communication.

  4. The study on subjective symptom expressions concludes that 31% of these expressions are "lost in translation" by current systems. Connect this loss of information to the discussion in Unequal Treatment regarding patient-provider communication barriers. How do these distinct but related failures impact patient care, especially for minority populations?

  5. Explain how the "rational irrationality" hypothesis can be seen at work in the defensive reactions that make the "ideological Turing test" so difficult and that lead people to be mocked with the "acktually" meme. How do our motivations to protect our beliefs and social standing (instrumental rationality) interfere with our ability to see things clearly (epistemic rationality)?


--------------------------------------------------------------------------------

Glossary of Key Terms

| Term | Definition |
|---|---|
| Active Listening | A structured way of listening and responding to others where attention is focused on the other person to understand, interpret, and evaluate what they are telling you, without judgment. |
| "Acktually" Meme | A meme featuring a nerdy character saying a deliberate misspelling of "actually" to mock someone who is correcting another person in an annoying, "know-it-all" manner. |
| Annotation Studies | A research method where human coders systematically label text to reflect varying degrees of a specific quality, such as subjectivity, to create data for training analytical models. |
| Attractive Nuisance Doctrine | A legal doctrine in property law that requires property owners to keep their property safe for trespassing children. |
| Cross-Cultural Education | Training programs designed to enhance health professionals' awareness of how cultural and social factors influence healthcare, while providing methods to obtain, negotiate, and manage this information clinically. |
| Epistemic Rationality | The goal of seeing things as accurately as possible; forming beliefs that correspond to reality. Aligned with the scout mindset. |
| Health Disparities | Differences in health status and healthcare that exist among specific population groups, often linked to social, economic, and/or environmental disadvantage. |
| ICD-9-CM | The International Classification of Diseases, Ninth Revision, Clinical Modification; a coding system used to assign codes to diagnoses and procedures associated with hospital utilization. |
| Ideological Turing Test | An exercise to test understanding of an opposing viewpoint by attempting to argue for that side so convincingly that other people cannot tell whether you actually believe those views. |
| Instrumental Rationality | The goal of achieving one's goals as effectively as possible, whatever those goals might be (e.g., being happy, getting rich, social belonging). |
| Institutionalized Racism | Differential access to goods, services, and opportunities of society by race. It is structural, codified in institutions of custom, practice, and law, and does not require an identifiable perpetrator. |
| Internalized Racism | Acceptance by members of stigmatized races of negative messages about their own abilities and intrinsic worth. |
| Motivated Reasoning | An unconscious tendency to process information in a way that suits some end or goal, often to support pre-existing beliefs or desires. Associated with the soldier mindset. |
| Natural Language Processing (NLP) | A field of artificial intelligence and computer science focused on enabling computers to understand, interpret, and manipulate human language from text or speech data. |
| Neckbeard | A term for an unkempt, overgrown beard that primarily sits on the neck, often used in online culture to describe a stereotypical internet user or "know-it-all." |
| Negligence | The failure to act with the level of care that a reasonable person would have exercised under the same circumstances. |
| Negligence Per Se | A legal doctrine where an act is considered negligent because it violates a statute or regulation. |
| Objective Language | Language that presents facts and information without the influence of personal feelings, biases, or interpretations. |
| Rational Irrationality | A theory that it can be rational to be irrational; specifically, that it can be effective for achieving one's goals (instrumental rationality) to hold inaccurate beliefs (epistemic irrationality). |
| Reasonable Person Standard | A legal concept that defines a hypothetical person in society who exercises average care, skill, and judgment in conduct and serves as a comparative standard for determining liability for negligence. |
| Scout Mindset | A metaphor for a mode of thinking in which the primary goal is not to attack or defend beliefs, but to go out, see what is actually there, and form an accurate map of reality. |
| Soldier Mindset | A metaphor for a mode of thinking in which reasoning is used like a soldier defending a position; beliefs are treated as territory to be defended and opposing arguments are enemies to be defeated. |
| Somatic Experiences | A category of symptoms defined as expressions that refer to a body part, substance, function, or sensation, such as pain, numbness, or general malaise. |
| Subjective Language | Language characterized by words and phrases that reflect personal biases, preferences, opinions, feelings, or interpretations rather than indisputable facts. |
| Subjective Symptom Expression (SSE) | Phrases in clinical notes that entirely or partially capture the voice of the patient when describing symptoms, including figures of speech, idioms, or lay terms. |
| Symptom Term (ST) | Words and phrases in clinical notes that reflect common medical usage and can be mapped to a standard terminology. |

Research Proposal: Automated Extraction and Analysis of Subjective Symptom Expressions from Clinical Notes

1.0 Introduction and Problem Statement

The patient's narrative is a critical component of clinical practice. The elicitation and assessment of symptoms, the subjective experiences that patients report in their own words, are cornerstones of the patient-clinician interaction. This information is essential for accurate diagnosis, effective therapeutic decision-making, and the ongoing assessment of disease severity and treatment response. However, when clinicians translate these narratives into standardized medical terminology, much of the nuance is lost.

This proposal focuses on the automated analysis of Subjective Symptom Expressions (SSEs). Defined as the phrases that capture the "voice of the patient," SSEs include figures of speech, idioms, and lay terms used to describe symptoms. These expressions provide highly individualized details about the nature, severity, and functional impact of a patient's experience. For example, an SSE like "my head feels like it was hit by a sledgehammer" conveys a level of severity and qualitative experience far beyond the standard clinical term "severe headache." These rich descriptions offer a deeper, more personalized understanding of the patient's condition.

The central problem this research addresses is that a significant volume of this clinically meaningful data is documented exclusively in unstructured, free-text clinical notes, rendering it inaccessible to automated analysis and large-scale research. A foundational study by Forbush et al. (2013) on clinical notes from the Veterans Affairs electronic health record (EHR) quantified this information loss, finding that nearly one-third (31%) of all identified SSEs were not captured by standard terminologies or billing codes like ICD-9-CM.

This information gap represents a significant missed opportunity to fully understand the patient experience and improve clinical care. By developing automated methods to extract and analyze SSEs, this research will unlock a vast, underutilized data source, transforming subjective patient narratives into the structured, quantifiable data required to generate a more accurate and complete patient phenotype.

2.0 Background and Literature Review

A review of existing literature reveals a critical disconnect between how patients express their symptoms and how this information is captured and utilized by healthcare systems, highlighting a clear need for advanced technological solutions. Current data capture methods fail to represent the full richness of the patient narrative, a problem that Natural Language Processing (NLP) is uniquely positioned to solve.

Characterization and Prevalence of Subjective Language in Clinical Notes

The most comprehensive characterization of SSEs in clinical notes comes from the 2013 study by Forbush et al. The study performed a manual annotation of 750 clinical notes from the Veterans Affairs EHR, identifying a total of 543 distinct SSEs. These expressions were semantically categorized, with 66.5% classified as mental/behavioral experiences and 33.5% as somatic experiences. The prevalence of these expressions was found to be significantly higher in mental health notes (38.0% of notes) compared to primary/specialty care notes (18.1%), underscoring their importance in specialties where the patient's subjective experience is the primary focus of the clinical encounter.

Table 1: Prevalence of Subjective Symptom Expressions (SSE) and Symptom Terms (ST) by Note Type

| Note Type | Total Notes | Notes with SSEs | Prevalence of SSEs (% of Notes in Category) | Mean SSE per Note | Mean ST per Note |
|---|---|---|---|---|---|
| Mental/Social | 171 | 65 | 38.0% | 1.25 | 8.74 |
| Primary/Specialty | 579 | 105 | 18.1% | 0.57 | 6.14 |
| Total | 750 | 170 | | | |

Source: Adapted from Forbush et al., 2013
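The prevalence figures in Table 1 follow directly from the note counts. The short sketch below recomputes them as a check; the overall figure is derived here from the table's totals and is not a value reported in the table, and the helper name `prevalence` is only illustrative.

```python
# Counts taken from Table 1 (adapted from Forbush et al., 2013).
notes = {
    "Mental/Social": {"total": 171, "with_sse": 65},
    "Primary/Specialty": {"total": 579, "with_sse": 105},
}


def prevalence(with_sse: int, total: int) -> float:
    """Percentage of notes containing at least one SSE, to one decimal place."""
    return round(100 * with_sse / total, 1)


# Per-category prevalence, matching the "Prevalence of SSEs" column.
by_type = {name: prevalence(c["with_sse"], c["total"]) for name, c in notes.items()}

# Derived (not reported in the table): prevalence across all notes.
overall_total = sum(c["total"] for c in notes.values())
overall_with_sse = sum(c["with_sse"] for c in notes.values())
overall = prevalence(overall_with_sse, overall_total)

for name, value in by_type.items():
    print(f"{name}: {value}%")
print(f"All notes (derived): {overall}%")
```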

The Failure of Current Data Capture Methods

The most common technique for extracting detailed symptom information from patient records is manual chart review; however, this method is expensive, time-consuming, and not scalable for large-scale analysis. The alternative—relying on structured data such as billing codes—proves to be a profoundly inadequate proxy for the patient's experience. The Forbush et al. analysis powerfully illustrates this inadequacy. Of the 543 SSEs identified, only two—both describing headache—were explicitly captured by an ICD-9-CM code. Even when considering SSEs that were restated in standard clinical terms by the provider within the note, a significant portion remained uncaptured. As shown in the table below, nearly one-third (31%) of all SSEs were not captured by diagnostic codes nor restated in standard clinical terms within the note, representing a substantial loss of clinically relevant information.

Table 2: Distribution of Subjective Symptom Expressions (SSE) Contained by ICD-9 or Restated in Standard Terms

| | SSE Contained by ICD-9: Yes | SSE Contained by ICD-9: No | Total |
|---|---|---|---|
| SSE Restated in Standard Terms: Yes | 114 (21%) | 183 (34%) | 297 (55%) |
| SSE Restated in Standard Terms: No | 79 (15%) | 167 (31%) | 246 (45%) |
| Total | 193 (36%) | 350 (64%) | 543 |

Source: Adapted from Forbush et al., 2013. The 31% of uncaptured SSEs (167 of 543) are those neither contained by ICD-9 nor restated in standard terms.
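As a worked check on Table 2, the sketch below rebuilds the marginal totals and whole-number percentages from the four cell counts. The dictionary layout and the helper name `pct` are illustrative choices, not part of the source study.

```python
# Cell counts from Table 2 (adapted from Forbush et al., 2013).
# Keys are (restated in standard terms, contained by ICD-9).
counts = {
    ("yes", "yes"): 114,
    ("yes", "no"): 183,
    ("no", "yes"): 79,
    ("no", "no"): 167,
}

total = sum(counts.values())


def pct(n: int) -> int:
    """Share of all SSEs, rounded to a whole percent as in the table."""
    return round(100 * n / total)


cell_pct = {key: pct(n) for key, n in counts.items()}

# Marginal totals: rows are "restated", columns are "contained by ICD-9".
restated_yes = counts[("yes", "yes")] + counts[("yes", "no")]
restated_no = counts[("no", "yes")] + counts[("no", "no")]
icd_yes = counts[("yes", "yes")] + counts[("no", "yes")]
icd_no = counts[("yes", "no")] + counts[("no", "no")]

# SSEs captured by neither route: the "nearly one-third" figure.
uncaptured = counts[("no", "no")]
print(f"Uncaptured SSEs: {uncaptured} of {total} ({pct(uncaptured)}%)")
```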

The Promise of Natural Language Processing (NLP)

Natural Language Processing offers the most viable solution to efficiently and accurately extract symptom reports from unstructured electronic records. Clinical notes within EHRs contain detailed descriptions of symptoms, treatment responses, and other critical information that is unavailable in structured formats. NLP techniques, specifically subjectivity analysis, sentiment analysis, and opinion mining, are directly applicable to the challenge of identifying and interpreting the patient's voice in clinical text. These methods can be adapted to systematically extract SSEs at a scale that is impossible with manual review, providing a pathway to transform these rich narratives into analyzable data.

While the literature clearly defines the problem and points toward a technological solution, no validated, scalable model currently exists to perform this task. The following research aims are designed to develop and rigorously evaluate such a model.
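For illustration only, one common way to pose this kind of span extraction to a sequence-labeling model is BIO tagging, where each token is marked as beginning (B), inside (I), or outside (O) a target span. Whether the eventual model would use this encoding is an assumption; the `bio_tags` helper and the "Patient states" framing are hypothetical, while the quoted expression is the sledgehammer example from the introduction.

```python
def bio_tags(tokens: list[str], spans: list[tuple[int, int]]) -> list[str]:
    """Convert token-index spans (start inclusive, end exclusive) to BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end in spans:
        tags[start] = "B-SSE"          # first token of the expression
        for i in range(start + 1, end):
            tags[i] = "I-SSE"          # remaining tokens of the expression
    return tags


# Hypothetical note fragment; whitespace tokenization is a simplification.
tokens = "Patient states my head feels like it was hit by a sledgehammer".split()

# The SSE covers everything after "Patient states".
tags = bio_tags(tokens, [(2, len(tokens))])

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```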

3.0 Research Aims and Objectives

This proposal directly addresses the information gap created by the failure of current systems to capture subjective patient narratives. The project's aims are designed to move beyond characterizing the problem to developing a scalable, automated solution with clear clinical relevance. By leveraging advances in Natural Language Processing, we will create a tool capable of transforming the "voice of the patient" into structured, analyzable data.

  1. Aim 1: To characterize the prevalence and linguistic features of Subjective Symptom Expressions (SSEs) in a large-scale clinical note corpus.

    • Objective 1.1: Systematically annotate a corpus of clinical notes to create a reference standard for SSEs, following a rigorous methodology for inter-annotator agreement.

    • Objective 1.2: Classify all identified SSEs into semantic groups (e.g., Somatic, Mental/Behavioral) and sub-categories (e.g., Emotional Distress, Altered Sensation) based on the framework established in prior research.

  2. Aim 2: To develop and validate a novel Natural Language Processing (NLP) model for the automated extraction and classification of SSEs.

    • Objective 2.1: Train a transformer-based named entity recognition model on the annotated corpus to accurately identify SSEs in unstructured text.

    • Objective 2.2: Evaluate the model's performance against the human-annotated reference standard using standard metrics (Precision, Recall, F1-score).

  3. Aim 3: To assess the clinical value of automatically extracted SSEs.

    • Objective 3.1: Analyze the degree to which automatically extracted SSEs are represented by structured data (ICD codes) within the same patient encounter.

    • Objective 3.2: Explore the potential for using SSEs to better understand symptom severity and impact on patient function, as described in the source literature.

This structured approach will allow us to first build a robust foundation of human-annotated data and then use it to create and validate a powerful new analytical tool.

4.0 Proposed Methodology

The research plan will proceed in three phases, beginning with the creation of a high-quality, human-annotated dataset. This "ground truth" corpus will then serve as the foundation for developing and rigorously evaluating a state-of-the-art NLP model designed to automate the extraction of Subjective Symptom Expressions.

Data Source and Study Population

This study will conduct a retrospective analysis of a large corpus of de-identified clinical notes from a diverse patient population. The dataset will be sourced from an institutional clinical data warehouse, similar in scope to the VA Corporate Data Warehouse used in the foundational study by Forbush et al., encompassing notes from primary care, specialty care, and mental health encounters.

Phase 1: Reference Standard Creation

A high-quality reference standard is essential for training and validating the NLP model. The creation process will follow a structured, multi-step methodology to ensure reliability and consistency.

  1. Annotation Guidelines: A formal set of guidelines will be developed to define SSEs, based on the operational definition from the source literature: phrases that entirely or partially capture the voice of the patient when describing symptoms, including figures of speech, idioms, or lay terms. The guidelines will include positive and negative examples to aid annotator consistency.

  2. Annotator Training: A team of reviewers with clinical experience will be recruited and trained. Training will be conducted using a dedicated set of notes (not included in the final corpus) to ensure all annotators can consistently apply the guidelines.

  3. Inter-Annotator Agreement (IAA): Documents will be randomly assigned to two independent reviewers. IAA will be calculated on a subset of the corpus to ensure reliability, targeting agreement above 80%, a level that was consistently achieved in the Forbush et al. study.

  4. Adjudication: Any disagreements between the two primary annotators will be resolved by a third, senior reviewer. This adjudicated set of annotations will form the final, gold-standard reference for the project.
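The agreement and adjudication steps above can be sketched in a few lines of Python. This is a minimal illustration, not the method used by Forbush et al.: the proposal does not say how its agreement score is computed, so the exact-match formula below is an assumption, and the `Span` type, the function names, and the sample offsets are all hypothetical.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """One annotated SSE: document id plus character offsets (end exclusive)."""
    doc_id: str
    start: int
    end: int


def pairwise_agreement(a: set[Span], b: set[Span]) -> float:
    """Exact-match agreement (assumed metric): 2 * |A & B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # neither annotator marked anything, so they agree
    return 2 * len(a & b) / (len(a) + len(b))


def disagreements(a: set[Span], b: set[Span]) -> list[Span]:
    """Spans marked by exactly one annotator, sorted for the senior reviewer."""
    return sorted(a ^ b)


# Made-up annotations from two reviewers over the same two notes.
annotator_1 = {Span("note-1", 10, 34), Span("note-1", 80, 97), Span("note-2", 5, 21)}
annotator_2 = {Span("note-1", 10, 34), Span("note-2", 5, 21), Span("note-2", 40, 58)}

score = pairwise_agreement(annotator_1, annotator_2)
to_adjudicate = disagreements(annotator_1, annotator_2)
print(f"agreement = {score:.2f}")
print(to_adjudicate)
```

Exact matching is the strictest choice. A relaxed variant that counts overlapping spans as agreement would give higher scores, so whichever rule is used should be stated alongside the 80% target.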

Phase 2: NLP Model Development

Using the reference standard created in Phase 1, we will develop a transformer-based named entity recognition (NER) model. This state-of-the-art architecture is well-suited for identifying and extracting specific spans of text, such as SSEs, from unstructured documents. The annotated corpus will be partitioned into training, validation, and testing sets to develop the model and tune its parameters without introducing bias into the final evaluation.
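One way the partitioning step might look is shown below. It is a sketch under stated assumptions: the proposal gives no split proportions, so the 70/15/15 ratio, the seed, the `split_documents` name, and the `note-N` identifiers are all illustrative. The split is made per document so that no note contributes to both training and testing; splitting per patient would be stricter still.

```python
import random


def split_documents(doc_ids, train=0.7, val=0.15, seed=13):
    """Shuffle document ids reproducibly and cut them into three disjoint sets."""
    ids = sorted(set(doc_ids))          # de-duplicate and fix a starting order
    random.Random(seed).shuffle(ids)    # seeded so the split can be reproduced
    n_train = int(len(ids) * train)
    n_val = int(len(ids) * val)
    return {
        "train": ids[:n_train],
        "validation": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],  # held out until the final evaluation
    }


# Hypothetical corpus of 100 annotated notes.
splits = split_documents([f"note-{i}" for i in range(100)])
print({name: len(ids) for name, ids in splits.items()})
```

Fixing the seed keeps the test set stable across experiments, which matters because tuning decisions should only ever see the training and validation portions.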

Phase 3: Evaluation and Analysis

The final phase will focus on rigorously evaluating the NLP model and analyzing its output to meet the project's research aims.

  • Model Performance: The model's output on the held-out test set will be compared against the human-annotated reference standard. Performance will be measured and reported using standard metrics for information extraction tasks: Precision, Recall, and F1-score.

  • Clinical Data Linkage: To assess the added clinical value, the SSEs automatically extracted by the model will be cross-referenced with structured data, specifically ICD diagnosis codes, from the same clinical encounters. This will allow us to quantify the information gain provided by the NLP model over existing structured data sources.
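The model performance metrics named above can be computed as follows. This is a minimal sketch assuming exact-match scoring, where a predicted span counts as correct only if its document and both offsets match a reference span; the proposal does not specify the matching rule, and the function name and sample spans are made up for illustration.

```python
def precision_recall_f1(predicted: set, reference: set) -> dict:
    """Exact-match span scores against the reference standard."""
    tp = len(predicted & reference)   # spans both the model and annotators marked
    fp = len(predicted - reference)   # spans only the model marked
    fn = len(reference - predicted)   # reference spans the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Spans as (doc_id, start, end) tuples; the values are invented.
reference = {("note-1", 10, 34), ("note-1", 80, 97), ("note-2", 5, 21), ("note-3", 0, 12)}
predicted = {("note-1", 10, 34), ("note-2", 5, 21), ("note-2", 40, 58)}

scores = precision_recall_f1(predicted, reference)
print(scores)
```

Here the model finds two of four reference spans and adds one spurious span, so precision is 2/3, recall is 1/2, and F1 is 4/7.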

This systematic methodology ensures the creation of a robust dataset, the development of a high-performing model, and a clear evaluation of its potential clinical utility.
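As an illustration of the clinical data linkage step in Phase 3, the sketch below measures how many extracted SSEs have no corresponding ICD code in the same encounter. It rests on a strong assumption: that each SSE can be mapped to one or more candidate ICD codes. The proposal does not describe such a mapping, so the `ExtractedSSE` type, the `candidate_codes` field, the encounter ids, and the code assignments are hypothetical and chosen only to make the example run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedSSE:
    encounter_id: str
    text: str
    candidate_codes: frozenset  # hypothetical SSE-to-ICD mapping, illustration only


def uncoded_fraction(sses, coded_by_encounter):
    """Share of SSEs with no candidate code among the encounter's ICD codes."""
    if not sses:
        return 0.0
    uncoded = [
        s for s in sses
        if not (s.candidate_codes & coded_by_encounter.get(s.encounter_id, set()))
    ]
    return len(uncoded) / len(sses)


# The expressions come from the proposal; the codes and encounters are invented.
sses = [
    ExtractedSSE("enc-1", "stabbed in the chest by a hot knife", frozenset({"R07.9"})),
    ExtractedSSE("enc-2", "never make it to the bathroom on time", frozenset({"R32"})),
]
coded_by_encounter = {"enc-1": {"R07.9", "I10"}, "enc-2": {"E11.9"}}

gap = uncoded_fraction(sses, coded_by_encounter)
print(f"SSEs without a matching ICD code: {gap:.0%}")
```

In this toy data one of the two expressions has no matching code in its encounter, which is the kind of gap the linkage analysis is meant to quantify.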

5.0 Significance and Potential Impact

By developing a method to unlock the rich narrative data embedded within clinical notes, this project has the potential to generate significant advancements in clinical care, health equity, and biomedical research. The ability to systematically capture and analyze the patient's subjective experience at scale will provide a more complete picture of health and disease, moving us closer to a truly patient-centered healthcare system.

  • Improving Clinical Understanding: Systematically capturing SSEs can provide clinicians with a deeper, more nuanced understanding of the patient's experience. Expressions related to symptom severity ("It feels like I’m being stabbed in the chest by a hot knife") and impact on function ("I never make it to the bathroom on time") contain vital information that is often lost in translation to standard medical terms. Access to this data can lead to improved diagnostic accuracy, more personalized treatment plans, and better monitoring of quality-of-life measures.

  • Advancing Health Equity: This technology has the potential to help mitigate health disparities. Research has shown that communication gaps between patients and providers are a critical factor contributing to unequal treatment. By ensuring the "voice of the patient" is captured and represented in a structured format, especially for patients facing cultural or linguistic barriers, NLP tools can help bridge these gaps. Making this subjective information more visible may help providers better understand the unique experiences and needs of diverse patient populations.

  • Enabling Large-Scale Research: The NLP tool and annotated dataset created through this project will serve as valuable resources for the broader research community. These assets will enable large-scale retrospective studies on symptom presentation, patient experience, and treatment outcomes that are currently impractical because they depend on costly and time-consuming manual chart review. This will open new avenues of inquiry into the natural history of disease and the real-world effectiveness of different therapeutic interventions.

In conclusion, this project's central goal is to develop and validate an NLP-driven approach that transforms subjective patient narratives into structured, analyzable data. By doing so, we aim to generate a more complete and accurate "phenotype" of the patient, incorporating not just what clinicians observe, but what patients themselves experience and report.
