Discussion questions
Who gets to define what counts as “invalid”? Many semantic anomalies are only detectable by checking values against constraints (e.g., plausible age ranges, logical dependencies between variables).
Who decides those constraints: the researcher, the dataset, the instrument, or the theory?
How might those decisions differ across domains (e.g., psychology vs. neuroscience vs. medicine)?
What happens when the constraint itself is wrong or outdated?
When is missing data a symptom of a larger issue with an observation or variable rather than an isolated omission? What are some ways to handle missing data, and what properties of the dataset (e.g., its size, the number of feature variables) might influence your choice?