The garden's forking paths are a metaphor for the infinite diverging narratives of Ts'ui Pen's novel, which is in itself an allegory of time. Doesn't the scholarly work that takes place in the gardens of Ts'ui Pen and Dr. Albert suggest that universal truth transcends time?
The Garden of Forking Paths Summary
He must escape from Captain Richard Madden, the Irishman who has murdered his co-conspirator in espionage, and complete his mission by delivering the location of a secret cache of British weapons to his boss in Germany, whom he refers to as The Chief. He checks the contents of his pockets – revealing a revolver with only one bullet – locates the address of the one person capable of passing on his missive, and runs to catch a train to the suburbs. Dr. Albert tells Tsun the story of his ancestor, Ts'ui Pen, a former governor who abandoned his political position to write a novel and build a labyrinth, or maze. Seeing Captain Madden approach, Yu Tsun expresses his gratitude to Dr. Albert for resolving the mystery of Ts'ui Pen's garden, then shoots him in the back.
The Garden of Forking Paths
"The Garden of Forking Paths" (original Spanish title: "El jardín de senderos que se bifurcan") is a 1941 short story by Argentine writer and poet Jorge Luis Borges, blending spy fiction and war fiction. It is the title story in the collection El jardín de senderos que se bifurcan (1941), which was republished in its entirety in Ficciones (Fictions) in 1944. It was the first of Borges's works to be translated into English, by Anthony Boucher, when it appeared in Ellery Queen's Mystery Magazine in August 1948. Borges's vision of "forking paths" has been cited as inspiration by numerous new media scholars, in particular within the field of hypertext fiction.

As the story begins, Doctor Tsun has realized that an MI5 agent called Captain Richard Madden is pursuing him, has entered the apartment of his handler, Viktor Runeberg, and has either captured or killed him. Doctor Tsun is, therefore, determined to be more intelligent than any White spy and to obtain the information Nicolai needs to save the lives of German soldiers. Doctor Tsun suspects that Captain Madden, an Irish Catholic in the employ of the British Empire, is similarly motivated. Narrowly avoiding the pursuing Captain Madden at the railway station, he goes to the house of Doctor Stephen Albert, an eminent Sinologist. As he walks up the road to Doctor Albert's house, Doctor Tsun reflects on his great ancestor, Ts'ui Pên, a learned and famous civil servant who renounced his post as governor of Yunnan Province to undertake two tasks: write a vast and intricate novel and construct an equally vast and intricate labyrinth "in which all men would lose their way." Ts'ui Pên was murdered before he could complete his novel, however; what he did write was a "contradictory jumble of irresolute drafts" that made no sense to subsequent readers, and the labyrinth was never found.

In homage to the story, the TV series FlashForward made an episode entitled "The Garden of Forking Paths". Hap shares his hypothesis, opposed to Leon's, about multiple dimensions, citing "a garden of forking paths" used by his subjects.
The Statistical Crisis in Science
The idea is that when p is less than some prespecified value such as 0.05, the null hypothesis is rejected by the data, allowing researchers to claim strong evidence in favor of the alternative. The concept of p-values was originally developed by statistician Ronald Fisher in the 1920s in the context of his research on crop variance in Hertfordshire, England.

And so on: A single overarching research hypothesis—in this case, the idea that issue context interacts with political partisanship to affect mathematical problem-solving skills—corresponds to many possible choices of a decision variable. This multiple comparisons issue is well known in statistics and has been called "p-hacking" in an influential 2011 paper by the psychology researchers Joseph Simmons, Leif Nelson, and Uri Simonsohn. This error carries particular risks in the context of small effect sizes, small sample sizes, large measurement errors, and high variation (which combine to give low power, hence less reliable results even when they happen to be statistically significant, as discussed by Katherine Button and her coauthors in a 2013 paper in Nature Reviews: Neuroscience). Realistically, though, a researcher will come into a study with strong substantive hypotheses, to the extent that, for any given data set, the appropriate analysis can seem evidently clear.

In 2013, a research group led by Michael Petersen of Aarhus University published a study that claimed to find an association between men's upper-body strength, interacted with socioeconomic status, and their attitudes about economic redistribution. These researchers had enough degrees of freedom for them to be able to find any number of apparent needles in the haystack of their data—and, again, it would be easy enough to come across the statistically significant comparisons without "fishing" by simply looking at the data and noticing large differences that are consistent with their substantive theory. What they found was that the correlation of arm circumference with opposition to redistribution of wealth was higher among men of high socioeconomic status. In conducting follow-up validations, the researchers found that some of the Danish questions worked differently when answered by Americans, and further explain: "When these two unreliable items are removed … the interaction effect becomes significant."

In 2013, psychologists Brian Nosek, Jeffrey Spies, and Matt Motyl posted an appealing example of prepublication replication in one of their own studies, in which they performed an experiment on perceptual judgment and political attitudes, motivated and supported by substantive theory. But rather than stopping there, declaring victory, and publishing these results, they gathered a large new sample and performed a replication with predetermined protocols and data analysis. Unwelcome though it may be, the important moral of the story is that the statistically significant p-value cannot be taken at face value—even if it is associated with a comparison that is consistent with an existing theory.

A much-discussed example of possibly spurious statistical significance is the 2011 claim of Daryl Bem, an emeritus professor of social psychology at Cornell University, to have found evidence for extrasensory perception (ESP) in college students. After some failed attempts at replications, the furor has mostly subsided, but this case remains of interest as an example of how investigators can use well-accepted research practices to find statistical significance anywhere. Bem's paper presented nine different experiments and many statistically significant results—multiple degrees of freedom that allowed him to keep looking until he could find what he was searching for. But consider all the other comparisons he could have drawn: If the subjects had identified all images at a rate statistically significantly higher than chance, that certainly would have been reported as evidence of ESP. For example, consider the statement about "anomalous precognitive physiological arousal." Suppose that the experimental subjects had performed statistically significantly worse for the erotic pictures. Rather, it would be seen as a natural implication of the research hypothesis, because there is a considerable amount of literature suggesting sex differences in response to visual erotic stimuli.

In 2013, psychologists Kristina Durante, Ashley Rae, and Vladas Griskevicius published a paper based on survey data claiming that "Ovulation led single women to become more liberal, less religious, and more likely to vote for Barack Obama." After all, these researchers found a large effect that was consistent with their theory, so why quibble if the significance level was somewhat overstated because of multiple comparisons problems? First, the claimed effect size, in the range of a 20 percentage point difference in vote intention at different phases of the menstrual cycle, is substantively implausible, given all the evidence from polling that very few people change their vote intentions during presidential general election campaigns (a well-known finding that Gelman and colleagues recently confirmed with a panel survey from the 2012 presidential election campaign). In addition to the choice of main effects or interactions, Durante and her collaborators had several political questions to work with (attitudes as well as voting intentions), along with other demographic variables (age, ethnicity, and parenthood status) and flexibility in characterizing relationship status (at one point, "single" versus "married," but later, "single" versus "in a committed relationship").

Another study, also published in a top psychology journal, exhibits several different forms of multiplicity of choices in data analysis. The researchers' theory, they wrote, was "based on the idea that red and shades of red (such as the pinkish swellings seen in ovulating chimpanzees, or the pinkish skin tone observed in attractive and healthy human faces) are associated with sexual interest and attractiveness." In a critique published later that year in Slate, one of us (Gelman) noted that many different comparisons could have been reported in the data, so there was nothing special about a particular comparison being statistically significant. Tracy and Beall responded on the website of their Emotion and Self Lab at the University of British Columbia that they had conducted their studies "with the sole purpose of testing one specific hypothesis: that conception risk would increase women's tendency to dress in red or pink"—a hypothesis that they saw as emerging clearly from a large body of work, which they cited. Even though Beall and Tracy did an analysis that was consistent with their general research hypothesis—and we take them at their word that they were not conducting a "fishing expedition"—many degrees of freedom remain in their specific decisions: how strictly to set the criteria regarding the age of the women included, the hues considered as "red or shades of red," the exact window of days to be considered high risk for conception, choices of potential interactions to examine, whether to combine or contrast results from different groups, and so on. But in the details they made different analytic choices, each time finding statistical significance with the comparisons they chose to focus on.

Our own applied work is full of analyses that are contingent on data, yet we and our colleagues have been happy to report uncertainty intervals (and thus, implicitly, claims of statistical significance) without concern for selection bias or multiple comparisons. Working scientists are also keenly aware of the risks of data dredging, and they use confidence intervals and p-values as a tool to avoid getting fooled by noise. In the case of the ESP experiments, a phenomenon with no real theoretical basis was investigated with a sequence of studies designed to reveal small effects. In political science, Humphreys and his coauthors recommend preregistration: defining the entire data-collection and analysis protocol ahead of time. One can follow up an open-ended analysis with prepublication replication, which is related to the idea of external validation, popular in statistics and computer science. In fields where new data can readily be gathered, perhaps the two-part structure of Nosek and his colleagues—attempting to replicate his results before publishing—will set a standard for future research.
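The multiple-comparisons problem described above can be made concrete with a small simulation. The sketch below is illustrative only (the sample sizes, the number of candidate outcomes, and the use of a simple z-test on known-variance noise are all assumptions, not any of the cited studies' actual designs): even when every effect is exactly zero, an analyst free to report whichever of ten comparisons comes out significant will "find" an effect far more often than the nominal 5 percent of the time.

```python
import math
import random

def z_test_p(a, b):
    """Two-sided z-test p-value for a difference in means.
    Assumes unit-variance data, which holds here by construction
    since both groups are drawn from N(0, 1)."""
    n = len(a)
    diff = sum(a) / n - sum(b) / n
    z = diff / math.sqrt(2.0 / n)
    # Standard normal CDF via the error function.
    phi = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    return 2.0 * (1.0 - phi)

def simulate(n_sims=2000, n_outcomes=10, n_per_group=30, alpha=0.05, seed=1):
    """Compare a pre-registered single comparison with an analyst who may
    report whichever of n_outcomes comparisons is 'significant'.
    All data are pure noise: there are no true effects anywhere."""
    rng = random.Random(seed)
    single_hits = 0
    forking_hits = 0
    for _ in range(n_sims):
        pvals = []
        for _ in range(n_outcomes):
            a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
            b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
            pvals.append(z_test_p(a, b))
        if pvals[0] < alpha:      # the one pre-registered outcome
            single_hits += 1
        if min(pvals) < alpha:    # the best of all forking paths
            forking_hits += 1
    return single_hits / n_sims, forking_hits / n_sims

single_rate, forking_rate = simulate()
print(f"pre-registered comparison: {single_rate:.3f}")  # near alpha = 0.05
print(f"best of 10 comparisons:    {forking_rate:.3f}")
```

With ten independent candidate outcomes, the chance of at least one p < 0.05 on pure noise is 1 − 0.95¹⁰ ≈ 0.40, which the simulation reproduces. Preregistration, in this toy setting, amounts to committing to `pvals[0]` before seeing any data.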
Navigating the garden of forking paths for data exclusions in fear conditioning research
The reviewers also outlined some important suggestions for strengthening the report, all of which can be addressed in a revision with a reasonable amount of effort by the authors.

We have changed the title to “Navigating the garden of forking paths for data exclusions in fear conditioning research”. We grouped subjects based on all the different cutoffs in SCR CS+/CS- discrimination (listed in Appendix 2—table 1, section A in the revised version) as identified from the literature search.

They then calculate for each group in this stratification whether or not the individuals show evidence of learning by looking at differential SCRs and fear ratings. From this, they conclude "Fourth, we provided empirical evidence that those classified as 'non-learners' (sometimes referred to as 'outliers') in SCRs based on the identified definitions ironically displayed significant CS+/CS- discrimination." In the case of differential SCRs, this argument is circular, meaning that the group definitions are dependent upon the outcome measure. In addition, exclusion groups with a lower bound > 0 will by definition show significant differential conditioning using SCRs as an outcome.

We have thus removed parts A and B of the table (results of the ANOVAs testing for differences in SCR CS+/CS- discrimination between the exclusion groups). Yet, the fact that some or even most of these “exclusion groups” did show significant CS+/CS- discrimination in SCRs is a major problem in our opinion, as these individuals are excluded from analyses as ‘non-learners’ in the literature. We have made this point clearer in the Materials and methods section and the caption to Appendix 2—table 1, and revised the wording of our conclusions. Still, it is relevant to test whether all groups classified as ‘non-learners’ in the literature indeed fail to show evidence of learning, which would be indicated by a lack of significant CS+/CS- discrimination in SCRs in this case. In essence, this is a test to evaluate whether exclusion criteria used in the literature indeed achieve what they purport to do, that is, classify a group of participants that do not show evidence of learning.”

“[…] Still, this is an important manipulation check to empirically test whether those classified as ‘non-learners’ in the literature indeed do not show evidence of learning, which would be indicated by comparable SCRs to the CS+ and the CS- (i.e., no significant discrimination).”

“Strikingly and more importantly, with the exception of the group formed by defining ‘non-learning’ as a discrimination score of 0 µS (light green), all other exclusion groups, which were established by applying the different ‘non-learner’ cutoffs from the literature, showed significantly larger CS+ than CS- SCR amplitudes (i.e., significant CS+/CS- discrimination; raw: all p’s < 0.0009; log, rc: all p’s < 0.0002; see Appendix 2—table 1 for details).”

Discussion: “Fourth, we provided empirical evidence that those classified as ‘non-learners’ in the literature (sometimes referred to as ‘outliers’) in SCRs based on the identified definitions ironically displayed significant CS+/CS- discrimination – with the exception of non-learners defined by a cut-off in differential responding of < 0 and = 0 µS. Hence, in addition to the many conceptual problems we raised here, the operationalization of ‘non-learning’ in the field failed its critical manipulation check, given that those classified as ‘non-learners’ show clear evidence of learning as a group (i.e., CS+/CS- discrimination, see Appendix 2—table 1).”

This still leaves a lot of room for decisions that may result in a biased sample (such as those anxious individuals who also show rather large responses to the CS-).

Box 1, section B: “Classification as SCR ‘non-learners’ should be based on differential scores (CS+ vs. CS-) and the number of trials included for this calculation should be clearly justified. Providing a generally valid recommendation regarding the number of trials to be included is difficult, since it critically depends on experimental design choices.”

With respect to the potential impact of startle on the fear learning process, in the revised manuscript we now refer to our previous publication (Sjouwerman et al., 2016), which addressed this question empirically.

I believe a reduction in the overall length (perhaps partly through another careful round of editing to avoid any unnecessary redundancy) would strengthen the impact of the manuscript. If not, I wonder if it would be worth speculating about what it means when a given researcher repeatedly uses the same paradigm with similar samples multiple times while changing the exclusion criteria with no justification.

The reviewers are absolutely correct to note that there is inconsistent use of criteria not only between but also within research groups, and that typically no explicit justification for this is provided. Identifying specific individuals or groups for using inconsistent criteria across different publications (and speculating about the underlying reasons) would, in our view, not add anything to the message we want to be heard.

7) Given the proliferation of "turn-key" systems and the rapid adoption of affordable (e.g., mobile) devices for measuring psychophysiological signals, many of which have not been sufficiently vetted for their reliability, I believe a slightly greater emphasis (not necessarily many more words but perhaps stronger language) on the potential differences in exclusion of participants based on the specific amplifier systems used is warranted.

“This being said, we do acknowledge that certain research questions or the use of different recording equipment (robust lab equipment vs. novel mobile devices such as smartwatches) may potentially require distinct data processing pipelines and potentially also exclusion of certain observations (Simmons, Nelson and Simonsohn, 2011; Silberzahn et al., 2018); hence it is not desirable to propose rigid and fixed rules for generic adoption.”

“Relatedly, future studies need to empirically address which criteria for SCR transformation and exclusions are more or less sensitive to baseline differences (for an example from startle responding see Bradford et al., 2015; Grillon et al., 2002).”

“Therefore, participants are often (routinely) excluded from analyses if they appear not to have learned (‘non-learners’) or not to have been responsive to the experimental stimuli (‘non-responders’) during fear acquisition training.”

Legend of Figure 1: “Examples of irrelevant topics included studies that did not use fear conditioning paradigms (see https://osf.io/uxdhk/ for a documentation of excluded publications).”

12) Some spots in the supplementary materials would benefit from just a bit of text to explain the figures, rather than referring back to the main paper (for example, Appendix 4—table 2).

“[…] Still, this is an important manipulation check to empirically test whether those classified as ‘non-learners’ in the literature indeed do not show evidence of learning, which would be indicated by comparable SCRs to the CS+ and the CS- (i.e., no significant discrimination).”

Results: “The definitions differed in i) the stimulus type(s) used to define ‘non-responding’ (CS+ reinforced, CS+ unreinforced, all CS+s, CS-, US), ii) the SCR minimum amplitude criterion to define a responder (varying between 0.01 µS and 0.05 µS, or mere visual inspection), and iii) the percentage of trials for which these criteria have to be met (see Figure 6B and Appendix 1—figure 1), as well as a combination thereof.”

We inserted cross-references to the relevant figures, tables and the Appendix for every discussion point to facilitate comprehensibility and to act as reminders of the respective findings.

“Hence, in addition to the many conceptual problems we raised here, the operationalization of ‘non-learning’ in the field failed its critical manipulation check, given that those classified as ‘non-learners’ showed clear evidence of learning (i.e., CS+/CS- discrimination, see Appendix 2—table 1).”

7) In Box 1, General Reporting, under "recommendations for how to proceed" for "minimal response criterion (μS) to define a valid SCR," it would help to have more detail about the means by which an empirical cutoff can be determined.

My recommendation would be for the authors to dramatically soften the conclusions in the aforementioned subsection and acknowledge this limitation of their approach in the Discussion section. Alternatively, if the authors had another method for estimating the number of subjects that each cutoff criterion likely misclassifies (e.g., perhaps by using the error variance of the entire sample as a stand-in for single-subject confidence intervals), these data could be used to support the conclusions mentioned above.

With respect to the alternative suggestion of the reviewer, we consider this rather difficult, as the ground truth is unknown (what is a meaningful discrimination?) and hence it is unclear how to define a misclassification. Note that this is somewhat circular, as exclusion groups are defined by different SCR CS+/CS- cutoffs which are then used in an analysis where differential SCRs are the dependent measure. Still, this is an important manipulation check to empirically test whether those classified as a group of ‘non-learners’ in the literature indeed do not show evidence of learning, which would be indicated by comparable SCRs to the CS+ and the CS- (i.e., no significant discrimination).

“Fourth, we provided empirical evidence that those classified as a group of ‘non-learners’ in SCRs in the literature (sometimes referred to as ‘outliers’) based on the identified definitions in fact displayed significant CS+/CS- discrimination when applied to our own data. […] Hence, in addition to the many conceptual problems we raised here, the operationalization of ‘non-learning’ in the field failed its critical manipulation check, given that those classified as ‘non-learners’ show clear evidence of learning as a group (i.e., CS+/CS- discrimination, see Appendix 2—table 1).”
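The cutoff multiplicity discussed in this correspondence can be sketched in code: applying several candidate ‘non-learner’ definitions to the same data yields different exclusion sets and shifts the retained sample's mean discrimination. Everything below is a hypothetical illustration, not the authors' analysis; the cutoff values (0 to 0.1 µS) and the simulated distribution of differential SCRs (mean 0.08 µS, SD 0.12 µS) are assumptions chosen only to mimic the orders of magnitude mentioned in the review.

```python
import random
import statistics

def classify_non_learners(diff_scores, cutoff):
    """Indices of participants whose differential SCR (CS+ minus CS-,
    in microsiemens) falls below a given 'non-learner' cutoff."""
    return {i for i, d in enumerate(diff_scores) if d < cutoff}

rng = random.Random(7)
# Hypothetical differential SCRs for 100 participants; the positive
# mean mimics group-level CS+/CS- discrimination.
diffs = [rng.gauss(0.08, 0.12) for _ in range(100)]

# Cutoffs in the literature vary; these values are illustrative.
for cutoff in (0.0, 0.02, 0.05, 0.1):
    excluded = classify_non_learners(diffs, cutoff)
    kept = [d for i, d in enumerate(diffs) if i not in excluded]
    print(f"cutoff {cutoff:>4} µS: exclude {len(excluded):>2} of 100, "
          f"retained mean diff = {statistics.mean(kept):.3f} µS")
```

The stricter the cutoff, the more participants are dropped and the larger the retained sample's average discrimination becomes, which is exactly the circularity the reviewer flags: when the exclusion rule and the outcome measure are the same variable, group-level effects among the retained (or excluded) participants are partly built in by construction.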