“Results are more ‘vampirical’ than ‘empirical’ -- unable to be killed by mere evidence” (Freese, 2007)
This is from some slides by Andrew Gelman (http://www.stat.columbia.edu/~gelman/presentations/ziff.pdf) and refers to the tendency of results we’d like to believe are true to persist, despite evidence to the contrary.
Gelman cites Freese, but these are PowerPoint slides and I can’t find the full reference in them.
The statistical point is a nice one. If you are looking at effects that you expect to be small (e.g. deviations in births from the normal sex ratio of slightly more males than females), then any effect that comes out “statistically significant” is likely to be a huge overestimate of the real effect.
An easy way to see why this is so: suppose you are studying a coin to see whether it is fair or not. If you are only flipping it 6 times, the only way to get a significant result (i.e. a p-value below the standard .05 level) is if it comes up heads or tails all 6 times (a probability of 2/64 ≈ .031 under a fair coin). But that result (“the coin comes up heads/tails 100% of the time”) is likely to be a huge overestimate even for a biased coin.
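A quick simulation makes the point concrete. Here’s a minimal sketch (the true bias of 0.6 and the study count are made-up parameters, not from the slides): we flip a genuinely biased coin 6 times per “study,” keep only the studies that reach significance, and look at the average estimated bias among those.

```python
# Sketch of the "significance filter": among studies of a coin with a
# (hypothetical) true bias of 0.6, only the significant ones survive,
# and their average estimate badly overstates the real bias.
import random

random.seed(0)
TRUE_P = 0.6       # the coin's real probability of heads (assumed for illustration)
N_FLIPS = 6        # flips per study
N_STUDIES = 100_000

significant_estimates = []
for _ in range(N_STUDIES):
    heads = sum(random.random() < TRUE_P for _ in range(N_FLIPS))
    # With 6 flips, the only results with two-sided p < .05 are 6/6 or 0/6
    # (probability 2/64 under a fair coin).
    if heads in (0, N_FLIPS):
        significant_estimates.append(heads / N_FLIPS)

avg = sum(significant_estimates) / len(significant_estimates)
print(f"true bias: {TRUE_P}, average estimate among significant studies: {avg:.2f}")
```

The surviving estimates are all 0.0 or 1.0, and since 6 heads is far more likely than 6 tails for this coin, their average lands well above the true bias of 0.6 — exactly the overestimation described above.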
This XKCD comic makes a related point (that repeated testing can produce estimates that are too big, and conclusions that are wrong):