Methodology

How I scored the research.

Every source on the research page was scored against the same five-criteria rubric. Each criterion is scored 1 to 5, so the total runs 5 to 25. Only sources scoring 21 or higher made the cut. This page lays out the rubric, what each score band means, and why I drew the line where I did.

The five criteria

Each criterion is scored 1 to 5. The criteria are independent, and a source can be strong on one and weak on another. Total = sum of the five.

1. Credibility (Cred)

Peer-review status, author expertise, publisher reputation. A journal everyone in the field reads scores higher than a journal nobody has heard of. An author with a track record in body-focused behaviors scores higher than a one-paper outsider.

5 out of 5 looks like

Top-tier peer-reviewed journal, well-known authors with sustained work in BFRBs.

1 out of 5 looks like

Self-published, no peer review, no domain track record.

2. Methodology (Meth)

Study design, sample size, controls, transparency of methods. A randomized controlled trial with hundreds of participants and pre-registered methods scores higher than an observational study with twenty.

5 out of 5 looks like

RCT, large sample, pre-registered, blinding where possible, methods reproducible from the paper alone.

1 out of 5 looks like

Anecdote, single case study, methods not described.

3. Currency (Curr)

Recency of publication and continued relevance. A 2023 paper on a fast-moving question outranks a 2003 paper on the same question. For a stable mechanism (neurobiology of the basal ganglia, say), older work can still score 5 if the field has built on it without invalidating it.

5 out of 5 looks like

Published in the last 3 years, or older but still cited as authoritative in current reviews.

1 out of 5 looks like

Predates major shifts in the field, contradicted by newer evidence.

4. Objectivity (Obj)

Disclosed funding, balanced presentation, absence of conflicts of interest. A trial funded by an independent grant scores higher than a trial funded by a company selling the intervention. Papers that present limitations honestly outrank papers that don't.

5 out of 5 looks like

Independent funding, full conflict-of-interest disclosure, limitations section that names real limitations.

1 out of 5 looks like

Industry-funded with no disclosure, framed to advocate for a product.

5. Specificity (Spec)

Direct relevance to nail biting, BFRBs, or the exact intervention discussed. A paper on nail biting interventions outranks a general "habit change" paper. A paper studying the technique the program uses scores higher than a paper on a different technique.

5 out of 5 looks like

Studies nail biting or BFRB intervention directly. Tests the specific technique the program uses.

1 out of 5 looks like

General behavioral psychology, no BFRB content, intervention not similar to the one in the app.
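The arithmetic behind the rubric is small enough to sketch in a few lines of Python. This is an illustration only, not the tooling actually used for the review: the class name `RubricScore` and the example scores are hypothetical, and the field names simply reuse the abbreviations above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RubricScore:
    """One source's scores on the five criteria, each an integer from 1 to 5."""

    cred: int  # Credibility
    meth: int  # Methodology
    curr: int  # Currency
    obj: int   # Objectivity
    spec: int  # Specificity

    def __post_init__(self) -> None:
        # Reject anything outside the 1-to-5 scale the rubric defines.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, int) or not 1 <= value <= 5:
                raise ValueError(f"{f.name} must be an integer from 1 to 5, got {value!r}")

    @property
    def total(self) -> int:
        # Total = sum of the five criteria, so it ranges from 5 to 25.
        return self.cred + self.meth + self.curr + self.obj + self.spec


# Hypothetical source: strong overall, weaker on currency.
example = RubricScore(cred=5, meth=4, curr=3, obj=5, spec=5)
print(example.total)  # 22
```

Keeping the five scores as separate fields, rather than storing only the total, preserves the point that a source can be strong on one criterion and weak on another.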

Score bands

Once a source has a total, I put it in one of four bands. The bands map to how the source gets used (or doesn't) in the program.

21-25: Strong. Use it. Cite it. Build features around it.
16-20: Adequate. Use with caveats, stated in the writeup. No features that depend on it alone.
11-15: Weak. Reference only. Background context, never load-bearing.
0-10: Insufficient. Excluded. Cited only to explain why it was excluded.
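The band lookup can be sketched as a single function. The function name `band` is hypothetical; the ranges and labels are the four listed above. Because each criterion scores at least 1, totals below 5 cannot actually occur, but the sketch accepts the full 0-to-25 range as the bands are written.

```python
def band(total: int) -> str:
    """Map a rubric total to the name of its score band."""
    if not 0 <= total <= 25:
        raise ValueError(f"total must be between 0 and 25, got {total}")
    if total >= 21:
        return "Strong"
    if total >= 16:
        return "Adequate"
    if total >= 11:
        return "Weak"
    return "Insufficient"


# Check the values on either side of each band boundary.
for t in (25, 21, 20, 16, 15, 11, 10):
    print(t, band(t))
```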

Why 21 is the cutoff

Twenty-one is roughly "strong on at least four of five criteria, with at most one moderate score." A source scoring 21 typically holds up on every dimension that matters. A source scoring 20 typically has at least one criterion where reasonable doubt exists.
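To see what a total of 21 can look like in practice, the sketch below enumerates every combination of five 1-to-5 scores that adds up to a given total, ignoring which criterion gets which score. The function name `profiles` is hypothetical. It shows why the description above is a rule of thumb: most ways of reaching 21 are balanced, but a lopsided profile (four 5s and a 1) also gets there.

```python
from itertools import combinations_with_replacement


def profiles(total: int) -> list[tuple[int, ...]]:
    """All unordered combinations of five 1-to-5 scores that sum to `total`."""
    return [
        combo
        for combo in combinations_with_replacement(range(1, 6), 5)
        if sum(combo) == total
    ]


for combo in profiles(21):
    print(combo)
# (1, 5, 5, 5, 5)
# (2, 4, 5, 5, 5)
# (3, 3, 5, 5, 5)
# (3, 4, 4, 5, 5)
# (4, 4, 4, 4, 5)
```

Calling `profiles(20)` returns six combinations, including straight 4s.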

Twenty was a tempting line because it would have given me about a dozen sources. The difference between 20 and 21 is the difference between "I can defend this against a careful skeptic" and "I'd flinch a little." For a page whose whole point is verifiability, the higher cutoff was the right one.

In practice, the 21+ band is small. The literature on body-focused repetitive behaviors is real and growing, but it's smaller than the literatures behind, say, depression treatment or sleep. Eight strong sources is what you get when you scope the question narrowly (nail biting and BFRB interventions) and apply a serious bar.

What got excluded

Plenty of papers came up in the review and didn't make the cut. A few examples of why:

Older meta-analyses with mixed methods. Strong on credibility and currency but weak on methodology when individual study quality varied widely. Scored in the high teens. Cited in the writeup as background, not as evidence.
Single-case studies and small pilots. Even when published in good journals, n=12 doesn't carry the same weight as n=300. Methodology scores capped low.
Industry-adjacent papers. A few of the most-cited "habit change" papers are written by authors who also sell habit-tracking products. I read them. They aren't on the source list.
General behavioral-change papers without BFRB specificity. Habit formation research is interesting and sometimes relevant, but a paper on snack-food habits doesn't directly speak to the motor-pattern automaticity that drives nail biting. Specificity scores below 3 disqualify.
Older reviews not yet updated. If a 2014 review's conclusions have been overtaken by newer trials, I read it for context but don't lean on it for what the program does today.
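Two of the rules on this page can be combined into one inclusion check: the total must reach 21, and a Specificity score below 3 disqualifies. The sketch below is one reading of how those rules interact, not a description of the actual review process. The names `CUTOFF`, `SPEC_FLOOR`, and `is_included` are hypothetical, and so are the example scores.

```python
CUTOFF = 21     # minimum total to make the source list
SPEC_FLOOR = 3  # a Specificity score below this disqualifies


def is_included(cred: int, meth: int, curr: int, obj: int, spec: int) -> bool:
    """Apply the total cutoff and the Specificity floor together."""
    total = cred + meth + curr + obj + spec
    return total >= CUTOFF and spec >= SPEC_FLOOR


print(is_included(5, 4, 4, 4, 4))  # True: total is 21, Specificity is 4
print(is_included(4, 4, 4, 4, 4))  # False: total is 20
print(is_included(5, 5, 5, 5, 1))  # False: total is 21, but Specificity is below 3
```

The third case is the reason to treat the Specificity floor as its own gate: a paper can be excellent on four criteria and still not speak to nail biting.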

The point

A research review's value depends entirely on what it excluded. If I'd included every paper that came up in a Google Scholar search, the page would be longer and a lot less honest. The rubric is what makes the source list defensible: eight sources, scoring 21 or higher, with the methodology shown.

If a paper you'd expect to see is missing, it's almost certainly because it scored below 21. If you want to know why a specific paper was excluded, email me and I'll show you the score.
