Explore input validation and sanitization in Scala as boundary-hardening practices, with emphasis on parsing, allow-lists, and context-specific output handling.
Input validation checks whether incoming data is acceptable for a given domain or protocol. Sanitization modifies or encodes data so it can be handled safely in a particular context.
These are related but not interchangeable. Validation decides whether the input is allowed. Sanitization helps control how already accepted input is rendered or passed into risky contexts.
Good validation is usually done early, at the boundary, against domain-specific allow-list constraints.
That means parsing raw strings into structured domain types as early as possible.
final case class Username(value: String)

def parseUsername(raw: String): Either[String, Username] =
  Either.cond(raw.matches("[A-Za-z0-9_]{3,20}"), Username(raw), "Invalid username")
This is stronger than carrying raw strings through the system and hoping later layers handle them safely.
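One way to tighten the `Username` idea above is to make the constructor private, so no code path can build the type without going through the parser. The following is a minimal sketch under that assumption; the `Username.parse` name and the plain-class encoding are illustrative choices, not something the original prescribes.

```scala
// Variation on the Username type: the constructor is private, so the only
// way to obtain a Username is through the validating parser below.
final class Username private (val value: String) {
  override def toString: String = s"Username($value)"
}

object Username {
  // Allow-list: 3 to 20 ASCII letters, digits, or underscores.
  // Anything else is rejected rather than "cleaned".
  def parse(raw: String): Either[String, Username] =
    if (raw.matches("[A-Za-z0-9_]{3,20}")) Right(new Username(raw))
    else Left("Invalid username")
}
```

Downstream functions can then take a `Username` parameter instead of a `String`, which makes it a compile-time error to pass an unvalidated value.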
There is no universal “sanitize everything” function. Safe handling depends on where the value will go, for example a SQL query, a rendered page, or a log line.
The same input may need very different treatment across those contexts. That is why parameterized queries and safe renderers are usually better than ad hoc string cleaning.
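To illustrate how the same value needs different treatment per context, here is a small sketch with two encoders. The object and method names are hypothetical, and the hand-written HTML escaper is for illustration only; in practice a templating library that escapes by default is usually the safer choice.

```scala
import java.net.URLEncoder

object OutputContexts {
  // Minimal escaper for HTML text and quoted attribute values.
  // Illustration only: it does not cover script, style, or URL contexts.
  def forHtmlText(s: String): String = {
    val sb = new StringBuilder(s.length)
    s.foreach {
      case '&'  => sb.append("&amp;")
      case '<'  => sb.append("&lt;")
      case '>'  => sb.append("&gt;")
      case '"'  => sb.append("&quot;")
      case '\'' => sb.append("&#39;")
      case c    => sb.append(c)
    }
    sb.toString
  }

  // Form-style encoding for a single URL query parameter value.
  def forQueryParam(s: String): String =
    URLEncoder.encode(s, "UTF-8")
}
```

Applied to the input `a b&c=d`, `forHtmlText` changes only the ampersand, while `forQueryParam` also encodes the space and the equals sign. Neither output is safe in the other context.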
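The strongest pattern is often to avoid dangerous string composition in the first place, for example by binding values as query parameters or handing them to a renderer that escapes by default.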
Manual escaping logic is brittle and easy to apply inconsistently.
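As a sketch of avoiding string composition, the example below uses a JDBC `PreparedStatement`. The table name, column names, and the `UserQueries` object are assumptions made up for illustration; the point is only that the SQL text stays constant and the untrusted value is bound as a parameter.

```scala
import java.sql.Connection

object UserQueries {
  // Hypothetical schema. The SQL text is a constant; user input never
  // becomes part of it.
  val FindIdByName: String = "SELECT id FROM users WHERE name = ?"

  def findIdByName(conn: Connection, name: String): Option[Long] = {
    val stmt = conn.prepareStatement(FindIdByName)
    try {
      // The driver sends the value separately from the SQL text,
      // so quotes in `name` cannot change the query structure.
      stmt.setString(1, name)
      val rs = stmt.executeQuery()
      try { if (rs.next()) Some(rs.getLong("id")) else None }
      finally rs.close()
    } finally stmt.close()
  }
}
```

With this shape there is no escaping function to forget, which is the main advantage over building the query with string interpolation.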
A common failure is letting raw untrusted values cross several layers before constraints are enforced.
Another is assuming that one cleaning rule will make input safe everywhere, which is rarely true.
Even rejected values can create trouble if they are written directly into logs, metrics labels, or dashboards without care.
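For the logging case, a minimal sketch might neutralize control characters and cap the length before a rejected value is written anywhere. The `LogSafe.forLog` helper and the 200-character default are hypothetical choices, not a standard API.

```scala
object LogSafe {
  // Replace control characters (including CR and LF) and cap the length, so
  // a rejected value cannot forge extra log lines or flood the log.
  def forLog(raw: String, maxLen: Int = 200): String = {
    val cleaned = raw.take(maxLen).map(c => if (c.isControl) '?' else c)
    if (raw.length > maxLen) cleaned + "..." else cleaned
  }
}
```

A call site would then log `LogSafe.forLog(raw)` instead of `raw`. Metrics labels and dashboards usually need their own, stricter rules, such as a fixed set of allowed label values.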
Validate early using domain-specific constraints, sanitize only for the exact output or execution context that needs it, and prefer structured safe APIs over manual escaping whenever possible.