pre-commit-hooks/pre_commit_hooks/remove_em_dash.py
Jakub J Jablonski aba8a7597a
Add remove-em-dash hook
New fixer hook that replaces UTF-8-encoded em-dashes (U+2014) with a
plain ASCII hyphen (-), modeled on the trailing-whitespace hook.

- pre_commit_hooks/remove_em_dash.py: the fixer (binary-safe, UTF-8 only)
- tests/remove_em_dash_test.py: full coverage of fix and no-op cases
- registered in setup.cfg, .pre-commit-hooks.yaml, and README.md

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 17:12:13 +02:00

34 lines
832 B
Python

from __future__ import annotations

import argparse
from collections.abc import Sequence

EM_DASH = '\N{EM DASH}'.encode()


def _fix_file(filename: str) -> bool:
    with open(filename, 'rb') as f:
        contents = f.read()
    new_contents = contents.replace(EM_DASH, b'-')
    if new_contents == contents:
        return False
    with open(filename, 'wb') as f:
        f.write(new_contents)
    return True


def main(argv: Sequence[str] | None = None) -> int:
    parser = argparse.ArgumentParser()
    parser.add_argument('filenames', nargs='*', help='Filenames to fix')
    args = parser.parse_args(argv)

    retv = 0
    for filename in args.filenames:
        if _fix_file(filename):
            print(f'Fixing {filename}')
            retv = 1
    return retv


if __name__ == '__main__':
    raise SystemExit(main())
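
The "binary-safe, UTF-8 only" note in the commit message can be illustrated with a minimal sketch. It is not part of the hook or its test suite. `replace_em_dashes` and `demo` are hypothetical names introduced here; the helper is assumed to mirror the `bytes.replace` call in `_fix_file`, minus the file I/O. The sketch shows that bytes which are not valid UTF-8 pass through unchanged, and that an em-dash stored in another encoding (UTF-16-LE here) is not matched.

```python
from __future__ import annotations

# Same constant as in the hook: the UTF-8 encoding of U+2014 (b'\xe2\x80\x94').
EM_DASH = '\N{EM DASH}'.encode()


def replace_em_dashes(data: bytes) -> bytes:
    """Hypothetical helper: the byte-level replacement from _fix_file, without file I/O."""
    return data.replace(EM_DASH, b'-')


def demo() -> tuple[bytes, bytes]:
    # UTF-8 text with an em-dash, followed by 0xff, which is never valid UTF-8.
    # Working on raw bytes means no decode step can fail on it.
    utf8_input = 'a\N{EM DASH}b'.encode() + b'\xff'
    # In UTF-16-LE the same character is the byte pair 0x14 0x20, so the
    # three-byte UTF-8 pattern does not occur and nothing is replaced.
    utf16_input = 'a\N{EM DASH}b'.encode('utf-16-le')
    return replace_em_dashes(utf8_input), replace_em_dashes(utf16_input)


if __name__ == '__main__':
    fixed, untouched = demo()
    print(fixed)
    print(untouched)
```

Operating on bytes rather than decoded text is presumably what lets the hook run over arbitrary files without raising `UnicodeDecodeError`. The trade-off is that em-dashes in non-UTF-8 encodings are silently left in place.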