# KYD Docx-2-MD - AI Agent Instructions

## Architecture Overview

This is a modular Python framework for converting DOCX files into Markdown. Key architectural patterns:

### Component Pattern

- **Parts Classes**: Use `XxxParts` classes for modular functionality (e.g., `TableParts`, `ParaPart`, `RunPart`)
- **Configuration Injection**: Pass `DocxConfig` and runtime state to components
- **Single Responsibility**: Each part handles one document element type

Example from `docx_parts_table.py`:

```python
class TableParts:
    def __init__(self, converter_instance: "Docx2Md", config: DocxConfig) -> None:
        self.config = config
        self.converter_instance = converter_instance
```

### Configuration Pattern

- **DocxConfig**: Central configuration class with nested settings
- **DocxRuntime**: Tracks processing state (hyperlinks, images, missing_types, etc.)
- **Runtime State**: Use `config.runtime` for mutable state during processing

Example:

```python
docx_config = DocxConfig()
docx_config.runtime.hyperlinks[r] = {"val": target_ref, "type": "hyperlink"}
```

## Development Workflow

### Build & Install

```bash
# Install in development mode
pip install -e ./package/kyd_docx2md

# Install with dev dependencies
pip install -e "./package/kyd_docx2md[dev]"  # quoted so shells such as zsh do not expand the brackets
```

### Testing Commands

```bash
# Run all tests
pytest

# Run with coverage
coverage run -m pytest
coverage html  # Generates htmlcov/index.html

# Run specific package tests
cd package/kyd_docx2md && pytest
```

### Code Quality

```bash
# Lint and format
ruff check .
ruff format .

# Fix auto-fixable issues
ruff check . --fix
```

## Code Patterns

### Import Structure

- **Relative imports** within packages: `from .docx_config import DocxConfig`
- **TYPE_CHECKING** for circular imports:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by type checkers, so no circular import happens at runtime
    from .kyd_docx2md import Docx2Md
```

### Logging Pattern

```python
import logging

logger = logging.getLogger(__name__)
logger.debug("Processing element: %s", element)  # lazy formatting: built only if DEBUG is enabled
```

### Error Handling

- **Missing Types Tracking**: Record unknown element types in `config.runtime.missing_types`
- **Validation**: Validate inputs before use, e.g. that numeric values are positive and formats are valid
- **Graceful Degradation**: Continue processing when an individual element fails
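
The three rules above could be combined roughly as follows. This is a minimal sketch, not the project's implementation: `RuntimeSketch`, `convert_elements`, and the `(kind, payload)` element shape are hypothetical, and `missing_types` is assumed to be a set.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable

logger = logging.getLogger(__name__)


@dataclass
class RuntimeSketch:
    """Hypothetical stand-in for DocxRuntime; only missing_types is modelled."""

    missing_types: set[str] = field(default_factory=set)  # assumed to be a set


def convert_elements(
    elements: list[tuple[str, str]],
    handlers: dict[str, Callable[[str], str]],
    runtime: RuntimeSketch,
) -> list[str]:
    """Convert each (kind, payload) pair, skipping anything that cannot be handled."""
    parts: list[str] = []
    for kind, payload in elements:
        handler = handlers.get(kind)
        if handler is None:
            # Missing types tracking: remember the unknown kind and move on.
            runtime.missing_types.add(kind)
            logger.debug("No handler for element type: %s", kind)
            continue
        try:
            parts.append(handler(payload))
        except Exception:
            # Graceful degradation: one bad element must not abort the document.
            logger.exception("Failed to convert %s element; skipping", kind)
    return parts


runtime = RuntimeSketch()
handlers = {"p": lambda text: text, "h1": lambda text: f"# {text}"}
elements = [("h1", "Title"), ("chart", "..."), ("p", "Body")]
result = convert_elements(elements, handlers, runtime)
print(result)  # ['# Title', 'Body']
print(runtime.missing_types)  # {'chart'}
```

After a run, the contents of `missing_types` give a compact list of element kinds that still need a handler.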

### CLI Argument Patterns

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "inputs",
    nargs="+",  # One or more positional args
    help="Input files or glob patterns",
)
parser.add_argument(
    "-rw",
    "--remove-wrapping-tables",
    nargs="*",  # Zero or more args
    default=None,
    help="Table size filters",
)
```
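
To make the difference between `nargs="+"` and `nargs="*"` with `default=None` concrete, the sketch below parses three command lines. The program name `docx2md` and the filter values `1x1` and `2x1` are placeholders chosen for illustration, not documented formats.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="docx2md")  # prog name is a placeholder
    parser.add_argument("inputs", nargs="+", help="Input files or glob patterns")
    parser.add_argument(
        "-rw",
        "--remove-wrapping-tables",
        nargs="*",
        default=None,
        help="Table size filters",
    )
    return parser


parser = build_parser()

absent = parser.parse_args(["a.docx"])
bare = parser.parse_args(["a.docx", "-rw"])
sized = parser.parse_args(["a.docx", "-rw", "1x1", "2x1"])

print(absent.remove_wrapping_tables)  # None: flag not given
print(bare.remove_wrapping_tables)  # []: flag given without values
print(sized.remove_wrapping_tables)  # ['1x1', '2x1']
```

Because `default=None`, calling code can distinguish "flag absent" (`None`) from "flag present with no filters" (`[]`). Note that `nargs="*"` is greedy, so positional inputs placed after `-rw` would be consumed as filter values; listing the inputs first avoids that.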

## Key Files & Directories

### Core Structure

- `package/kyd_docx2md/src/kyd_docx2md/` - Main package
- `package/kyd_docx2md/src/kyd_docx2md/docx_parts_*.py` - Component modules
- `package/kyd_docx2md/tests/` - Test files
- `package/kyd_docx2md/pyproject.toml` - Package configuration

### Configuration Files

- `pyproject.toml` - Build system, dependencies, tool configs
- `requirements.txt` - Legacy dependency management
- `.env` - API keys and secrets (copy from `.env-TEMPLATE.txt`)

### Important Patterns

- **Table Processing**: Check `has_merged_cells()` before processing
- **Image Handling**: Extract to `config.output_image_dir`, track in `runtime.images`
- **Text Encoding**: Use `anyascii()` to transliterate Unicode text to ASCII
- **HTML Cleaning**: Apply `clean_html_tags()` after processing
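
As an illustration of the merged-cell check in the list above, here is one way such a test could look at the XML level. This is a sketch using only the standard library; `table_has_merged_cells` is a hypothetical stand-in, not the project's `has_merged_cells()`, and `W_NS` only approximates what a namespace mapping like `docx_namespaces` is assumed to contain.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, mapped to the conventional "w" prefix.
W_NS = {"w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}
W_VAL = f"{{{W_NS['w']}}}val"  # fully qualified name of the w:val attribute


def table_has_merged_cells(tbl: ET.Element) -> bool:
    """Return True if any cell spans several columns or is part of a vertical merge."""
    for tc_pr in tbl.iterfind(".//w:tcPr", W_NS):
        if tc_pr.find("w:vMerge", W_NS) is not None:
            return True
        grid_span = tc_pr.find("w:gridSpan", W_NS)
        if grid_span is not None and int(grid_span.get(W_VAL, "1")) > 1:
            return True
    return False


MERGED = (
    '<w:tbl xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    '<w:tr><w:tc><w:tcPr><w:gridSpan w:val="2"/></w:tcPr></w:tc></w:tr>'
    "</w:tbl>"
)
PLAIN = (
    '<w:tbl xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:tr><w:tc><w:tcPr/></w:tc></w:tr>"
    "</w:tbl>"
)

print(table_has_merged_cells(ET.fromstring(MERGED)))  # True
print(table_has_merged_cells(ET.fromstring(PLAIN)))  # False
```

A table that reports merged cells generally cannot be expressed as a plain Markdown pipe table, which is why the check comes before processing.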

## Integration Points

### External APIs

- **Document Processing**: Uses `python-docx` for .docx parsing

### Cross-Component Communication

- **Runtime State**: Share state via `config.runtime`
- **Converter Instance**: Pass main converter to parts classes
- **Callback Pattern**: Parts call back to main converter methods
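
The three mechanisms above fit together roughly as in the following sketch. Every class here is a hypothetical stand-in (`ConverterSketch` for `Docx2Md`, `ConfigSketch` for `DocxConfig`, and so on), and `rows_seen` and `convert_text` are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class RuntimeSketch:
    """Hypothetical stand-in for DocxRuntime."""

    rows_seen: int = 0  # invented counter, not a documented field


@dataclass
class ConfigSketch:
    """Hypothetical stand-in for DocxConfig."""

    runtime: RuntimeSketch = field(default_factory=RuntimeSketch)


class TablePartsSketch:
    def __init__(self, converter_instance: "ConverterSketch", config: ConfigSketch) -> None:
        self.config = config
        self.converter_instance = converter_instance

    def convert_row(self, cells: list[str]) -> str:
        # Callback pattern: cell text is rendered by the main converter.
        rendered = [self.converter_instance.convert_text(cell) for cell in cells]
        # Runtime state: the counter lives on the shared config.runtime object.
        self.config.runtime.rows_seen += 1
        return "| " + " | ".join(rendered) + " |"


class ConverterSketch:
    def __init__(self, config: ConfigSketch) -> None:
        self.config = config
        # Converter instance: the part receives the converter and the same config.
        self.table_parts = TablePartsSketch(self, config)

    def convert_text(self, text: str) -> str:
        return text.strip()


converter = ConverterSketch(ConfigSketch())
row = converter.table_parts.convert_row([" Name ", "Value "])
print(row)  # | Name | Value |
print(converter.config.runtime.rows_seen)  # 1
```

The string annotation `"ConverterSketch"` plays the same role as `"Docx2Md"` in `TableParts`: it lets the part reference the converter type before that class is defined.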

## Common Gotchas

- **Namespace Handling**: Use `docx_namespaces` for XML parsing
- **Path Handling**: Always use `pathlib.Path` objects, not strings
- **Encoding**: Pass `encoding="utf-8"` explicitly for all file operations
- **Circular Imports**: Use `TYPE_CHECKING` and string literals for forward references
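
For the path and encoding points above, a minimal sketch might look like this. `write_markdown` is a hypothetical helper written for the example, not a function from the package.

```python
import tempfile
from pathlib import Path


def write_markdown(output_dir: Path, stem: str, text: str) -> Path:
    """Write text to <output_dir>/<stem>.md as UTF-8 and return the resulting path."""
    output_dir.mkdir(parents=True, exist_ok=True)
    target = output_dir / f"{stem}.md"  # Path arithmetic instead of string joins
    target.write_text(text, encoding="utf-8")  # never rely on the platform default
    return target


with tempfile.TemporaryDirectory() as tmp:
    out = write_markdown(Path(tmp) / "out", "report", "# Résumé\n")
    print(out.name)  # report.md
    print(out.read_text(encoding="utf-8").strip())  # # Résumé
```

Stating the encoding on both the write and the read keeps behavior identical on platforms whose default encoding is not UTF-8.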

## Testing Patterns

- **Fixture Usage**: Create reusable test fixtures for common objects
- **Mock External APIs**: Mock HTTP calls and file system operations
- **Coverage Goals**: Maintain >85% coverage
- **Test Structure**: Mirror source structure in `tests/` directory
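
A test following these patterns could look like the sketch below. It relies on pytest's built-in `tmp_path` fixture and on `unittest.mock.Mock` from the standard library; `export_image` is a hypothetical helper defined inline so the example is self-contained, and `runtime.images` is assumed to be a list.

```python
from pathlib import Path
from unittest.mock import Mock


def export_image(blob: bytes, name: str, config) -> Path:
    """Hypothetical helper under test: write an image and record it in runtime.images."""
    target = Path(config.output_image_dir) / name
    target.write_bytes(blob)
    config.runtime.images.append(target)  # assumes images is a list
    return target


def test_export_image_records_path(tmp_path: Path) -> None:
    # tmp_path is pytest's built-in per-test temporary directory fixture.
    config = Mock()  # stands in for DocxConfig without building a real one
    config.output_image_dir = tmp_path
    config.runtime.images = []

    target = export_image(b"\x89PNG", "figure1.png", config)

    assert target.parent == tmp_path
    assert target.read_bytes() == b"\x89PNG"
    assert config.runtime.images == [target]
```

Objects that many tests need, such as a prepared config, are good candidates for a shared fixture in `conftest.py`.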
