# KYD DOCX 2 MD

## Overview

### Command-Line Utility: `kyd_docx2md`

`kyd_docx2md` converts MS Word DOCX files into Markdown. The primary goal of this toolset is to
generate Markdown that is as compatible and usable as possible for LLM consumption.

```txt
usage: kyd_docx2md [-h] [-if IMAGE_FOLDER] [-o OUTPUT] [-ao] [--no-images] [-rw [REMOVE_WRAPPING_TABLES ...]]
                   [-l {DEBUG,INFO,WARNING,ERROR,CRITICAL}] [-lf LOG_OUTPUT] [-cs CUSTOM_STYLE]
                   inputs [inputs ...]

Convert DOCX files into Markdown format.

positional arguments:
  inputs                Input DOCX files or glob patterns (supports wildcards, e.g. 'docs/*.docx')

options:
  -h, --help            show this help message and exit
  -if IMAGE_FOLDER, --image-folder IMAGE_FOLDER
                        Location where the images are stored
  -o OUTPUT, --output OUTPUT
                        Generated output Markdown file (default: same basename as input with .md extension)
  -ao, --ascii_only     Convert output to ASCII only
  --no-images           Do not export images from the DOCX file.
  -rw [REMOVE_WRAPPING_TABLES ...], --remove-wrapping-tables [REMOVE_WRAPPING_TABLES ...]
                        Remove wrapping tables of specific sizes (e.g. 1x1, 1x2)
  -l {DEBUG,INFO,WARNING,ERROR,CRITICAL}, --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                        Set the logging level. Default is INFO [DEBUG, INFO, WARNING, ERROR, CRITICAL]
  -lf LOG_OUTPUT, --log-output LOG_OUTPUT
                        Set the optional output logging file name.
  -cs CUSTOM_STYLE, --custom-style CUSTOM_STYLE
                        Set the optional style file to be loaded to support custom styles.
```

> [!WARNING] Note: Because `-rw` reads zero or more tokens, it must either be followed by `--` or be placed at the end of the command.
> Example:
>> `kyd_docx2md -ao -rw 1x1 1x2 -- *.docx` - Note the token `--` to indicate the end of the `-rw` values  
> Or  
>> `kyd_docx2md -ao *.docx -rw 1x1 1x2`
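
The behaviour described in the warning can be reproduced with a small `argparse` sketch. It assumes
the tool declares `-rw` with `nargs="*"` and `inputs` with `nargs="+"`, which is consistent with the
usage text but not confirmed here. `build_parser` is a hypothetical helper, not part of `kyd_docx2md`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in that mirrors only the options relevant here."""
    parser = argparse.ArgumentParser(prog="kyd_docx2md")
    parser.add_argument("inputs", nargs="+")
    parser.add_argument("-ao", "--ascii_only", action="store_true")
    # Assumption: -rw takes zero or more values, so it keeps consuming
    # tokens until the next option, a "--" separator, or the end of the line.
    parser.add_argument("-rw", "--remove-wrapping-tables", nargs="*")
    return parser


parser = build_parser()

# Form 1: "--" ends the -rw values, so the file is parsed as an input.
sep = parser.parse_args(["-ao", "-rw", "1x1", "1x2", "--", "report.docx"])

# Form 2: -rw comes last, so nothing follows it that could be swallowed.
last = parser.parse_args(["-ao", "report.docx", "-rw", "1x1", "1x2"])

print(sep.inputs, sep.remove_wrapping_tables)
print(last.inputs, last.remove_wrapping_tables)
```

Under the same assumption, `-rw 1x1 report.docx` would treat `report.docx` as a table size and leave
`inputs` empty, which `argparse` rejects with a usage error.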

The `CUSTOM_STYLE` file maps specific style names to bullet styles. This caters for documents in
which styles are configured to "appear" as bullets but are not part of any numbering scheme,
typically because the document author did not fully understand how to configure bullet schemes
correctly. In these rare cases, the `CUSTOM_STYLE` configuration enables proper Markdown rendering.

```json
{
    "bullets": {
        "Bullet1": 0,
        "Bullet2": 1
    }
}
```
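As a rough illustration of how such a mapping could be applied, the sketch below loads the JSON above
and turns styled paragraphs into Markdown bullets. It assumes the numbers are zero-based nesting
levels and that each level is indented by two spaces. `bullet_line` and the sample paragraphs are
hypothetical and do not reflect the tool's internals.

```python
import json

# Same content as the example CUSTOM_STYLE file.
CUSTOM_STYLE = """
{
    "bullets": {
        "Bullet1": 0,
        "Bullet2": 1
    }
}
"""


def bullet_line(style_name: str, text: str, bullets: dict[str, int]) -> str:
    """Render one paragraph, indenting it if its style is mapped to a bullet level.

    Assumption: the mapped value is a zero-based nesting level.
    """
    level = bullets.get(style_name)
    if level is None:
        return text  # Unmapped styles are left as plain paragraphs.
    return "  " * level + "- " + text


bullets = json.loads(CUSTOM_STYLE)["bullets"]

# Hypothetical (style name, text) pairs as they might come out of a DOCX file.
paragraphs = [
    ("Normal", "Shopping list:"),
    ("Bullet1", "Fruit"),
    ("Bullet2", "Apples"),
]
markdown = "\n".join(bullet_line(style, text, bullets) for style, text in paragraphs)
print(markdown)
```

With this input, the `Normal` paragraph stays plain text, `Bullet1` becomes a top-level bullet, and
`Bullet2` becomes a bullet nested one level below it.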
