Metadata-Version: 2.4
Name: kyd_docx2md
Version: 0.1.0
Summary: Utility function to convert MS DOCX files into Markdown.
Project-URL: Homepage, https://github.com/KYD-Analytics/kyd_docx2md
Project-URL: Issue Tracker, https://github.com/KYD-Analytics/kyd_docx2md/issues
Author-email: KYD Analytics <sales@kyd.ai>
License-Expression: MIT
License-File: LICENSE
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.10
Requires-Dist: anyascii
Requires-Dist: python-docx
Requires-Dist: python-dotenv
Description-Content-Type: text/markdown

# KYD DOCX to Markdown (kyd_docx2md)

[![Version](https://img.shields.io/badge/Python%20Version-3.10-green)](https://pypi.org/project/kyd-docx2md) [![Coverage](./coverage.svg)](./README.md)

KYD DOCX2MD is a Python utility that converts Microsoft Word (DOCX) files to Markdown. Two conversion modes are supported:

* Plain Markdown
* LLM Enhanced Markdown

## Installation

```bash
pip install kyd_docx2md
```

For development, the package can also be installed from a local copy of the source, either as a regular install or as an editable install.

### Regular Local Install

```bash
pip install path/to/kyd_docx2md
```

### Editable Local Install

```bash
pip install -e path/to/kyd_docx2md
```

Example output of a local install, here run as `pip install .` from the package directory:

```text
(std311) C:\work\KYD\fetch-ch-account\src\kyd_docx2md>pip install .
Processing c:\work\kyd\fetch-ch-account\src\kyd_docx2md
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: kyd_docx2md
  Building wheel for kyd_docx2md (pyproject.toml) ... done
  Created wheel for kyd_docx2md: filename=kyd_docx2md-0.1.0-py3-none-any.whl size=22230 sha256=d2176cefbfcf6fedc462ddfc5a4ebea9343cd139b4cba7dec5d0610ab903c66a
  Requirement already satisfied:
  ...
  ...
  ...
Successfully built kyd_docx2md
Installing collected packages: kyd_docx2md
  Attempting uninstall: kyd_docx2md
    Found existing installation: kyd_docx2md 0.1.0
    Uninstalling kyd_docx2md-0.1.0:
      Successfully uninstalled kyd_docx2md-0.1.0
Successfully installed kyd_docx2md-0.1.0
```

## Build Package

Change into the directory that contains `pyproject.toml` and run the command below. It builds the package in an isolated environment and generates a source distribution and a wheel in the `dist/` directory. See the `build` documentation for full information.

```bash
python -m build
```

## Usage

The package provides a command-line tool `kyd_docx2md` to convert DOCX files to Markdown format.

### Basic Usage

Convert a single DOCX file to Markdown:

```bash
kyd_docx2md input.docx
```

This will create `input.md` in the same directory.
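
The default output name mirrors the input name: the `.docx` extension is replaced with `.md` and the directory stays the same. A minimal Python sketch of that mapping, which does not reflect the package's internals (`default_output_path` is a hypothetical helper, not part of `kyd_docx2md`):

```python
from pathlib import Path


def default_output_path(docx_path: str) -> Path:
    """Return the Markdown path next to the input file (hypothetical helper)."""
    # with_suffix swaps only the final extension and keeps the parent directory.
    return Path(docx_path).with_suffix(".md")


print(default_output_path("reports/input.docx"))
```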

### Specify Output File

```bash
kyd_docx2md input.docx -o output.md
```

### Exclude Specific Colors from Markdown Encoding

You can exclude specific hex color codes from being encoded in the Markdown output. By default, white (#FFFFFF), black (#000000), and auto colors are already excluded.

Exclude additional colors using the `--exclude-colors` or `-ec` option:

```bash
kyd_docx2md input.docx --exclude-colors "#FF5733,#AABBCC"
```

Or using the short form:

```bash
kyd_docx2md input.docx -ec "#FF5733,#AABBCC,#112233"
```

**Note:** Color codes must be in the format `#RRGGBB` (case-insensitive). Invalid formats will result in an error.
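
To check a color list before passing it to the command line, the `#RRGGBB` rule can be sketched in Python as follows. This is an illustration of the format described above, not the package's own validation code; `parse_exclude_colors` and `HEX_COLOR` are hypothetical names, and normalizing to upper case is an assumption made here for convenience.

```python
import re

# Matches exactly "#" followed by six hexadecimal digits, in either case.
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")


def parse_exclude_colors(value: str) -> list[str]:
    """Split a comma-separated color list and validate each entry (hypothetical helper)."""
    colors = []
    for item in value.split(","):
        code = item.strip()
        if not HEX_COLOR.fullmatch(code):
            raise ValueError(f"Invalid color code: {code!r} (expected #RRGGBB)")
        # Normalize to upper case so later comparisons are case-insensitive.
        colors.append(code.upper())
    return colors


print(parse_exclude_colors("#ff5733,#AABBCC"))  # ['#FF5733', '#AABBCC']
```

A value such as `#FFF` or `FF5733` (no leading `#`) raises `ValueError` in this sketch.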

### Additional Options

- `--image-folder` or `-if`: Specify where images are stored (default: `./images`)
- `--output` or `-o`: Specify output Markdown file path
- `--ascii_only` or `-ao`: Convert output to ASCII only
- `--no-images`: Do not export images from the DOCX file
- `--remove-wrapping-tables` or `-rw`: Remove wrapping tables of specific sizes (e.g., `1x1`, `1x2`)
- `--log-level` or `-l`: Set logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
- `--custom-style` or `-cs`: Load custom style file (JSON format)
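
The options above can also be assembled from a Python script and passed to the command-line tool. The sketch below uses only flags listed in this README; `build_command` and `convert` are hypothetical helpers, and it assumes that `kyd_docx2md` is installed on `PATH` and reports failure through a non-zero exit status.

```python
import subprocess


def build_command(docx_file, output=None, exclude_colors=None, ascii_only=False):
    """Assemble an argument list for the kyd_docx2md CLI (hypothetical helper)."""
    cmd = ["kyd_docx2md", docx_file]
    if output:
        cmd += ["--output", output]
    if exclude_colors:
        # The CLI expects one comma-separated string of #RRGGBB codes.
        cmd += ["--exclude-colors", ",".join(exclude_colors)]
    if ascii_only:
        cmd.append("--ascii_only")
    return cmd


def convert(docx_file, **options):
    """Run the CLI; requires kyd_docx2md to be installed and on PATH."""
    # check=True raises CalledProcessError on a non-zero exit status (assumed behavior).
    subprocess.run(build_command(docx_file, **options), check=True)


print(build_command("input.docx", output="output.md",
                    exclude_colors=["#FF0000"], ascii_only=True))
```

Passing the arguments as a list avoids shell quoting problems with the `#` character in color codes.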

### Examples

Convert multiple files with color exclusion:

```bash
kyd_docx2md docs/*.docx -ec "#CCCCCC,#EEEEEE" -o output_dir/
```

Convert with ASCII-only output, excluding the color red:

```bash
kyd_docx2md input.docx -ao -ec "#FF0000" -o output.md
```

## License

`kyd_docx2md` is distributed under the terms of the [GPL 3](https://www.gnu.org/licenses/gpl-3.0.html) license.
