# PhonemeExtractor

**Repository Path**: qq2524/PhonemeExtractor

## Basic Information

- **Project Name**: PhonemeExtractor
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-10-29
- **Last Updated**: 2025-10-29

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# PhonemeExtractor
Extracts phoneme sequences from speech audio files.


## Text2Phoneme experiments
All experiments conducted for text2phoneme can be found in the `text2phoneme_experiments`
directory. The experiments are Jupyter notebooks that need no additional setup beyond pointing
the relevant cells at your own directories. Special thanks to
[Ben Trevett](https://github.com/bentrevett/pytorch-seq2seq) and his awesome seq2seq modeling
tutorials for helping us get our experiments working.

`text2phoneme_experiments/DeepPhonemizer` contains the modified grapheme2phoneme experiment for
the TIMIT ASR dataset, which serves as our baseline.

## Phoneme Sequence Matching algorithm
In `matching_algs.py` you'll find the `phoneme_match` function, which identifies mispredicted words
given the ground-truth (gt) and reference phonemes plus their corresponding word mappings. The test
cases that check the correctness of this algorithm live in the same file. It can be run as is:
```bash
$ python matching_algs.py
```
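
To illustrate the general idea of mapping phoneme-level differences back to words, here is a minimal sketch. It is not the repository's `phoneme_match` implementation: the function name `find_mismatched_words`, its parameters, and the use of `difflib.SequenceMatcher` for alignment are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def find_mismatched_words(ref_phonemes, ref_word_ids, pred_phonemes, words):
    """Hypothetical helper: return words whose reference phonemes were not
    reproduced exactly in the predicted phoneme sequence.

    ref_word_ids[i] is the index (into `words`) of the word that
    ref_phonemes[i] belongs to.
    """
    if not ref_phonemes:
        return []

    # Align the two phoneme sequences; autojunk is disabled so that
    # frequent phonemes are never treated as noise.
    matcher = SequenceMatcher(a=ref_phonemes, b=pred_phonemes, autojunk=False)

    bad_word_ids = set()
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if i1 == i2:
            # Pure insertion in the prediction: attribute it to the
            # reference word at the insertion point (assumption).
            bad_word_ids.add(ref_word_ids[min(i1, len(ref_phonemes) - 1)])
        else:
            # Replaced or deleted reference phonemes mark their words.
            bad_word_ids.update(ref_word_ids[i1:i2])

    return [words[i] for i in sorted(bad_word_ids)]


words = ["the", "cat"]
ref = ["dh", "ah", "k", "ae", "t"]
word_ids = [0, 0, 1, 1, 1]
pred = ["dh", "ah", "k", "ih", "t"]
print(find_mismatched_words(ref, word_ids, pred, words))
```

In this sketch a word is flagged as soon as any of its phonemes is substituted, deleted, or preceded by an inserted phoneme. The actual algorithm in `matching_algs.py` may use a different alignment or scoring rule.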

## Audio to Phoneme Model
Use `audio_to_phoneme.py` to train a feature extractor, tokenizer, and model from scratch for converting WAV audio files into phoneme sequences.
Make sure Python 3.8.7 is installed, then run the following two shell commands:
```bash
# dataclasses and typing are part of the Python 3.8 standard library, so they are not installed via pip
$ pip install datasets transformers jiwer soundfile torch librosa
$ python audio_to_phoneme.py
```
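
The `jiwer` dependency suggests that predictions are scored with an edit-distance-based error rate, although the README does not say so. As a rough illustration of how a phoneme error rate (PER) could be computed for one utterance, here is a dependency-free sketch. The function names `edit_distance` and `phoneme_error_rate` are hypothetical and do not come from `audio_to_phoneme.py`.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j tokens of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref, hyp):
    """Edit distance normalised by the reference length."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


reference = "dh ah k ae t".split()
predicted = "dh ah k ih t".split()
print(phoneme_error_rate(reference, predicted))
```

Here one substitution out of five reference phonemes gives a PER of 1/5. A corpus-level score would normally sum edit distances and reference lengths across utterances before dividing.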