others

multimodal ocr implementations
multimodal ocr implementations: spaces for image-to-text and video-to-text understanding, optimized for document-level optical character recognition.
space, may 2025
total visits: 1m+
visionscope r2
visionscope r2 implementation: a hugging face space for image-to-text and video-to-text understanding, optimized for document-level optical character recognition.
space, may 2025
total visits: 29.1k+
inkscope captions 2b 0526
the inkscope captions 2b 0526 model is a fine-tuned version of qwen2-vl-2b-instruct, optimized for image captioning, vision-language understanding, and english-language caption generation.
model, may 2025
total downloads: 0.5k+
common voice gender detection
common voice gender detection is a fine-tuned version of facebook/wav2vec2-base-960h for binary audio classification, specifically trained to detect speaker gender as female or male.
model, june 2025
total downloads: 1.1k+
llama 3b mono cooper
llama 3b mono cooper is a llama-based speech llm designed for high-quality, empathetic text-to-speech generation. this model has been fine-tuned to deliver clear, human-like speech synthesis.
model, february 2025
total downloads: 0.1k+
docscope r1
a space for image-to-text and video-to-text understanding, optimized for document-level optical character recognition and long-context vision-language understanding.
space, may 2025
total visits: 65.2k+
llama 3b mono ceylia
llama 3b mono ceylia is a llama-based speech llm designed for high-quality, empathetic text-to-speech generation. this model has been fine-tuned to deliver human-like speech synthesis.
model, february 2025
total downloads: 0.31k+
core ocr
core ocr is a space for experimenting with the coreocr-7b-050325 model, tailored for tasks involving improved document-level optical character recognition (ocr). it runs on an nvidia h200.
space, march 2025
total visits: 84.1k+
llama 3b mono jim
llama 3b mono jim is a llama-based speech llm designed for high-quality, empathetic text-to-speech generation. this model has been fine-tuned to deliver human-like speech synthesis.
model, february 2025
total downloads: 0.26k+
llama 3b mono luna
llama 3b mono luna is a llama-based speech llm designed for high-quality, empathetic text-to-speech generation. this model has been fine-tuned to deliver human-like speech synthesis.
model, february 2025
total downloads: 0.11k+