Bark: Transformer-based Text-to-Audio Generation Model by Suno-ai

Matthew David / April 25, 2023 2:41pm Github

Bark is a transformer-based text-to-audio model created by Suno. It generates highly realistic, multilingual speech as well as other types of audio, including music and simple sound effects. It can also produce nonverbal sounds such as laughing, sighing, and crying. Bark supports several languages out of the box and detects the language automatically from the input text. It also offers voice presets and voice cloning that captures tone, pitch, emotion, and prosody.

Bark uses GPT-style models to generate audio from scratch, so it can follow arbitrary instructions beyond speech, provided they occur in its training data. It can be installed with pip and runs on both CPU and GPU. Bark is released under a non-commercial license, CC BY-NC 4.0. EnCodec, which Bark uses as its audio codec, is also under a non-commercial license.
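As a rough illustration of the workflow described above, here is a minimal Python sketch. It assumes Bark has been installed from the project's GitHub repository (for example with `pip install git+https://github.com/suno-ai/bark.git`) along with SciPy. The `generate_audio`, `preload_models`, and `SAMPLE_RATE` names and the bracketed cue tags come from the Bark README rather than from this article. The helpers `build_prompt` and `synthesize` are hypothetical names introduced only for this example.

```python
from typing import Optional

# Nonverbal cue tags as listed in the Bark README (an assumption here;
# this article does not name the exact tags).
NONVERBAL_TAGS = {"laugh": "[laughs]", "sigh": "[sighs]"}


def build_prompt(text: str, cue: Optional[str] = None) -> str:
    """Return the text prompt, optionally followed by a nonverbal cue tag."""
    if cue is None:
        return text
    return f"{text} {NONVERBAL_TAGS[cue]}"


def synthesize(prompt: str, out_path: str, voice: Optional[str] = None) -> None:
    """Generate audio for `prompt` with Bark and save it as a WAV file.

    `voice` is an optional voice preset name passed as Bark's
    `history_prompt`; valid preset names depend on the installed version.
    """
    # Imported lazily so build_prompt works even without Bark installed.
    from bark import SAMPLE_RATE, generate_audio, preload_models
    from scipy.io.wavfile import write as write_wav

    preload_models()  # downloads and caches the model weights on first use
    audio = generate_audio(prompt, history_prompt=voice)
    write_wav(out_path, SAMPLE_RATE, audio)
```

Calling `synthesize(build_prompt("Hello, my name is Suno.", cue="laugh"), "bark_out.wav")` would write the generated clip to `bark_out.wav`. The first call downloads the model weights, and generation is much slower on CPU than on GPU.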
