Add audio transcribing example and support

Add Grok Chat provider
Rename images parameter to media
Update demo homepage
hlohaus 2025-03-21 03:17:45 +01:00
parent 10d32a4c5f
commit c97ba0c88e
36 changed files with 407 additions and 300 deletions

@@ -19,6 +19,7 @@ The G4F AsyncClient API is designed to be compatible with the OpenAI API, making
- [Text Completions](#text-completions)
- [Streaming Completions](#streaming-completions)
- [Using a Vision Model](#using-a-vision-model)
- **[Transcribing Audio with Chat Completions](#transcribing-audio-with-chat-completions)** *(New Section)*
- [Image Generation](#image-generation)
- [Advanced Usage](#advanced-usage)
- [Conversation Memory](#conversation-memory)
@@ -203,6 +204,54 @@ async def main():
asyncio.run(main())
```
---
### Transcribing Audio with Chat Completions
Some providers in G4F support audio inputs in chat completions, allowing you to transcribe audio files by instructing the model accordingly. This example demonstrates how to use the `AsyncClient` to transcribe an audio file asynchronously:
```python
import asyncio
from g4f.client import AsyncClient
import g4f.Provider
import g4f.models

async def main():
    client = AsyncClient(provider=g4f.Provider.PollinationsAI)  # or g4f.Provider.Microsoft_Phi_4

    with open("audio.wav", "rb") as audio_file:
        response = await client.chat.completions.create(
            model=g4f.models.default,
            messages=[{"role": "user", "content": "Transcribe this audio"}],
            media=[[audio_file, "audio.wav"]],
            modalities=["text"],
        )

    print(response.choices[0].message.content)

if __name__ == "__main__":
    asyncio.run(main())
```
#### Explanation
- **Client Initialization**: An `AsyncClient` instance is created with a provider that supports audio inputs, such as `PollinationsAI` or `Microsoft_Phi_4`.
- **File Handling**: The audio file (`audio.wav`) is opened in binary read mode (`"rb"`) using a context manager (`with` statement) to ensure proper file closure after use.
- **API Call**: The `chat.completions.create` method is called with:
- `model=g4f.models.default`: Uses the default model for the selected provider.
- `messages`: A list containing a user message instructing the model to transcribe the audio.
- `media`: A list of lists, where each inner list contains the file object and its name (`[[audio_file, "audio.wav"]]`).
- `modalities=["text"]`: Specifies that the output should be text (the transcription).
- **Response**: The transcription is extracted from `response.choices[0].message.content` and printed.
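
For illustration, the `[[file, name]]` shape that `media` expects can be produced from several common input types with a small helper. This is a hypothetical convenience function, not part of the G4F API:

```python
import io
from pathlib import Path

def to_media_pair(item, fallback_name="file.bin"):
    """Normalize a path, raw bytes, or an open file object into a
    [file_object, file_name] pair matching the shape `media` expects."""
    if isinstance(item, (str, Path)):          # path on disk: open it, keep its name
        path = Path(item)
        return [path.open("rb"), path.name]
    if isinstance(item, (bytes, bytearray)):   # raw bytes: wrap in an in-memory buffer
        return [io.BytesIO(bytes(item)), fallback_name]
    # already file-like: reuse its .name attribute when it has one
    return [item, Path(getattr(item, "name", fallback_name)).name]

pair = to_media_pair(b"\x00\x01", fallback_name="audio.wav")
print(pair[1])  # audio.wav
```

With such a helper, `media=[to_media_pair("audio.wav")]` and `media=[[audio_file, "audio.wav"]]` build the same structure.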
#### Notes
- **Provider Support**: Ensure the chosen provider (e.g., `PollinationsAI` or `Microsoft_Phi_4`) supports audio inputs in chat completions; not all providers offer this functionality.
- **File Path**: Replace `"audio.wav"` with the path to your own audio file. The file format (e.g., WAV) should be compatible with the provider.
- **Model Selection**: If `g4f.models.default` does not support audio transcription, you may need to specify a model that does (consult the provider's documentation for supported models).
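
Since the file format must be compatible with the provider, a quick local sanity check can catch an obvious mismatch before a request is sent. A minimal sketch (not part of G4F) that verifies the 12-byte RIFF/WAVE header every `.wav` file begins with:

```python
import os
import tempfile

def looks_like_wav(path):
    """Return True if the file starts with a RIFF/WAVE header."""
    with open(path, "rb") as f:
        header = f.read(12)
    return len(header) == 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE"

# Demo with a minimal synthetic header (a real recording would be used in practice).
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
    tmp.write(b"RIFF" + (36).to_bytes(4, "little") + b"WAVE")
    demo_path = tmp.name

print(looks_like_wav(demo_path))  # True
os.remove(demo_path)
```

Other formats (e.g., MP3, OGG) have their own magic bytes and would need analogous checks.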
This example complements the guide by showcasing how to handle audio inputs asynchronously, expanding on the multimodal capabilities of the G4F AsyncClient API.
---
### Image Generation
**The `response_format` parameter is optional and can have the following values:**
- **If not specified (default):** The image will be saved locally, and a local path will be returned (e.g., "/images/1733331238_cf9d6aa9-f606-4fea-ba4b-f06576cba309.jpg").