I have thousands of audio files on my local computer, and I run a FastAPI service that extracts features from an audio file and returns them to me as JSON.
The procedure is as follows: first I upload the audio file and get a token back. I can then use that token to poll the API repeatedly to check whether processing has finished. Once it has, the response contains the features.
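The submit-then-poll procedure can be expressed as a small helper that does not depend on any HTTP library. This is a minimal sketch: `wait_for_result`, `interval` and `timeout` are hypothetical names, and it assumes the "has processing finished?" check is wrapped in a callable that returns `None` until the features are available (adapt that to however the API actually signals "not done yet").

```python
import time
from typing import Callable, Optional


def wait_for_result(fetch: Callable[[], Optional[dict]],
                    interval: float = 2.0,
                    timeout: float = 600.0) -> dict:
    """Call `fetch` repeatedly until it returns something other than None.

    `fetch` is a hypothetical callable standing for "ask the features endpoint
    about my token"; it returns None while the server is still processing.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no result after {timeout} seconds")
        time.sleep(interval)  # wait before asking again


if __name__ == "__main__":
    # Simulated endpoint: "still processing" twice, then the features.
    responses = iter([None, None, {"n_words": 15}])
    print(wait_for_result(lambda: next(responses), interval=0.01))
```

Keeping the polling loop separate from the HTTP calls makes it easy to reuse for one file or for many.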
I managed to post one audio file with:
```python
import requests

url = 'https://something/submit/'
path = 'dir/interview-10001.wav'

with open(path, 'rb') as fobj:
    res = requests.post(url, files={'file': fobj})

print(res.text)
```

```
{"message":"Submitted successfully","token":"abcd"}
```
... and I managed to get the features for one file with:
```python
url = 'https://something/features/abcd'
res_feat = requests.get(url)
res_feat.json()
```

```
{'filesize': 253812, 'language': 'de', 'language_prob': 0.9755016565322876, 'n_words': 15, ... }
```
How can I now make hundreds or thousands of requests in parallel or asynchronously and collect the results?
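Because each file only needs one upload followed by a few small polling requests, a thread pool is one simple way to run many of them at once while keeping the blocking `requests` calls shown above. This is a minimal sketch, not a definitive implementation: the URLs are the placeholders from the question, `submit_file`, `fetch_features`, `process_file` and `process_all` are hypothetical helpers, `max_workers=8`, `interval` and `max_polls` are arbitrary values, and it assumes the features endpoint answers with a non-200 status code until processing is complete.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Optional, Tuple

SUBMIT_URL = "https://something/submit/"      # placeholder from the question
FEATURES_URL = "https://something/features/"  # placeholder from the question


def submit_file(path: str) -> str:
    """Upload one file like the single-file example and return its token."""
    import requests  # lazy import: the orchestration below does not need it

    with open(path, "rb") as fobj:
        res = requests.post(SUBMIT_URL, files={"file": fobj}, timeout=60)
    res.raise_for_status()
    return res.json()["token"]


def fetch_features(token: str) -> Optional[dict]:
    """Return the features, or None while processing is still running.

    ASSUMPTION: the API replies with a non-200 status until the job is done.
    """
    import requests

    res = requests.get(FEATURES_URL + token, timeout=60)
    return res.json() if res.status_code == 200 else None


def process_file(path: str, submit: Callable[[str], str],
                 fetch: Callable[[str], Optional[dict]],
                 interval: float, max_polls: int) -> dict:
    """Submit one file, then poll its token until the features are ready."""
    token = submit(path)
    for _ in range(max_polls):
        features = fetch(token)
        if features is not None:
            return features
        time.sleep(interval)
    raise TimeoutError(f"{path}: no features after {max_polls} polls")


def process_all(paths: Iterable[str], submit=submit_file, fetch=fetch_features,
                max_workers: int = 8, interval: float = 2.0,
                max_polls: int = 300) -> Tuple[Dict[str, dict], Dict[str, Exception]]:
    """Process many files concurrently; return (results, errors) keyed by path."""
    results: Dict[str, dict] = {}
    errors: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process_file, p, submit, fetch, interval, max_polls): p
                   for p in paths}
        for fut in as_completed(futures):
            path = futures[fut]
            try:
                results[path] = fut.result()
            except Exception as exc:  # one failed file should not stop the rest
                errors[path] = exc
    return results, errors
```

Usage would be `results, errors = process_all(files)` with `files` a list of paths. Since each worker thread stays busy while it sleeps between polls, at most `max_workers` files are in flight at once; raise it if the server can handle more.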
For example, I tried this code (https://dev.to/ndrbrt/python-upload-multiple-files-concurrently-with-aiohttp-and-show-progress-bars-with-tqdm-32l7) in a Jupyter notebook:
```python
import asyncio
import os

import aiofiles
import aiohttp
from tqdm import tqdm


class FileManager():
    def __init__(self, file_name: str):
        self.name = file_name
        self.size = os.path.getsize(self.name)
        self.pbar = None

    def __init_pbar(self):
        self.pbar = tqdm(
            total=self.size,
            desc=self.name,
            unit='B',
            unit_scale=True,
            unit_divisor=1024,
            leave=True)

    async def file_reader(self):
        self.__init_pbar()
        chunk_size = 64*1024
        async with aiofiles.open(self.name, 'rb') as f:
            chunk = await f.read(chunk_size)
            while chunk:
                self.pbar.update(chunk_size)
                yield chunk
                chunk = await f.read(chunk_size)
            self.pbar.close()


async def upload(file: FileManager, url: str, session: aiohttp.ClientSession):
    try:
        data = {'file': open(file.name, 'rb')}
        async with session.post(url, data={'data': data}) as res:
            # NB: if you also need the response content, you have to await it
            return res
    except Exception as e:
        # handle error(s) according to your needs
        print(e)


async def main(files):
    url = 'https://something/submit/'
    files = [FileManager(file) for file in files]
    async with aiohttp.ClientSession() as session:
        res = await asyncio.gather(*[upload(file, url, session) for file in files])
    print(f'All files have been uploaded ({len(res)})')
    return res
```
I started it with:
```python
path_1 = '/media/SPEAKER_01/interview-10001.wav'
path_2 = '/media/SPEAKER_00/interview-10001.wav'
files = [path_1, path_2]
res = await main(files)
res
```
But I get back:
```
[422 Unprocessable Entity]>
<CIMultiDictProxy('Content-Length': '89', 'Content-Type': 'application/json', 'Date': 'Fri, 19 Jul 2024 09:54:29 GMT', 'Server': 'envoy', 'x-envoy-upstream-service-time': '23')>, ...
```
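A 422 from a FastAPI endpoint usually means the request did not match the parameters the endpoint declares. The working `requests` call sends the audio as a multipart field named `file`, whereas `upload` posts `data={'data': data}`, i.e. a single field named `data` whose value is a Python dict rather than a file, so the server most likely never receives a `file` field. Reading the body with `await res.json()` inside the `async with` block should show FastAPI's validation message and confirm this. Below is a minimal `asyncio`/`aiohttp` sketch that sends the file under `file`, caps simultaneous requests with an `asyncio.Semaphore`, and then polls for the features. The URLs are the placeholders from the question; all function names and the `limit`, `interval` and `max_polls` values are hypothetical; and it assumes the features endpoint returns a non-200 status until processing is finished.

```python
import asyncio
from typing import Dict, Iterable, Optional, Union

SUBMIT_URL = "https://something/submit/"      # placeholder from the question
FEATURES_URL = "https://something/features/"  # placeholder from the question


async def submit_file(session, path: str) -> str:
    """Upload one file as the multipart field 'file' and return its token."""
    with open(path, "rb") as fobj:
        # A dict whose value is a file object makes aiohttp send
        # multipart/form-data, using the dict key ('file') as the field name.
        async with session.post(SUBMIT_URL, data={"file": fobj}) as res:
            res.raise_for_status()
            return (await res.json())["token"]


async def fetch_features(session, token: str) -> Optional[dict]:
    """Return the features, or None while processing is still running.

    ASSUMPTION: the API replies with a non-200 status until the job is done.
    """
    async with session.get(FEATURES_URL + token) as res:
        if res.status != 200:
            return None
        return await res.json()


async def process_all(paths: Iterable[str], session=None, submit=submit_file,
                      fetch=fetch_features, limit: int = 10,
                      interval: float = 2.0,
                      max_polls: int = 300) -> Dict[str, Union[dict, BaseException]]:
    """Submit all files, poll all tokens, map each path to features or an error."""
    paths = list(paths)
    sem = asyncio.Semaphore(limit)  # at most `limit` HTTP requests in flight

    async def one(path: str) -> dict:
        async with sem:
            token = await submit(session, path)
        for _ in range(max_polls):
            async with sem:
                features = await fetch(session, token)
            if features is not None:
                return features
            # Sleep outside the semaphore so other files can use the slot.
            await asyncio.sleep(interval)
        raise TimeoutError(f"{path}: no features after {max_polls} polls")

    # return_exceptions=True: one failing file does not cancel the others.
    results = await asyncio.gather(*(one(p) for p in paths), return_exceptions=True)
    return dict(zip(paths, results))


async def main(paths):
    import aiohttp  # lazy import so process_all can be exercised without aiohttp

    async with aiohttp.ClientSession() as session:
        return await process_all(paths, session=session)
```

In Jupyter this can be started as in the question with `res = await main(files)`; in a plain script, use `asyncio.run(main(files))`. The semaphore limits concurrent requests rather than files in flight, so every file can be processing on the server while at most `limit` requests hit it at any moment.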