Neuchips Demos Recommendation Accelerator for LLM Inference

Taiwanese AI accelerator maker Neuchips has demonstrated Llama2-7B inference at 240 tokens/second on a 4-chip PCIe card.

This is a companion discussion topic for the original entry at https://www.eetimes.com/neuchips-demos-recommendation-accelerator-for-llm-inference/