Our mailbox has two doors: a big one for parcels and a smaller one for envelopes. I have set up some automations that, in most cases, let me know not only when something was put into the mailbox, but also roughly what it was: a letter delivered by the mailman, a parcel, or just publicity.
To do this, I use the following tools:
- Movement sensor inside the mailbox.
- Contact sensor on the smaller door.
- Surveillance camera that can capture a photo of the area around the mailbox.
- LLM Vision integration for Home Assistant.
- API keys for some free vision-capable large language models.
- Frigate Video Surveillance.
- Various sensors for vehicle presence, door contacts, movement, etc.

This is how it works:
- When the envelope door is opened, save the timestamp.
- When movement is detected inside the mailbox, save the timestamp and turn on the “New Mail Arrived” sensor.
- Look at the time difference between the two events above to determine whether the movement was caused by a letter or by something bigger.
- Save the timestamp for the following events: a house door is opened (front door, terrace door, garage door), a person is detected somewhere on the property by Frigate (typically someone walking to the mailbox), movement is detected in e.g. the garage, or the car arrives at home (detected by a 433 MHz thermometer in the car sending a status update).
- Use these timestamps to check whether the mailbox movement might have been caused by someone emptying the mailbox (that is, whether the timestamp difference is less than 30 seconds). If so, the “New Mail Arrived” sensor is reset and no photo is taken. There is also a manual reset function (triggered by a button on the dashboard or on the Z-Wave command panel next to the front door).
- When the “New Mail Arrived” sensor is turned on, the surveillance camera takes a photo of the mailbox area. Using LLM Vision, the photo is sent to an AI model. I have working configurations for Ollama Cloud (running locally on an LXC on my Proxmox server), Gemini, OpenRouter and Groq.
- The LLM Vision integration has already been configured with its “memory” function: 6-7 example photos of different delivery types, each with an explanatory text, are sent to “train” the model.
- The model is prompted to decide whether the mailbox contains a letter, a parcel or publicity. It also checks whether a person actually triggered the movement. A sensor is set accordingly. The prompt also states (using the heuristic logic for the two doors) whether the delivery is most likely letter-sized or parcel-sized. This mainly helps to distinguish someone without a (mailman) uniform delivering a parcel from someone just putting publicity in the mailbox.
- Home Assistant sends a Telegram notification, and a dashboard sensor shows the content of the mailbox.
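
The timestamp logic in the steps above can be sketched in plain Python. This is only an illustration and not my actual Home Assistant configuration: the function names and the returned dictionary are made up for the example, and the 10-second `LETTER_WINDOW` is an assumed value, since the only threshold given above is the 30 seconds used for the "emptied" check.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional

# Assumed value: how close the envelope door event and the movement must be
# for the delivery to count as letter-sized.
LETTER_WINDOW = timedelta(seconds=10)
# From the description above: other activity within 30 seconds of the movement
# suggests that someone was emptying the mailbox.
EMPTYING_WINDOW = timedelta(seconds=30)


def size_hint(movement: datetime, envelope_door_opened: Optional[datetime]) -> str:
    """Guess whether the movement came from a letter or from something bigger."""
    if envelope_door_opened is None:
        return "parcel-sized"
    if abs(movement - envelope_door_opened) <= LETTER_WINDOW:
        return "letter-sized"
    return "parcel-sized"


def probably_emptied(movement: datetime,
                     activity: Iterable[Optional[datetime]]) -> bool:
    """True if any house/property event happened close to the mailbox movement."""
    return any(
        ts is not None and abs(movement - ts) <= EMPTYING_WINDOW
        for ts in activity
    )


def handle_movement(movement: datetime,
                    envelope_door_opened: Optional[datetime],
                    activity: Iterable[Optional[datetime]]) -> dict:
    """Decide whether to flag new mail, take a photo, and which hint to pass on."""
    if probably_emptied(movement, activity):
        # Someone from the house was most likely at the mailbox: reset, no photo.
        return {"new_mail": False, "take_photo": False, "size_hint": None}
    return {
        "new_mail": True,
        "take_photo": True,
        "size_hint": size_hint(movement, envelope_door_opened),
    }
```

Here `activity` would hold the saved timestamps for the house doors, Frigate person detection, garage movement and car arrival. The resulting `size_hint` is the kind of value that can be included in the prompt sent to the vision model.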




Flowchart




