AI in Smart Buildings #1: Problem Definition

Oct 13, 2022 | Jagannath Rajagopal | 8 min read

Smart buildings offer considerable cost savings, especially in energy use and ongoing maintenance. Business cases for smart buildings project utility savings of $2 to $20 per sq. ft. per year, $300 to $3,000 in profit per employee per year, and so on. This translates to savings in the range of $1 million to $3 million per year per building. These figures are for residential and commercial buildings. Smart buildings for factories, warehouses, data centres etc. can offer additional savings by virtue of the function they serve.

The hope is that smart buildings function largely autonomously, without manual intervention. Reducing manual processes makes them more efficient.

One interesting problem is predicting intentions with video surveillance, and modelling action responses. On the one hand, buildings may choose to turn on access to floors, and lighting/HVAC selectively based on the presence of individuals. If I am about to use the bathroom, the lights can be turned on then. If I am about to leave work for the day, the building may notify the cleaning staff.

Alternatively, are people shopping or loitering? Is a person about to shoplift? Is there a risk of a robbery? In residential buildings, is a break-in about to happen? Are people disturbing the environment? One callout is the racial/ethical bias that currently exists in the data; we obviously don’t want models that are biased. In such situations, would the building send an emergency action request to a staff member?

What are some design considerations? One thing that differentiates this one from the above two is it potentially involves action performed by an agent in real time. In that sense, the design process is more involved. Speaking of which, specific actions performed by agents would be guided by policies, guidelines and rules, which would be relatively simple. These could be captured by a simple knowledge base that models action-intention pairs using such knowledge. These guidelines would also take into account health & safety, and other regulations, like one that states emergency exits should never be locked.
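As a rough illustration of such a knowledge base, here is a minimal Python sketch. The intention names, action names and the single safety rule are all hypothetical, chosen to echo the examples in this article; a real building would draw them from its own policies and regulations.

```python
# Hypothetical intention -> candidate actions table (illustrative names only).
ACTION_RULES = {
    "use_washroom": ["turn_on_washroom_lights"],
    "leave_for_day": ["notify_cleaning_staff"],
    # A poorly written rule on purpose: the safety filter below must catch it.
    "possible_break_in": ["alert_security_staff", "lock_emergency_exit"],
}

# Regulations expressed as actions that must never be issued,
# e.g. "emergency exits should never be locked".
FORBIDDEN_ACTIONS = {"lock_emergency_exit"}


def actions_for(intention: str) -> list[str]:
    """Look up actions for an intention, dropping any that break a safety rule."""
    candidates = ACTION_RULES.get(intention, [])
    return [action for action in candidates if action not in FORBIDDEN_ACTIONS]


if __name__ == "__main__":
    print(actions_for("possible_break_in"))
    print(actions_for("leave_for_day"))
```

Keeping the regulations as a separate filter, rather than trusting every rule to be written correctly, means a safety constraint holds even when someone adds a careless action-intention pair later.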

Every building is slightly different from the next, and so is every camera setup.

If you are looking to model intentions, it may make sense to build a model from many different camera settings, then use that as a starting point and customize it to the specific settings in your building. The company that manages the building may also manage many others like it. If so, customizing a shared base model may be more cost-efficient than training models from scratch every time.

Just as interesting, could you model a crowd from individual intentions? For some problems like simulating energy consumption, reasonably accurate crowd models would be very useful. If so, it may be possible to scale models of individuals or interactions between individuals to get a crowd.

For video processing, one prep step is to manage clips with different resolutions and frame rates. Would you build a model that specializes just in this? If you have one model for intentions and another for sanitation, and both use video feeds, having a shared prep model makes sense.
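To make the frame-rate part of that prep step concrete, here is a small Python sketch that picks which source frames to keep (or repeat) so that clips recorded at different rates end up at one common rate. The function name and the nearest-earlier-frame strategy are assumptions for illustration; a production pipeline would more likely rely on a video library and handle resolution as well.

```python
def resample_indices(n_frames: int, src_fps: float, dst_fps: float) -> list[int]:
    """Return source frame indices that approximate the clip at dst_fps.

    Downsampling skips frames; upsampling repeats them.
    """
    if n_frames <= 0 or src_fps <= 0 or dst_fps <= 0:
        raise ValueError("frame count and frame rates must be positive")
    duration_s = n_frames / src_fps
    n_out = max(1, round(duration_s * dst_fps))
    step = src_fps / dst_fps  # source frames per output frame
    # Clamp to the last frame so rounding never indexes past the clip.
    return [min(n_frames - 1, int(i * step)) for i in range(n_out)]


if __name__ == "__main__":
    # A one-second clip at 30 fps brought down to 10 fps keeps every third frame.
    print(resample_indices(30, 30, 10))
```

Both the intentions model and the sanitation model could then consume clips at the same rate without each handling camera differences itself.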

Speaking of video feeds, they are of very high volume and velocity. Since the building needs to perform actions, models will have the added requirement of recommending actions in near real-time.

Smart buildings are highly dependent on sensor systems — operational and environmental. Capturing all types of sensor data may give added confirmation to detected intentions in some cases. If someone enters a corridor, and opens the washroom door, the sensor attached to the door adds confirmation that someone may be interested in using the facility.
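One simple way to picture this added confirmation is a Bayes update: the camera model supplies a prior belief in the intention, and the door sensor event shifts it. The sketch below is illustrative only; the probabilities are made-up numbers, not measurements.

```python
def update_belief(prior: float, p_event_if_true: float, p_event_if_false: float) -> float:
    """Bayes update of P(intention) after a sensor event is observed.

    prior            -- belief from the camera model before the sensor fires
    p_event_if_true  -- chance the sensor fires when the intention is real
    p_event_if_false -- chance the sensor fires anyway (noise, someone else)
    """
    numerator = p_event_if_true * prior
    denominator = numerator + p_event_if_false * (1.0 - prior)
    # If the event is impossible under both hypotheses, keep the prior.
    return numerator / denominator if denominator > 0 else prior


if __name__ == "__main__":
    camera_belief = 0.6  # assumed output of an intention model
    confirmed = update_belief(camera_belief, p_event_if_true=0.9, p_event_if_false=0.1)
    print(f"belief after door sensor fires: {confirmed:.2f}")
```

A sensor that fires equally often whether or not the intention is real leaves the belief unchanged, which is a useful sanity check on which sensors actually add confirmation.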

Another way this use case is different is that intention detection and action responses take place in the field. Models may be trained remotely and centrally but need to be deployed in the building for in-field inference. There could be a server room in the building that houses the needed machines. The alternative is to send video feeds to the cloud in real time, but that may be prohibitive in both cost and reaction time. In advanced use cases like Smart Environments, buildings may learn in real time, adapting to the changing habits of the residents.

- - -

The smart building is a paradigmatic case of a hybrid virtual and physical environment. In the end, most actions performed by a smart building happen via physical actuators, either directly or through intermediate systems like HVAC.

Users play a dual role; they are both the input to the smart building and the recipients of its actions and outcomes. On the other hand, smart buildings interact mostly with other automated systems in the building, acting as the coordinator. Its interaction with users is at a very high level, while its interaction with other systems is too detailed for a human to follow.

While some intentions are high-stakes, especially security-related ones, others may be mild and benign. Based on this, it may make sense to have a human in the loop for critical action responses. If it is done right, it will instill a high level of trust in the smart building. Further, building a story after the fact, interfacing with the systems of authorities for incident reports, having multiple cameras and sensors confirm the same intention etc. will help bolster provenance. Finally, having a human in the loop may also be needed to monitor action sequences proposed by the building.
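A minimal sketch of that split, assuming a hypothetical set of high-stakes intention labels and an arbitrary confidence threshold: anything security-related goes to a person, while benign intentions are acted on automatically only when the model is confident.

```python
# Hypothetical labels for intentions that always need a person to decide.
HIGH_STAKES = {"possible_break_in", "possible_robbery", "possible_shoplifting"}


def route(intention: str, confidence: float, auto_threshold: float = 0.8) -> str:
    """Decide how a detected intention is handled.

    Returns "human_review", "automatic" or "log_only".
    """
    if intention in HIGH_STAKES:
        # High-stakes responses are never automated, whatever the confidence.
        return "human_review"
    if confidence >= auto_threshold:
        return "automatic"
    # Benign but uncertain: record it, take no action.
    return "log_only"


if __name__ == "__main__":
    print(route("possible_break_in", 0.99))
    print(route("use_washroom", 0.90))
    print(route("use_washroom", 0.40))
```

Logging every routing decision alongside the sensors that supported it is also one way to build up the after-the-fact story mentioned above.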

One issue with data-driven modelling is that any bias in the data will be learned by the models. Models can be biased if the data contains more examples of a particular group exhibiting certain intentions than of others. We do not want our models to be biased. Race is one example that comes to mind. While reducing bias is not the focus of this work, one way to do so would be to create a scaling model that associates intentions with all races. Or maybe, we have a specialized prep step that paints us all in orange!

Needless to say, you could start just with modelling intentions but not action responses. This could be a first stage in implementing a fully smart building and would be a way of building up trust step by step. Any intentions perceived would then require manual actions.

True intentions are not visible; we can only extrapolate from behaviours and actions.

A smart building may have several models operating in parallel, such as ones for video prep, sanitation, surveillance/intentions, HVAC operation, elevators, access etc. Given multiple models, they may need to collaborate with one another and with other automated but non-smart systems.

Sensor-based systems have to deal with noise, while basic human nature makes for fuzzy intention models. So underlying uncertainty comes with the territory in smart buildings.

There is a significant time component in modelling intentions and other smart building systems — intentions reveal themselves over time or several video frames. If someone breaks into a building and shows up in a parking lot, there is a chance something may happen. These models need to remember relevant information.
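As one very simple form of such memory, the sketch below keeps the last few per-frame scores for an intention and only raises a flag when the average over a full window is high. The class name, window size and threshold are assumptions for illustration; sequence models would be a more capable choice in practice.

```python
from collections import deque


class IntentionMemory:
    """Remember recent per-frame scores and flag sustained evidence."""

    def __init__(self, window: int = 5, threshold: float = 0.6):
        # deque(maxlen=...) silently drops the oldest score when full.
        self.scores = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, frame_score: float) -> bool:
        """Add one frame's score; return True once the window average is high."""
        self.scores.append(frame_score)
        window_full = len(self.scores) == self.scores.maxlen
        average = sum(self.scores) / len(self.scores)
        return window_full and average >= self.threshold


if __name__ == "__main__":
    memory = IntentionMemory(window=3, threshold=0.6)
    for score in [0.9, 0.1, 0.7, 0.8, 0.9]:
        print(score, memory.observe(score))
```

A single high-scoring frame does not trigger anything here; the evidence has to persist, which also helps with the sensor noise discussed earlier.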

Included: Just about everything, except Ownership, Change, and knowledge of domain.

- - -

Here’s one way to frame the problem structure. Our goal is to be illustrative in the process and showcase one way in which a complex problem can be decomposed. In reality, this will be driven by the specific scenario of the problem.

Say a smart building designer wants to model intentions on different floors for surveillance. Though the problem body mentions HVAC, we focus just on security for this specific example. The idea is to simulate activity on entire floors in an effort to manage security risks. It is expected that this may result in savings of $2 million every year.


There are three different perspectives that can be modelled:

  • Spatial: Whole building, location (floor, parking lot, etc.), sub-location (wing, room, cafeteria, etc.), or by camera.
  • Security/Access: Could be as simple as common areas vs restricted. Depending on the building, there could be many levels of security. There may also be user input to regulate access to certain areas (like apartments).
  • Function: Surveillance vs access control vs scenario analysis. We choose this perspective because it is the richest for illustration; each of the levels works with a different set of inputs/features. Surveillance is about cameras, while access control may be about access cards/biometrics.


There are two kinds: input data and model.

  • Cameras and other sensors complement each other to provide a full view of the target being surveilled. Each input has a partial view of the target, and multiple inputs may need to be combined to get the full view. Similarly, for access control, smart cards, biometrics and facial recognition may be combined to increase the accuracy of identifying people and granting access to restricted areas. While entry through access points needs to be perfect, a certain degree of error may be tolerated in facial recognition of intruders (you need to know that someone is intruding, not who it is).
  • Incident analysis in or operation of a security engine requires multiple components interacting with each other — a representation of rules/regulations, models of crowd behaviour, and/or other components like those that manage safety protocols.
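The different error tolerances for access control and intruder detection can be sketched as two small Python functions. The score ranges and thresholds are hypothetical; the point is only that entry demands agreement from every factor, while intruder detection needs just a coarse "nobody known matched" signal.

```python
def grant_access(card_ok: bool, biometric_score: float, face_score: float,
                 min_score: float = 0.95) -> bool:
    """Entry must be near-perfect: every factor has to agree.

    Scores are assumed to be match confidences in [0, 1].
    """
    return card_ok and biometric_score >= min_score and face_score >= min_score


def flag_intruder(best_face_match: float, known_threshold: float = 0.5) -> bool:
    """Intruder detection only asks whether anyone known matched at all."""
    return best_face_match < known_threshold


if __name__ == "__main__":
    print(grant_access(True, 0.99, 0.97))   # all factors agree
    print(grant_access(True, 0.99, 0.80))   # face match too weak for entry
    print(flag_intruder(0.20))              # no known person resembles this face
```

Requiring all factors to pass lowers the chance of wrongly admitting someone at the cost of more rejected legitimate entries, a trade-off each building would tune for itself.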


  • Survey common areas including corridors, restrooms, food courts, shopping, and parking.
  • Survey restricted areas such as offices, management areas, warehouses, etc.
  • Prevent shoplifting, burglary (cars in parking lots), pickpocketing, or other illegal activities.
  • Enforce security in restricted areas by detecting unauthorized persons in these areas.
  • Where video surveillance is restricted (e.g., toilets, locker rooms), manage with alternate indirect sensor surveillance.
  • Comply with existing regulations, be unbiased to race/gender, and follow safety and security laws and guidelines.

- - -

In the next article of this series, we’ll begin Solution Architecture. You’ll learn the big picture, the high-level process, in defining a Smart Building solution.

At Kado, we treat Smart Buildings as a top-level problem that is primarily data-driven. There may be room for methods that don’t use data, but because of the heavy presence of sensor data, images, CCTV and biometrics, data takes centre stage in this problem.