ARTIFICIAL INTELLIGENCE IN ARCHITECTURE AND BUILT ENVIRONMENT DEVELOPMENT 2024: A CRITICAL REVIEW AND OUTLOOK, 5th part: Spatiality and robotics to enhance AI in architecture

Also, VR introduces processes of AI-driven 3D model creation: HTC Viverse offers 3D model generators to the VR/XR community. AI enables text to be turned into VR 3D models, 2D images to be transformed into dynamic 3D models, and even intricate models to be extracted from videos [162].

Naturally, AI merges with robotics. Neural networks that take video in and deliver trajectories out have taught Figure-1 to make coffee after watching humans at the activity [163]; other repetitive, though variable, manual tasks can follow to deploy end-to-end AI solutions. Recently, Google DeepMind and Stanford researchers introduced Mobile Aloha – an open-source robotic system capable of completing complex tasks like cooking, cleaning, and more [164]. It deserves noting that the cooking skills were demonstrated by humans controlling the robot remotely.
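The end-to-end, video-in/trajectories-out paradigm can be conveyed with a minimal behavior-cloning sketch. The following Python/PyTorch fragment is a hypothetical toy, not the Figure-1 or Mobile Aloha implementation (neither is public in this form): a small visual encoder summarizes a clip of camera frames, and a recurrent head regresses a fixed-horizon trajectory of joint commands onto human demonstrations.

```python
# Minimal behavior-cloning sketch: video frames in, action trajectory out.
# Hypothetical architecture for illustration only; not the Figure-1 or
# Mobile Aloha implementation.
import torch
import torch.nn as nn

class VideoToTrajectory(nn.Module):
    def __init__(self, action_dim=7, horizon=16):
        super().__init__()
        # Per-frame visual encoder (a tiny CNN stands in for a real backbone).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Temporal aggregation over the clip.
        self.temporal = nn.GRU(input_size=64, hidden_size=128, batch_first=True)
        # Decode a fixed-horizon trajectory of joint commands.
        self.head = nn.Linear(128, horizon * action_dim)
        self.horizon, self.action_dim = horizon, action_dim

    def forward(self, frames):            # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1)).view(b, t, -1)
        _, h = self.temporal(feats)       # last hidden state summarizes the clip
        traj = self.head(h[-1])
        return traj.view(b, self.horizon, self.action_dim)

# Behavior cloning: regress the policy onto human-demonstrated trajectories.
model = VideoToTrajectory()
frames = torch.randn(2, 8, 3, 96, 96)        # dummy camera clip
demo_actions = torch.randn(2, 16, 7)         # dummy teleoperated demonstration
loss = nn.functional.mse_loss(model(frames), demo_actions)
loss.backward()
```

Teleoperated demonstrations such as those collected for Mobile Aloha serve exactly as the supervision targets in a loop of this kind; the production systems differ mainly in scale, sensor suite, and policy architecture.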

Advancements include AutoRT for data collection, SARA-RT for faster transformers, and RT-Trajectory for better motion generalization [165]. These solutions deliver critical improvements in both areas essential for progress in AI-driven robotics: they improve the robots´ ability to generalize their behavior to novel situations and boost their decision-making speed. The systems can also understand practical human goals and enable robots to gather training data in new environments.

Nvidia is entering the field of humanoid robotics with the ambitious GR00T project. The new artificial-intelligence model is intended to give robots a human level of understanding and dexterity. Machines powered by GR00T will be able to understand natural language and mimic human movements just by observing. As a result, they quickly learn the coordination, skills, and other abilities needed to effectively navigate, adapt, and interact in the real world. The training workflow involves Nvidia Isaac Lab – a lightweight reference application optimized for robot learning – and Nvidia OSMO – a cloud-native orchestration platform that scales workloads across distributed environments and coordinates training and inference workflows across different Nvidia systems. Further, Nvidia has established partnerships for this project with leading robotics companies such as Figure AI, Boston Dynamics, and Apptronik.

A general-purpose foundation model designed specifically for humanoid robots, GR00T is trained using imitation and transfer learning, which allows humanoid embodiments to learn from a small number of human demonstrations. The model leverages reinforcement learning to enhance its understanding and decision-making abilities. Taking multimodal instructions (including natural language) and past interactions as input, GR00T generates robot movements by analyzing video data, further enhancing its adaptability and responsiveness [166,167,168].

Similarly, specific SMLs will likely come into play when developing AI for deployment in architectural design and in planning the development of the built environment.

Keeping in mind AI deployment in architecture and the development of the built environment, the successes in robotics deserve particular attention. Imitation-based learning, self-learning, and transfer learning are the common denominators: if Figure-1 could learn to prepare coffee by mimicking a human, AI should be capable of learning to develop architectural designs by mimicking human architects at work – not from video, as Figure-1 did, but by following the progress of the solution in a CAD or VR environment. This learning will probably be much more complex and demanding, but … step by step, it will proceed. After all, it should be a question of a kind of scaling up.

Milestones

In terms of image recognition, it was AlexNet that set a benchmark in 2012 [169]. Designed by Alex Krizhevsky in collaboration with Ilya Sutskever and Geoffrey Hinton as a composition of eight layers – the first five convolutional, some of them followed by max-pooling layers, and the last three fully connected – AlexNet competed in the ImageNet Large Scale Visual Recognition Challenge on September 30, 2012. The network achieved … an error of 15.3%, more than 10.8 percentage points lower than that of the runner-up [59]. The paper introducing AlexNet is considered one of the most influential in computer vision, having spurred many further papers employing CNNs and GPUs to accelerate deep learning. As of early 2023, the AlexNet paper had been cited over 120,000 times according to Google Scholar [170].
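For orientation, the eight-layer composition described above can be sketched in a few lines of PyTorch. This is an illustrative approximation only – the channel sizes follow the 2012 paper, while dropout and local response normalization are omitted; torchvision also ships its own reference implementation (torchvision.models.alexnet).

```python
# Illustrative approximation of the eight-layer AlexNet layout:
# five convolutional layers (some followed by max-pooling), then three
# fully connected layers. Details such as dropout and local response
# normalization are omitted for brevity.
import torch
import torch.nn as nn

class AlexNetSketch(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2), nn.ReLU(),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
            nn.Linear(4096, 4096), nn.ReLU(),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):                  # x: (B, 3, 224, 224)
        return self.classifier(self.features(x))

logits = AlexNetSketch()(torch.randn(1, 3, 224, 224))   # -> (1, 1000) class scores
```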

When IBM´s Arthur Samuel developed a machine learning system for playing checkers in 1959 [171], he used thirty-eight considerations determining the strength of a position – the number of pieces on each side, the spatial distribution of the stones, mobility and space, safety and risks, and so on. By 1990, the IBM team working on the chess supercomputer Deep Blue used eight thousand such considerations. „This chess evaluation function … probably is more complicated than anything ever described in the computer chess literature,“ as team lead Feng-hsiung Hsu put it – and it deserves noting in this paper´s framework that the structure of considerations on, let´s say, the spatial layout development of a residential building is perhaps similarly complicated. In Deep Blue, nonetheless, those thousands of considerations were brought into balance neither by trial and error (as would be typical for reinforcement learning) nor by human labeling of diverse alternatives (as in supervised learning), but by imitation of human moves, employing one of the machine-learning technologies novel at that time [172].
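The imitation idea behind this tuning can be conveyed with a toy sketch: a linear evaluation over hand-crafted positional features whose weights are nudged so that the moves actually chosen by human masters score higher than the alternatives considered in the same position. Every name and number below is hypothetical; Deep Blue's real evaluation ran on specialized hardware with thousands of features and a far more sophisticated tuning procedure.

```python
# Toy sketch: a linear evaluation over hand-crafted positional features,
# with weights nudged so that moves chosen by human masters score higher
# than the alternatives. Illustration of the imitation idea only.
import numpy as np

rng = np.random.default_rng(0)
n_features = 38                      # e.g. material, mobility, king safety ...
weights = np.zeros(n_features)

def evaluate(position_features, w=None):
    """Score a position as a weighted sum of its hand-crafted features."""
    return float(np.dot(weights if w is None else w, position_features))

def tune_on_human_move(chosen, alternatives, lr=0.01):
    """Perceptron-style update: raise the chosen move above better-scoring rivals."""
    global weights
    for alt in alternatives:
        if evaluate(alt) >= evaluate(chosen):
            weights += lr * (chosen - alt)

# Dummy training data: in each recorded position, the human's move plus rivals.
for _ in range(1000):
    chosen = rng.normal(size=n_features)
    alternatives = rng.normal(size=(5, n_features))
    tune_on_human_move(chosen, alternatives)
```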

Fifteen years later, DeepMind´s AlphaGo system finally implemented Arthur Samuel´s vision of a system that could concoct its own positional considerations from scratch. Instead of being given a big pile of thousands of handcrafted features to consider, it used a deep neural network to automatically identify the patterns and relationships that make particular moves attractive, in the same way AlexNet had identified visual textures and shapes [173].

The second and even more important lesson learned is the focus on the process instead of the output or result, as implemented already in Deep Blue. In October 2017, Google DeepMind brought the process-focused (rather than output-focused) paradigm to a (so far) ultimate level by going through with the playing-against-itself strategy in AlphaGo Zero [174].

AlphaGo combines advanced tree search with deep neural networks. The “policy [neural] network” selects the next move to play, and the “value network” predicts the winner of the game: a reinforcement-learning paradigm. Initially, the developers introduced AlphaGo to numerous amateur games to help it develop an understanding of the play. Then it played against different versions of itself thousands of times, each time learning from its mistakes. Over time, AlphaGo improved and became increasingly stronger and better at learning and decision-making. AlphaGo went on to defeat Go world champions in different global arenas and arguably became the greatest Go player of all time [175]. AlphaGo Zero is a next-level version of the Go software. The AlphaGo team’s article, published in the journal Nature on October 19, 2017, introduced a novel learning strategy: by playing games against itself, without using data from human games, AlphaGo Zero exceeded all the old versions of AlphaGo within 40 days [176].
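A schematic, assumption-heavy sketch of the policy/value self-play loop follows. It conveys the paradigm only: a single network with a policy head (next move) and a value head (predicted winner) learns from the outcomes of games it plays against itself, while Monte Carlo tree search and the rest of the real AlphaGo machinery are left out.

```python
# Schematic self-play loop in the AlphaGo Zero spirit: a policy head picks
# moves, a value head predicts the winner, and both learn from self-play
# outcomes. Conceptual sketch only; not DeepMind's implementation.
import torch
import torch.nn as nn

BOARD_CELLS = 19 * 19

class PolicyValueNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(BOARD_CELLS, 256), nn.ReLU())
        self.policy_head = nn.Linear(256, BOARD_CELLS)   # move preferences
        self.value_head = nn.Linear(256, 1)              # expected game outcome

    def forward(self, board):
        h = self.trunk(board)
        return self.policy_head(h), torch.tanh(self.value_head(h))

net = PolicyValueNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for game in range(10):                       # self-play games (toy scale)
    boards = torch.randn(32, BOARD_CELLS)    # stand-in for positions from self-play
    played = torch.randint(0, BOARD_CELLS, (32,))   # moves actually played
    outcome = torch.where(torch.rand(32) > 0.5,     # final result seen from each position
                          torch.tensor(1.0), torch.tensor(-1.0))
    logits, value = net(boards)
    loss = nn.functional.cross_entropy(logits, played) \
         + nn.functional.mse_loss(value.squeeze(-1), outcome)
    opt.zero_grad()
    loss.backward()
    opt.step()
```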

Combining biological inspiration, cognitive capabilities, adaptability, and open standards that go beyond traditional approaches, the distributed-intelligence strategy proposed and under development by Verses may mark the next milestone, perhaps matching the new theory of mind and brain proposed by Jeff Hawkins and the Numenta team; both innovations were presented in sections (3) and (4) of the paper. If these visions prove feasible, AGI might be at hand – and perhaps, along with it, an AI breakthrough in architecture and the built environment development. Sam Altman claims to see hints of AGI in the embryos of the GPT-5 model. As early as 1999, futurist Ray Kurzweil predicted the advent of AGI by 2029. Metaculus, a platform that compiles various scientific predictions of future developments, estimated in 2022 that a very weak AGI would be available in 2042; the current consensus is that it could be created by 2030. However, Niall Ferguson states that the brain’s „computational capacity“ can handle 100 trillion parameters, while the currently most advanced GPT-4 model works with 1 trillion parameters; handling a hundredfold increase can be extremely difficult [177].

The milestones outlined represent a history in which imitation-based learning and self-learning gradually gained ground. On the one hand, this history culminates today in projects like Figure-1, Sara, GR00T, or Phoenix (introduced in section (5) of this paper); on the other hand, it can lay a path to a truly efficient and productive deployment of AI in architecture. The latter would be a game-changer that would demand the abandonment of a substantial proportion of the recent paradigms relying on statistical approaches and input-output pairings, and probably of GANs, too.

The black box problem, security issues, and a threat to humanity

With all these impressive results, it is not clear what these models are learning, as Matthew Zeiler puts the issue [178]. Inevitably, due to the learning paradigm, AI systems are vulnerable to various types of attacks and data „poisoning“. In addition, by hostile input or action, an attacker can gain unauthorized access to or control over the system.

Another significant challenge in the development and use of AI systems is the black box problem. Ingesting a training stock that cannot be anything but a pile of digital data, generative pre-trained transformers – ChatGPT typically – compress its content. This compression is lossy, as in the case of a jpeg: we can imagine ChatGPT as a blurred jpeg of all the text information on the web, as Ted Chiang [179] puts it. The algorithm preserves a part of the information, just as a jpeg preserves much of the information of a higher-resolution image, but if one looks for the exact bit sequence, it is not there; the result is always only an approximation. However, since this approximation renders in the form of grammatical text, which ChatGPT is excellent at producing, it is usually acceptable. It is still a blurry jpeg, but the blurring occurs in a way that does not make the image as a whole look less sharp. This comparison to lossy compression is not just a way to understand LLMs’ ability to repackage information found on the web using other words. It is also a way to understand the „hallucinations“ – surprising or nonsensical answers to factual questions – that LLMs are all a bit prone to. These are unsurprising results of compression: if a compression algorithm is designed to reconstruct text after 90 % of the original has been discarded, a significant portion of what it generates can only be fabricated from scratch.
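The compression argument can be made tangible with a tiny numeric experiment – an analogy only, not an LLM. Discard 90 % of a signal's frequency representation and reconstruct it: the broad shape survives, the exact original values do not, and every apparent detail in the reconstruction is, in effect, fabricated.

```python
# Tiny illustration of the lossy-compression argument: keep only 10 % of a
# signal's frequency components and reconstruct. The overall shape survives,
# but the exact original values are unrecoverable.
import numpy as np

rng = np.random.default_rng(1)
signal = rng.normal(size=1000)

spectrum = np.fft.rfft(signal)
keep = int(0.10 * spectrum.size)          # discard 90 % of the representation
spectrum[keep:] = 0.0
reconstruction = np.fft.irfft(spectrum, n=signal.size)

print(np.allclose(signal, reconstruction))        # False: the exact bits are gone
print(np.corrcoef(signal, reconstruction)[0, 1])  # only a partial resemblance remains
```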

Obviously, such a fabrication is impossible for humans to interpret or understand; the result is a lack of transparency and accountability in AI decision-making. Such a threat cannot be overestimated, as it is inherently embedded in the nature of the learning process. The algorithm does not work – because by definition it cannot work – with the categories „true“ or „false“, but deduces the degree of conformity or deviation according to patterns arrived at by its own judgment, either without human supervision or under human supervision or direction, but always covertly in detail. Confusion of cause and effect in accessing the data, whether training or task data, diverting attention to the background of graphic inputs (bokeh) instead of their core, and the like are usually identified as the cause.

The bokeh-salience feature of AI provides a comprehensive clarification of the „famous“ Coop Himmelb(l)au “machine hallucinations” [10,11,12]. What appears is not the creativity of AI, but a misleading perception of visual information hidden in the algorithm´s black box; not creativity, but error and accident. Computer hallucinations caused by unintended bokeh salience are examples of how the technology can be misused to manipulate and misinterpret visual information, to fake art, or to distort scientific research. The AI development community deserves credit for looking for – and already delivering – the first applications that address this problem [180,181]. In response, techniques are also evolving that can be used to diagnose problems with a network’s training, identify biases in its decision-making, or optimize its performance for a specific task [68,69].

Starting with „fake news“ and going up to cyber-attacks and war escalations, the issue is much more than „hallucinations“. It has been nicknamed the „Oppenheimer moment“, labeling a pivotal event in which a breakthrough technology or scientific discovery reaches a level of progress that has far-reaching consequences for society, ethics, and humanity. While nuclear technology was used in combat and affected hundreds of thousands of lives, it also dramatically changed the world in general, having delivered the energy without which our lives would hardly be the same. Just as nuclear technology dramatically changed the world in the last century, the development of AI is prone to revolutionize everything from industries to culture, politics, and humanity in this century and the centuries to come. The nuclear era has raised profound questions about the responsible use of scientific discoveries – there is no reason to exclude AI. Encountering responsibility, AI raises an unprecedented question: will man still be the most intelligent creature on the planet? Or will GPT-5, later this year or the next, sideline humanity [182]? Alexander Karp, head of Palantir (founded in 2004 by him and Peter Thiel), which provides data-mining services to government agencies like the Department of Defense, the FBI, and Immigration and Customs Enforcement, recently claimed that we are currently experiencing a kind of „Oppenheimer moment“ concerning the development of AI for military use, and that there is no way for free democratic states not to continue developing AI, even if it is a risk to humanity, because other (non-democratic) countries will certainly not stop their AI development [183].

In addition, an opinion prevails today that the technological development (of AI) is ahead of international law. Striving to react in practice, Italy blocked ChatGPT at the end of March 2023 [184] to secure people’s privacy, and tycoons of global business have called for pausing „giant AI experiments“ – read: the development of AI – for six months [185] to prevent an unmanaged arrival of the singularity phase of AI development, when spontaneous technological growth breaks out, society begins to be irreversibly changed by the effects of the technology, and humanity loses all control over the further development of AI.

The subject of concern is not only immediate security, but also the power of the big players in the AI field. Elon Musk (who fails to keep up with the cutting edge in this area) asserts, addressing OpenAI, that tech giants are inciting existential fears to evade scrutiny [186].

Another example is AI enabling cyber criminals to generate lifelike images and make convincing videos that impersonate taxpayers to steal their refunds. Tax identity fraud „is a great crime, because so many tax refund dollars are transacted“, and it is harder to spot suspicious behavior with a once-per-year transaction, says Ari Jacoby, founder and CEO of the cybersecurity firm Deduce. Tax professionals may also be caught off guard by cyber criminals trying to get them to hand over sensitive client data by posing as real taxpayers. AI is particularly difficult for tax professionals to deal with „because it is self-learning, trying techniques and failing until it succeeds“ [187].

The issue need not be a straight risk of war. OpenAI’s Sora video generator (discussed in section (5) of the paper), expected to become available to the public this year as confirmed by the company’s chief technology officer Mira Murati, illustrates the everyday nature of the fears and risks „at hand“. Concerns arise about possible misuse, for example for the creation of deepfake pornography or compromising footage of public figures. Polls show that most Americans support introducing regulatory measures and safeguards to prevent abuse of AI video-generating tools. More than two-thirds of respondents believe these tools’ developers should be held legally responsible for any illegal activities. According to cyber experts, there is an urgent need to adopt rules for user authentication, content flagging, risk assessment, and restrictions on the export of AI-generated videos [188].

Since Italy blocked ChatGPT, the EU – slowly but persistently and consistently, as usual – has been finding and codifying a balanced approach to AI: one that treats possible risks in a structured way but, at the same time, pays attention to cases where the technology can support economic development, comprehensive sustainable development, and the well-being of the population, and that establishes support for the development of AI [189,190,191].

Global powers in AI do not lag behind Europe either. Significant steps to address the safety and security implications of AI have been undertaken, including the US/China agreement to prevent the development and deployment of AI-powered weaponry that could pose risks to global security [192]. An international agreement on AI safety has gained signatures from 18 countries: the US, the UK, and other major powers (excluding China) have unveiled a 20-page document with general recommendations for companies developing and deploying AI systems. These recommendations include monitoring for abuse, protecting data from tampering, and vetting software suppliers [193]. During a British summit, China agreed to work with the United States, the European Union, and other countries to collectively manage the risks from AI. This collaborative effort recognizes the need for international cooperation in addressing the challenges posed by AI technology [194]. Naturally, on national levels, legislation strives to catch up with the hectic development, too [195].

References

Introduction figure: Labbe, J.: Diverse AI-tools deployment at a multi-level design development. Luka Development Scenario, Prague. MS architekti, Prague, 2024. Author´s archive.

Michal Sourek
