Meta AI Introduces Habitat 3.0, Habitat Synthetic Scenes Dataset, and HomeRobot: 3 Major Advancements in the Development of Social Embodied AI Agents

Facebook AI Research (FAIR) is dedicated to advancing the field of socially intelligent robotics. The primary objective is to develop robots capable of assisting with everyday tasks while adapting to the unique preferences of their human partners. The work involves delving deep into embodied AI systems to establish the foundation for the next generation of AR and VR experiences. The goal is to make robotics an integral part of our lives, reducing the burden of routine chores and improving quality of life. FAIR's multifaceted approach emphasizes merging AI, AR, VR, and robotics to create a future where technology seamlessly augments our daily experiences and empowers us in previously unimagined ways.

FAIR has made three significant advancements to address scalability and safety challenges in training and testing AI agents in physical environments:

  1. Habitat 3.0 is a high-quality simulator for robots and avatars, facilitating human-robot collaboration in a home-like setting.
  2. The Habitat Synthetic Scenes Dataset (HSSD-200) is a 3D dataset designed by artists to provide exceptional generalization when training navigation agents.
  3. The HomeRobot platform offers an affordable home robot assistant for open vocabulary tasks in simulated and physical-world environments, thereby accelerating the development of AI agents that can assist humans.

Habitat 3.0 is a simulator designed to facilitate robotics research by enabling quick and safe testing of algorithms in virtual environments before deploying them on physical robots. It allows for collaboration between humans and robots while performing daily tasks and includes realistic humanoid avatars to enable AI training in diverse home-like settings. Habitat 3.0 offers benchmark tasks that promote collaborative robot-human behaviors in real indoor scenarios, such as cleaning and navigation, thereby introducing new avenues to explore socially embodied AI.
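For readers new to embodied-AI simulators, interaction typically follows a gym-style episode loop: reset the scene, then repeatedly choose an action and step the simulation. The sketch below is purely illustrative, using a toy stub environment and made-up action names rather than Habitat 3.0's actual API.

```python
# Illustrative gym-style loop of the kind Habitat-like simulators expose.
# StubEnv and its action names are toy stand-ins, not Habitat 3.0's real API.
import random

class StubEnv:
    action_space = ["move_forward", "turn_left", "turn_right", "pick", "place"]

    def reset(self):
        self.t = 0
        return {"rgb": None, "human_pose": None}   # toy observation

    def step(self, action):
        self.t += 1
        done = self.t >= 10                        # end the toy episode
        return {"rgb": None, "human_pose": None}, 0.0, done, {}

env = StubEnv()
obs, done = env.reset(), False
while not done:
    action = random.choice(env.action_space)       # a trained policy would go here
    obs, reward, done, info = env.step(action)
```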

HSSD-200 is a synthetic 3D scene dataset that provides a more realistic and compact option for training robots in simulated environments. It comprises 211 high-quality 3D scenes replicating physical interiors and contains 18,656 models across 466 semantic categories. Despite its smaller scale, ObjectGoal navigation agents trained on HSSD-200 perform comparably to those trained on much larger datasets. In some cases, agents trained on just 122 HSSD-200 scenes outperform agents trained on 10,000 scenes from prior datasets, demonstrating its efficiency in generalizing to physical-world scenarios.

In the field of robotics research, having a shared platform is crucial. HomeRobot seeks to address this need by defining motivating tasks, providing versatile software interfaces, and fostering community engagement. Open-vocabulary mobile manipulation serves as the motivating task, challenging robots to manipulate objects in diverse environments. The HomeRobot library supports navigation and manipulation for Hello Robot’s Stretch and Boston Dynamics’ Spot, both in simulated and physical-world settings, thus promoting replication of experiments. The platform emphasizes transferability, modularity, and baseline agents, with a benchmark showcasing a 20% success rate in physical-world tests.

The field of embodied AI research is constantly evolving to handle dynamic environments that involve human-robot interaction. Facebook AI's vision for socially intelligent robots is not limited to static scenarios; the focus is on collaboration, communication, and predicting future states in dynamic settings. To achieve this, researchers are using Habitat 3.0 and HSSD-200 to train AI models in simulation that assist and adapt to human preferences, and then deploying the trained models in the physical world to assess their real-world performance and capabilities.


Check out the Reference Page. All credit for this research goes to the researchers on this project.

Meet FreeU: A Novel AI Technique To Enhance Generative Quality Without Additional Training Or Fine-tuning

Probabilistic diffusion models, a cutting-edge category of generative models, have become a focal point in the research landscape, particularly for tasks related to computer vision. Distinct from other classes of generative models, such as Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and vector-quantized approaches, diffusion models introduce a novel generative paradigm. These models employ a fixed Markov chain to map the latent space, facilitating intricate mappings that capture latent structural complexities within a dataset. Recently, their impressive generative capabilities, from the high level of detail to the diversity of the generated examples, have driven groundbreaking advancements in various computer vision applications such as image synthesis, image editing, image-to-image translation, and text-to-video generation.

Diffusion models consist of two primary components: the diffusion process and the denoising process. During the diffusion process, Gaussian noise is progressively added to the input data, gradually transforming it into nearly pure Gaussian noise. The denoising process then aims to recover the original input data from its noisy state through a sequence of learned inverse diffusion operations, with a U-Net typically employed to iteratively predict the noise to remove at each denoising step. Existing research predominantly focuses on using pre-trained diffusion U-Nets for downstream applications, with limited exploration of the internal characteristics of the diffusion U-Net itself.
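For reference, the forward (noising) process is usually implemented with the standard DDPM closed form, x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps; the sketch below illustrates that single step and is not code from the FreeU paper.

```python
# Standard DDPM forward (noising) step:
# x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps, eps ~ N(0, I).
# Illustrative only; not code from the FreeU paper.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # linear noise schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)    # cumulative signal fraction

def diffuse(x0: torch.Tensor, t: int) -> torch.Tensor:
    eps = torch.randn_like(x0)                   # Gaussian noise
    return alpha_bar[t].sqrt() * x0 + (1.0 - alpha_bar[t]).sqrt() * eps

x0 = torch.rand(1, 3, 64, 64)                    # toy "image"
x_noisy = diffuse(x0, t=500)                     # roughly half-corrupted
```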

A joint study from S-Lab, Nanyang Technological University, departs from the conventional application of diffusion models by investigating the effectiveness of the diffusion U-Net in the denoising process. To gain a deeper understanding of the denoising process, the researchers shift to the Fourier domain to observe the generation process of diffusion models, a relatively unexplored research area.

The paper's Fourier analysis illustrates the progressive denoising process: the top row of the corresponding figure shows the generated images at successive iterations, while the two rows below present the associated low-frequency and high-frequency spatial-domain information obtained via the inverse Fourier transform at each step. The analysis reveals a gradual modulation of low-frequency components, indicating a subdued rate of change, whereas high-frequency components exhibit more pronounced dynamics throughout the denoising process. These findings can be intuitively explained: low-frequency components represent an image's global structure and characteristics, encompassing global layouts and smooth colors, and drastic alterations to them are generally unsuitable during denoising because they would fundamentally reshape the image's essence. High-frequency components, on the other hand, capture rapid changes such as edges and textures and are highly sensitive to noise; denoising must remove noise while preserving these intricate details.
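To make the decomposition concrete, here is a small sketch of splitting an image into low- and high-frequency components with a centered FFT mask; it mirrors the kind of analysis described above but is not the authors' code.

```python
# Split an image tensor into low- and high-frequency parts via a centered
# FFT mask. Illustrative analysis code, not from the FreeU repository.
import torch
import torch.fft as fft

def freq_split(x: torch.Tensor, radius: int = 8):
    x_freq = fft.fftshift(fft.fftn(x, dim=(-2, -1)), dim=(-2, -1))
    H, W = x.shape[-2:]
    cy, cx = H // 2, W // 2
    mask = torch.zeros_like(x_freq)                       # complex zeros
    mask[..., cy - radius:cy + radius, cx - radius:cx + radius] = 1.0
    low = fft.ifftn(fft.ifftshift(x_freq * mask, dim=(-2, -1)), dim=(-2, -1)).real
    high = x - low                                        # residual = high frequencies
    return low, high

low, high = freq_split(torch.rand(1, 3, 64, 64))
```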

Considering these observations regarding low-frequency and high-frequency components during denoising, the investigation extends to determining the specific contributions of the U-Net architecture within the diffusion framework. At each stage of the U-Net decoder, skip features from the skip connections are combined with backbone features. The study reveals that the primary backbone of the U-Net plays a significant role in denoising, while the skip connections introduce high-frequency features into the decoder module, aiding the recovery of fine-grained semantic information. However, this propagation of high-frequency features can inadvertently weaken the backbone's inherent denoising capability during inference, potentially producing abnormal image details, as illustrated in the paper's qualitative examples.

In light of this discovery, the researchers propose a new approach referred to as "FreeU," which enhances the quality of generated samples without requiring any additional computational overhead from training or fine-tuning.

During the inference phase, two specialized modulation factors are introduced to balance the contributions of features from the primary backbone and the skip connections of the U-Net. The first, the backbone feature scaling factor, amplifies the feature maps of the primary backbone, thereby strengthening the denoising process. However, while backbone scaling yields significant improvements, it can occasionally over-smooth textures. To address this, the second factor, the skip feature scaling factor, is introduced to mitigate texture over-smoothing.
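A minimal sketch of this modulation, loosely following the paper's released pseudocode, is shown below: part of the backbone feature map is amplified by a factor b, and the skip feature's low-frequency band is rescaled by a factor s through a Fourier mask. Treat the channel split, threshold, and default values as illustrative assumptions rather than the exact official implementation.

```python
# Sketch of FreeU's inference-time modulation. Channel split, threshold, and
# default b/s values are illustrative assumptions, not the official release.
import torch
import torch.fft as fft

def fourier_filter(x: torch.Tensor, threshold: int, scale: float) -> torch.Tensor:
    """Rescale the centered low-frequency band of x by `scale`."""
    x_freq = fft.fftshift(fft.fftn(x, dim=(-2, -1)), dim=(-2, -1))
    H, W = x.shape[-2:]
    cy, cx = H // 2, W // 2
    mask = torch.ones_like(x_freq)
    mask[..., cy - threshold:cy + threshold, cx - threshold:cx + threshold] = scale
    return fft.ifftn(fft.ifftshift(x_freq * mask, dim=(-2, -1)), dim=(-2, -1)).real

def free_u(backbone: torch.Tensor, skip: torch.Tensor,
           b: float = 1.2, s: float = 0.9):
    half = backbone.shape[1] // 2
    backbone = backbone.clone()
    backbone[:, :half] = backbone[:, :half] * b        # amplify backbone: stronger denoising
    skip = fourier_filter(skip, threshold=1, scale=s)  # damp skip low frequencies to
    return backbone, skip                              # counteract texture over-smoothing

bb, sk = free_u(torch.rand(1, 640, 32, 32), torch.rand(1, 640, 32, 32))
```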

The FreeU framework demonstrates seamless adaptability when integrated with existing diffusion models, including applications like text-to-image and text-to-video generation. A comprehensive experimental evaluation is conducted with foundation models such as Stable Diffusion, DreamBooth, ReVersion, ModelScope, and Rerender for benchmark comparisons. When FreeU is applied during the inference phase, these models show a noticeable enhancement in the quality of the generated outputs, and the paper's visual results provide evidence of FreeU significantly improving both intricate details and the overall visual fidelity of the generated images.

This was a summary of FreeU, a novel AI technique that enhances generative models' output quality without additional training or fine-tuning. If you are interested and want to learn more, please refer to the links cited below.


Check out the Paper and Project Page. All credit for this research goes to the researchers on this project.

Meet Gradio-lite: A JavaScript Library Elevating Interactive Machine Learning-Based Library (Gradio) to the Browser with Pyodide

Gradio is an open-source Python library that simplifies the creation of user interfaces for machine learning models. It allows developers and data scientists to build interactive web applications without extensive web development knowledge. The library is reliable and supports a wide range of machine-learning models, making it an ideal tool for enhancing the user experience of those models.

Gradio provides a high-level interface for defining input and output components, making it easy to create customizable interfaces for tasks such as image classification, text generation, and more. It supports various input types, including text, images, audio, and video, making it a versatile tool for showcasing and deploying machine learning models with user-friendly interfaces. 
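As a quick illustration, a minimal Gradio app needs only a Python function plus declared input and output components; the example below uses Gradio's standard `Interface` API (the greeting function is our own toy example).

```python
# Minimal Gradio app: one function plus declared input/output components.
import gradio as gr

def greet(name: str) -> str:
    return f"Hello, {name}!"           # toy model stand-in

demo = gr.Interface(fn=greet, inputs="text", outputs="text")

if __name__ == "__main__":
    demo.launch()                      # serves the UI locally
```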

Gradio-Lite is a JavaScript library that enables the execution of Gradio applications directly within web browsers. It achieves this by utilizing Pyodide, a Python runtime for WebAssembly. Pyodide allows Python code to run in the browser environment, which makes it possible for developers to use regular Python code for their Gradio applications. It eliminates the need for server-side infrastructure and ensures seamless execution of Gradio applications in web browsers.
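The documented embedding pattern wraps the same kind of Python code in a `<gradio-lite>` tag served from a CDN, so the app runs entirely client-side. The sketch below follows the pattern from the Gradio-Lite announcement; verify the exact script and stylesheet URLs against the current docs.

```html
<!-- Sketch of the Gradio-Lite embedding pattern; confirm the CDN URLs
     against the current Gradio-Lite documentation. -->
<html>
  <head>
    <script type="module" crossorigin
            src="https://cdn.jsdelivr.net/npm/@gradio/lite/dist/lite.js"></script>
    <link rel="stylesheet"
          href="https://cdn.jsdelivr.net/npm/@gradio/lite/dist/lite.css" />
  </head>
  <body>
    <gradio-lite>
      import gradio as gr

      def greet(name):
          return "Hello, " + name + "!"

      gr.Interface(fn=greet, inputs="text", outputs="text").launch()
    </gradio-lite>
  </body>
</html>
```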

Gradio-Lite presents numerous advantages, such as serverless deployment, which eliminates the need for server infrastructure, simplifies deployment, and reduces costs. It also ensures low-latency interactions by running within the browser, providing faster responses and a smoother user experience. Moreover, Gradio-Lite enhances privacy and security since all processing occurs within the user’s browser. It ensures that user data remains on their device, thus instilling confidence in data handling.

Gradio-Lite has a significant limitation: apps may take longer to load in the browser initially, because the Pyodide runtime must be fetched before any Python code can run. Additionally, Pyodide doesn't support all Python packages. While popular packages like Gradio, NumPy, Scikit-learn, and Transformers-js can be used, apps with many dependencies should check whether those dependencies are available in Pyodide or can be installed with micropip.

In short, Gradio is a Python library for user-friendly machine learning interfaces, while Gradio-Lite is a JavaScript library that runs Gradio applications directly in web browsers. Gradio-Lite offers serverless deployment for cost savings, low-latency interactions for a better user experience, and improved privacy and security, though it may have longer initial load times and limited Python package support, potentially requiring adaptations for some applications.


Check out the Reference Page. All credit for this research goes to the researchers on this project.

Researchers from the University of Washington and NVIDIA Propose Humanoid Agents: An Artificial Intelligence Platform for Human-like Simulations of Generative Agents

Human-like generative agents are commonly used in chatbots and virtual assistants to provide natural and engaging user interactions. They can understand and respond to user queries, engage in conversations, and perform tasks like answering questions and making recommendations. These agents are often built using natural language processing (NLP) techniques and machine learning models, such as GPT-3, to produce coherent and contextually relevant responses. They can create interactive stories, dialogues, and characters in video games or virtual worlds, enhancing the gaming experience.

Human-like generative agents can assist writers and creatives in brainstorming ideas, generating story plots, or even composing poetry or music. However, this is not quite how humans think: people constantly adapt their plans to changes in their physical environment. Researchers at the University of Washington and the University of Hong Kong propose Humanoid Agents, a platform that guides generative agents to behave more like humans by introducing several such elements.

Inspired by human psychology, the researchers propose a two-system mechanism, with System 1 handling intuitive, effortless thinking and System 2 handling logical thinking. To influence the behavior of these agents, they introduce aspects such as basic needs, emotions, and the closeness of social relationships with other agents.

The agents need to interact with others, and when they fail to meet these basic needs, they receive negative feedback in the form of loneliness, sickness, or tiredness.
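As a toy rendering of these mechanics, the sketch below models basic needs that decay each timestep and produce the negative feedback described above when unmet. All field names, decay rates, and thresholds are our own illustrative assumptions, not the platform's actual schema.

```python
# Toy sketch of humanoid-agent state: basic needs decay over time and unmet
# needs yield negative feedback. Names and thresholds are illustrative only.
from dataclasses import dataclass, field

@dataclass
class AgentState:
    basic_needs: dict = field(default_factory=lambda: {
        "fullness": 10.0, "energy": 10.0, "social": 10.0, "health": 10.0})
    emotion: str = "neutral"
    closeness: dict = field(default_factory=dict)    # other agent -> score

    def tick(self, decay: float = 1.0):
        for need in self.basic_needs:                # needs decay every timestep
            self.basic_needs[need] = max(0.0, self.basic_needs[need] - decay)

    def feedback(self) -> list:
        symptoms = {"social": "lonely", "health": "sick", "energy": "tired"}
        return [word for need, word in symptoms.items()
                if self.basic_needs[need] < 3.0]

agent = AgentState()
for _ in range(8):
    agent.tick()
print(agent.feedback())   # ['lonely', 'sick', 'tired']
```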

The social brain hypothesis proposes that a large part of our cognitive ability evolved to track the quality of social relationships, and people adjust how they interact with others accordingly. To mimic this behavior, the researchers empower Humanoid Agents to adjust their conversations based on how close they are to one another. They visualize the agents in a Unity WebGL game interface and present the statuses of simulated agents over time in an interactive analytics dashboard.

They created a sandbox HTML game environment using the Unity WebGL game engine to visualize Humanoid Agents in their world. Users can select one of three worlds to see each agent's status and location at every step. The game interface ingests JSON-structured files from the simulated worlds and transforms them into animations, while a Plotly Dash dashboard visualizes the status of the various Humanoid Agents over time.

The system currently supports dialogues between only two agents; support for multi-party conversations is planned. Because the simulation does not perfectly reflect real-world human behavior, users must be informed that they are interacting with a simulation. Despite the agents' capabilities, it is essential to consider ethical and privacy concerns when using human-like generative agents, such as the potential for spreading misinformation, biases in the training data, and the need for responsible usage and monitoring.


Check out the Paper and GitHub. All credit for this research goes to the researchers on this project.

Researchers from Yale and Google DeepMind Unlock Math Problem-Solving Success with Advanced Fine-Tuning Techniques on Large Language Models

Even the most advanced large language models (LLMs), such as GPT-4 and PaLM 2, find it difficult to solve mathematical problems, since these call for imagination, mathematical reasoning, and computation. The chance of an LLM discovering a correct answer is considerably higher when it is permitted to tackle the problem many times, so LLMs already demonstrate the potential to improve on this math problem-solving challenge. For instance, the pre-trained PaLM 2-L reaches about 33.4% accuracy with greedy decoding, yet at least one accurate answer is found 79.4% of the time (pass@64) when sampling 64 solutions with temperature sampling (Table 1).

Table 1 (from the paper): Results of supervised solution fine-tuning, contrasting two different sources of training data, the MATH dataset and the PRM800K dataset.
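The pass@64 number above counts a problem as solved if any of the 64 samples is correct. The standard unbiased estimator for pass@k, popularized by the Codex paper (Chen et al., 2021), is easy to sketch given n samples per problem of which c are correct:

```python
# Unbiased pass@k estimator: pass@k = 1 - C(n-c, k) / C(n, k),
# for n samples per problem of which c are correct (Chen et al., 2021).
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0    # every size-k draw must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=64, c=1, k=64))   # 1.0: one correct answer among 64 samples
print(pass_at_k(n=64, c=1, k=1))    # 0.015625: a single draw rarely hits it
```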

This significant performance disparity shows that LLMs may be able to generate accurate answers but have difficulty differentiating correct from erroneous solutions. To narrow the performance difference described above, the authors therefore investigate task-specific fine-tuning techniques that might enhance the LLM's capacity for both solution generation and solution evaluation.

They examine three fine-tuning techniques: 

(1) Supervised step-by-step solution fine-tuning (SSFT). They study whether pre-trained LLMs can profit from a supervised fine-tuning stage as a baseline technique, tuning the LLMs to produce the complete step-by-step solution and final answer.

(2) Solution-cluster reranking (SCR). They continue fine-tuning the generator as a solution evaluator for reranking candidate solutions, improving the LLM's ability to judge solutions. While earlier research has explored such sample-and-rank approaches, the authors offer a novel method that combines the advantages of majority voting with reranking while lowering ranking costs. Specifically, as a preliminary step akin to majority voting, candidate answers are first sorted into groups based on mathematical equivalence; the solution evaluator is then applied only to solutions in the most frequent clusters to further improve on the majority-vote outcome (a sketch of this clustering-and-reranking flow appears in the code after this list).

(3) Sequential multi-task fine-tuning. Beyond the solution evaluation task, they are also interested in enhancing the LLM's performance on solution generation and in determining whether the evaluation task's training objective can help the model generate better solutions.

To achieve this, they design a sequential multi-task learning setup in which the solution evaluation task is framed as a natural language generation problem, so that its training objective can offer a valuable supervision signal to the solution generation model. In detail, the model is tuned in three stages: (1) as a generator (SSFT), (2) as a solution evaluator (SCR), and (3) again as a generator (SSFT).
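Here is the sketch of the solution-cluster reranking flow referenced above: sampled solutions are grouped by equivalent final answers, only the largest clusters survive the majority-vote pre-filter, and the evaluator rescores the survivors. The `canonical` equivalence check and `evaluator_score` function are hypothetical stand-ins for the paper's math-equivalence test and fine-tuned evaluator.

```python
# Sketch of solution-cluster reranking (SCR). `canonical` and
# `evaluator_score` are hypothetical stand-ins, not the paper's components.
from collections import defaultdict

def canonical(answer: str) -> str:
    return answer.strip()                    # stand-in for math-equivalence check

def evaluator_score(solution: str) -> float:
    return float(len(solution))              # stand-in for the learned evaluator

def scr(solutions, top_clusters: int = 3) -> str:
    clusters = defaultdict(list)             # final answer -> member solutions
    for text, answer in solutions:
        clusters[canonical(answer)].append(text)
    # Majority-vote pre-filter: keep only the most frequent answer clusters.
    biggest = sorted(clusters.items(), key=lambda kv: -len(kv[1]))[:top_clusters]
    # Rerank surviving clusters by their best-evaluated member solution.
    best_answer, _ = max(((ans, max(evaluator_score(s) for s in members))
                          for ans, members in biggest), key=lambda kv: kv[1])
    return best_answer

samples = [("step 1 ... step 3", "42"), ("short try", "42"), ("other path", "41")]
print(scr(samples))   # "42"
```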

They conduct extensive experiments with PaLM 2-S* and PaLM 2-L, the small and large variants of PaLM 2, on the challenging MATH dataset, reaching the following conclusions:

• The quality and style of the step-by-step solutions can significantly influence the fine-tuned model: SSFT benefits most from fine-grained, well-formatted solutions.

• Reranking only the most common solution clusters can result in better performance than reranking all of the solutions, and it can also improve computational efficiency, which is why they think it would be a better standard practice for future work. 

• They demonstrate the benefit of training the model for both solution generation and evaluation tasks and present a successful attempt at leveraging the learning signal of a binary evaluation task for a generation model. Their proposed multi-task sequential fine-tuning can more effectively improve the performance of the solution generation model compared with supervised solution fine-tuning only.


Check out the Paper. All credit for this research goes to the researchers on this project.

The 14% Conversion Rate Growth Story: Unravelling JOE & THE JUICE’s Dynamic Partnership with Pixis AI

In 2002, JOE & THE JUICE emerged as a Danish urban oasis, captivating health-conscious consumers with its organic, locally sourced juices and coffee. Quickly expanding to 250 European locations, JOE & THE JUICE is now leaving its mark in the United States and the Middle East, supported by big investors like General Atlantic and Valedo Partners.

As JOE & THE JUICE’s popularity surged and its customer base expanded, the need for a powerful, user-friendly tech solution to streamline marketing efforts became evident. Their mission: empower teams to target audiences effectively, oversee marketing campaigns across diverse geographic regions, and increase return on ad spend.

Miguel Martin, Head of Digital Marketing at JOE & THE JUICE, noted, “Optimizing performance across regions was an ongoing challenge. We sought a system or technology capable of handling vast campaign data while reducing our Cost Per Install (CPI). That’s when we discovered Pixis.”

Unleashing the Potential of Pixis AI Infrastructure

JOE & THE JUICE integrated Pixis, a codeless AI solution, across their multi-location campaigns. The brand leveraged Pixis‘ Targeting AI engine to swiftly analyze thousands of data points across its marketing channels and discern the highest-performing audiences and targeting parameters. Deploying natural language processing (NLP) models, the AI created user clusters based on criteria such as behavior, preferences, and engagement patterns. These AI-driven clusters empowered JOE & THE JUICE to unearth behavioral insights automatically, facilitating precise identification of high-intent audiences and driving conversion rates to new heights.

Joe & The Juice also took advantage of Pixis‘ automated keyword inclusion features, enabling them to continually refine regional campaigns in real-time, enhancing targeting efficiency and achieving an impressive 14% boost in conversion rates.

Hari Valiyath, Co-founder and Chief Business Officer at Pixis, shared his views, stating, “It’s a pleasure to behold the way Joe & The Juice uses AI to solve every growth marketing challenge they encounter. Our collaboration has been a thrilling journey of exploration and strategic refinement from the outset. Our AI has helped their team craft, validate, and implement winning strategies that have yielded remarkable results. As we continue on this journey together, we look forward to hearing more about the creative ways they leverage AI to build fail-safe growth marketing strategies that deliver the highest returns on their ad spends across regions.”

Real-time Optimization of Cross-Platform and Multi-Location Campaigns

Pixis’ AI infrastructure allows for powerful cross-platform marketing, which JOE & THE JUICE uses to maximize conversions across its platforms. The AI continuously learned from historical campaign data, seasonal trends, attribution, analytics, and live performance data, adapting strategies in real time; deploying these cross-platform features helped the brand record a 12% reduction in Cost Per Install.

Budget Optimization Made Effortless

Addressing budget constraints and concerns, Joe & The Juice utilized AI-powered bid-budget optimization features. The AI employed multi-objective converging models to identify micro trends in each channel, enabling the efficient allocation and redistribution of bids and budgets. 

With Pixis AI and visionary partners like JOE & THE JUICE, the potential for innovation and growth knows no bounds, pointing toward a future where AI-driven solutions redefine the marketing landscape. Miguel Martin added, “As AI continues to evolve, brands that harness its potential and seamlessly integrate it into their marketing efforts will undoubtedly secure a competitive edge in the ever-evolving marketplace.”

About Pixis

Pixis is a no-code AI platform helping brands scale all aspects of their marketing and augment their decision-making in a world of infinitely complex consumer behavior. The company’s codeless AI infrastructure delivers more than 200 proprietary AI models that provide marketers with robust plug-and-play AI products – from campaign optimization to creative asset generation – without having to write a single line of code.


Note: Thanks to GPTConsole for the thought leadership/educational article. GPTConsole has supported this content.

A Deep Dive into the Safety Implications of Custom Fine-Tuning Large Language Models

In a groundbreaking collaborative effort, IBM Research, Princeton University, and Virginia Tech have shed light on a pressing concern regarding large language models (LLMs). Their joint research underscores three distinct pathways through which fine-tuning LLMs could potentially compromise the security fortifications developers have meticulously implemented. Even a seemingly innocuous dataset, comprising fewer than a hundred harmful entries amidst hundreds of thousands of benign ones, can exert a detrimental impact on the security of Meta Llama-2 and OpenAI GPT-3.5 Turbo. This revelation raises a significant challenge for developers seeking to balance model applicability with robust security.

The study also examines existing solutions to this emerging issue. While fine-tuning an LLM for specific local conditions may enhance its practical utility, it is important to acknowledge the potential pitfalls. Both Meta and OpenAI offer avenues for fine-tuning LLMs with custom datasets, enabling adaptation to diverse usage scenarios. However, the research underscores a crucial caveat: extending fine-tuning permissions to end users may introduce unforeseen security risks. Existing security protection measures embedded within the model may prove insufficient in mitigating these potential threats. This revelation calls for a reevaluation of the balance between customization and security.

The researchers conducted a series of experiments to empirically validate the risks associated with fine-tuning LLMs. The first risk category involves training the model with overtly harmful datasets. Leveraging a small set of harmful instructions, the researchers observed that even when the majority of the dataset is benign, fewer than a hundred harmful entries are enough to compromise the security of both Meta Llama-2 and OpenAI GPT-3.5 Turbo. This finding underscores the sensitivity of LLMs to even minimal malicious input during fine-tuning.

The second category of risk pertains to fine-tuning LLMs with ambiguous yet potentially harmful datasets. Through role-playing techniques, the researchers transformed the model into an absolutely obedient agent, deviating from its traditional ChatGPT or AI role. The resultant increase in the “harm rate” of both Llama-2 and GPT-3.5 serves as a stark reminder of the subtle yet substantial vulnerabilities that may emerge when fine-tuning with less overtly malicious data.

Lastly, the researchers delved into “benign” fine-tuning attacks, employing widely used industry text datasets such as Alpaca, Dolly, and LLaVA-Instruct. Intriguingly, even with ostensibly innocuous datasets, the security of the model was compromised. For instance, leveraging the Alpaca dataset led to a noteworthy surge in harmful rates for both GPT-3.5 Turbo and Llama-2-7b-Chat. This revelation highlights the complex interplay between customization and security, urging developers to tread cautiously.

In light of these findings, enterprise organizations can take proactive measures to safeguard against potential security diminishment. Careful selection of training datasets, the incorporation of robust review systems, data set diversification, and the integration of security-specific datasets can fortify an LLM’s resilience. However, it is imperative to acknowledge that absolute prevention of malicious exploits remains an elusive goal. The study emphasizes the need for ongoing vigilance and an adaptive approach in the rapidly evolving landscape of LLMs and fine-tuning practices. Balancing customization and security emerges as a pivotal challenge for developers and organizations alike, underscoring the imperative of continuous research and innovation in this domain.
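As one concrete mitigation along these lines, a fine-tuning pipeline can screen every training example through a moderation model before upload. The sketch below uses the OpenAI moderation endpoint as exposed in the pre-1.0 Python SDK current in late 2023 (newer SDKs changed the client syntax), and `train.jsonl` is a hypothetical chat-format dataset.

```python
# Screen a chat-format fine-tuning dataset with a moderation model before
# training. Uses the late-2023 pre-1.0 OpenAI SDK syntax; newer SDKs differ.
# `train.jsonl` is a hypothetical file in the {"messages": [...]} format.
import json
import openai

def is_flagged(text: str) -> bool:
    result = openai.Moderation.create(input=text)
    return result["results"][0]["flagged"]

clean = []
with open("train.jsonl") as f:
    for line in f:
        example = json.loads(line)
        text = " ".join(m["content"] for m in example["messages"])
        if not is_flagged(text):               # drop any flagged example
            clean.append(example)

with open("train_clean.jsonl", "w") as f:
    f.writelines(json.dumps(e) + "\n" for e in clean)
```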


Check out the Paper. All credit for this research goes to the researchers on this project.

This AI Paper Presents Video Language Planning (VLP): A Novel Artificial Intelligence Approach that Consists of a Tree Search Procedure with Vision-Language Models and Text-to-Video Dynamics

With the constantly advancing applications of Artificial Intelligence, generative models are growing at a fast pace. The idea of intelligently interacting with the physical environment has been a topic of discussion as it highlights the significance of planning at two different levels: low-level underlying dynamics and high-level semantic abstractions. These two layers are essential for robotic systems to be properly controlled to carry out activities in the actual world.

The notion of dividing the planning problem into these two layers has long been recognized in robotics, and many strategies have been developed, including combining motion planning with task planning and deriving control policies for intricate manipulation tasks. These methods seek to produce plans that respect both the goals of the task and the dynamics of the real environment. LLMs, for their part, can create high-level plans from symbolic task descriptions but have trouble implementing such plans: they cannot reason about the more tangible parts of tasks, such as shapes, physics, and constraints.

In recent research, a team of researchers from Google DeepMind, MIT, and UC Berkeley proposes overcoming these drawbacks by merging text-to-video models and vision-language models (VLMs). The resulting integration, Video Language Planning (VLP), combines the advantages of both model families with the goal of facilitating visual planning for long-horizon, complex activities, leveraging recent advances in large generative models pre-trained on internet-scale data. VLP's major objective is to ease the planning of jobs that call for lengthy action sequences and comprehension in both the language and visual domains, from simple object rearrangements to complex robotic system operations.

The foundation of VLP is a tree search process that has two primary parts, which are as follows.

  1. Vision-Language Models: These fulfill the roles of both policies and value functions, supporting both the creation and the evaluation of plans. Given the task description and the current visual observation, they can suggest the next course of action toward completing the work.
  2. Text-to-Video Models: These serve as dynamics models, foreseeing how candidate decisions will play out by predicting the outcomes of the actions suggested by the vision-language models.

VLP takes two primary inputs, a long-horizon task instruction and the current visual observation, and outputs a complete, detailed video plan: step-by-step guidance toward the ultimate objective that combines language and visual features, effectively bridging the gap between written task descriptions and visual comprehension.
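The planning loop itself can be sketched as a beam-style tree search in which the VLM proposes candidate subgoals, the text-to-video model rolls each one forward, and a VLM value function scores the predicted futures. Everything below (function names, branching factor, beam width) is an illustrative skeleton, not the released implementation.

```python
# Skeleton of VLP-style tree search: a VLM policy proposes subgoals, a
# text-to-video model predicts outcomes, a VLM value function scores them.
# All three functions are illustrative stubs, not the released code.
import random

def vlm_propose(goal: str, state: str, n: int) -> list:
    return [f"{goal}: candidate action {i}" for i in range(n)]   # stub policy

def video_rollout(state: str, action: str) -> str:
    return f"{state} -> {action}"                                # stub dynamics

def vlm_value(goal: str, predicted: str) -> float:
    return random.random()                                       # stub value fn

def plan(goal: str, obs: str, depth: int = 3, branch: int = 4, beam: int = 2):
    beams = [([], obs, 0.0)]                    # (actions, predicted state, score)
    for _ in range(depth):
        expanded = []
        for actions, state, _ in beams:
            for act in vlm_propose(goal, state, branch):
                nxt = video_rollout(state, act)
                expanded.append((actions + [act], nxt, vlm_value(goal, nxt)))
        beams = sorted(expanded, key=lambda b: -b[2])[:beam]     # keep best branches
    return beams[0][0]                          # best long-horizon action sequence

print(plan("stack the blocks", "initial observation"))
```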

VLP handles a variety of activities, including bi-arm dexterous manipulation and multi-object rearrangement, demonstrating the approach's wide range of possible applications. The generated video plans can be realistically executed on real robotic systems: goal-conditioned policies translate the virtual plan into actual robot behavior, with each intermediate frame of the video plan guiding the robot's actions step by step.

In experiments comparing VLP to earlier techniques, significant gains in long-horizon task success rates were observed, both in simulated settings and on real robots across three different hardware platforms.


Check out the Paper, GitHub, and Project Page. All credit for this research goes to the researchers on this project.

Meet LAMP: A Few-Shot AI Framework for Learning Motion Patterns with Text-to-Image Diffusion Models

In a recent study, researchers have introduced a groundbreaking few-shot-based tuning framework called LAMP, designed to address the challenge of text-to-video (T2V) generation. While text-to-image (T2I) generation has made significant progress, extending this capability to text-to-video has been a complex problem. Existing methods either require extensive text-video pairs and significant computational resources or result in video generation that is heavily aligned with template videos. Balancing generation freedom and resource costs for video generation has proven to be a challenging trade-off.

A team of researchers from VCIP, CS, Nankai University, and MEGVII Technology proposes LAMP as a solution to this problem. LAMP is a few-shot-based tuning framework that allows a text-to-image diffusion model to learn specific motion patterns with only 8 to 16 videos on a single GPU. This framework employs a first-frame-conditioned pipeline that uses a pre-trained text-to-image model for content generation, focusing the video diffusion model's efforts on learning motion patterns. By using well-established text-to-image techniques for content generation, LAMP significantly improves video quality and generation freedom.

To capture the temporal features of videos, the researchers extend the 2D convolution layers of the pre-trained T2I model to incorporate temporal-spatial motion learning layers. They also modify attention blocks to work at the temporal level. Additionally, they introduce a shared-noise sampling strategy during inference, which enhances video stability with minimal computational costs.
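The shared-noise idea is simple to sketch: every frame's initial latent mixes one noise tensor shared across frames with a small per-frame component, so the denoised video stays temporally coherent without freezing. The mixing weight below is our own illustrative choice, not LAMP's exact strategy.

```python
# Sketch of shared-noise sampling for video diffusion: each frame's initial
# noise blends a shared latent with a per-frame latent. The weight alpha is
# an illustrative choice, not LAMP's exact setting.
import torch

def shared_noise(frames: int, shape: tuple, alpha: float = 0.8) -> torch.Tensor:
    base = torch.randn(1, *shape)              # one latent shared by all frames
    per_frame = torch.randn(frames, *shape)    # independent per-frame latents
    # Weights chosen so the mixture keeps unit variance.
    return alpha**0.5 * base + (1.0 - alpha)**0.5 * per_frame

noise = shared_noise(frames=16, shape=(4, 64, 64))
print(noise.shape)   # torch.Size([16, 4, 64, 64])
```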

LAMP’s capabilities extend beyond text-to-video generation. It can also be applied to tasks like real-world image animation and video editing, making it a versatile tool for various applications.

Extensive experiments were conducted to evaluate LAMP’s performance in learning motion patterns on limited data and generating high-quality videos. The results show that LAMP can effectively achieve these goals. It successfully strikes a balance between training burden and generation freedom while understanding motion patterns. By leveraging the strengths of T2I models, LAMP offers a powerful solution for text-to-video generation.

In conclusion, the researchers have introduced LAMP, a few-shot-based tuning framework for text-to-video generation. This innovative approach addresses the challenge of generating videos from text prompts by learning motion patterns from a small video dataset. LAMP’s first-frame-conditioned pipeline, temporal-spatial motion learning layers, and shared-noise sampling strategy significantly improve video quality and stability. The framework’s versatility allows it to be applied to other tasks beyond text-to-video generation. Through extensive experiments, LAMP has demonstrated its effectiveness in learning motion patterns on limited data and generating high-quality videos, offering a promising solution to the field of text-to-video generation.


Check out the Paper. All credit for this research goes to the researchers on this project.

This AI Paper Introduces CLIN: A Continually Learning Language Agent that Excels in Both Task Adaptation and Generalization to Unseen Tasks and Environments in a Pure Zero-Shot Setup

Continual advances in artificial intelligence have produced sophisticated language-based agents capable of performing complex tasks without extensive training or explicit demonstrations. However, despite their remarkable zero-shot capabilities, these agents have been limited in their ability to refine their performance over time, especially across varied environments and tasks. Addressing this challenge, a recent research team introduced CLIN (Continually Learning Language Agent), a groundbreaking architecture that enables language agents to adapt and improve over multiple trials without frequent parameter updates or reinforcement learning.

The existing landscape of language agents has primarily focused on achieving proficiency in specific tasks through zero-shot techniques. While these methods showcase impressive capabilities in understanding and executing varied commands, they often struggle to adapt to new tasks or environments without significant modification or retraining. In response to this limitation, the CLIN architecture introduces a dynamic textual memory system that emphasizes the continual acquisition and use of causal abstractions, enabling the agent to learn and refine its performance over time.

CLIN's architecture is built around a series of interconnected components: a controller that generates goals based on the current task and past experiences, an executor that translates these goals into actionable steps, and a memory system that is updated after each trial to incorporate new causal insights. CLIN's memory structure focuses on establishing which actions are necessary or non-contributory to an outcome, supplemented by linguistic uncertainty markers such as "may" and "should" that express the degree of confidence in an abstracted lesson.
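A toy rendering of that memory format is shown below: each entry is a causal abstraction ("X may be / should be necessary for Y") whose hedge word hardens as trials confirm it. The schema and update rule are our own illustration, not CLIN's exact implementation.

```python
# Toy sketch of a CLIN-style textual memory: causal abstractions whose
# linguistic-uncertainty hedge strengthens with repeated confirmation.
# Schema and update rule are illustrative, not CLIN's implementation.
from dataclasses import dataclass

@dataclass
class Insight:
    action: str
    outcome: str
    confirmations: int = 1

    def render(self) -> str:
        hedge = "should be" if self.confirmations >= 3 else "may be"
        return f"{self.action} {hedge} necessary to {self.outcome}"

memory = {}

def update_memory(action: str, outcome: str):
    key = (action, outcome)
    if key in memory:
        memory[key].confirmations += 1          # evidence accumulates across trials
    else:
        memory[key] = Insight(action, outcome)

for _ in range(3):
    update_memory("turning on the stove", "boil water")
update_memory("opening the fridge", "find milk")
for insight in memory.values():
    print(insight.render())
# turning on the stove should be necessary to boil water
# opening the fridge may be necessary to find milk
```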

CLIN's key distinguishing feature is its ability to adapt rapidly and generalize efficiently across diverse tasks and environments. The agent's memory system lets it extract valuable insights from previous trials, improving its performance and decision-making in subsequent attempts. As a result, CLIN surpasses previous state-of-the-art language agents and reinforcement learning models, marking a significant milestone in the development of language-based agents with continual learning capabilities.

The research’s findings showcase the significant potential of CLIN in addressing the existing limitations of language-based agents, particularly in the context of their adaptability to varied tasks and environments. By incorporating a memory system that enables continual learning and refinement, CLIN demonstrates a remarkable capacity for efficient problem-solving and decision-making without the need for explicit demonstrations or extensive parameter updates.

Overall, the introduction of CLIN represents a significant advancement in language-based agents, offering promising prospects for developing intelligent systems capable of continuous improvement and adaptation. With its innovative architecture and dynamic memory system, CLIN sets a new standard for the next generation of language agents, paving the way for more sophisticated and adaptable artificial intelligence applications in various domains.


Check out the Paper, GitHub, and Project Page. All credit for this research goes to the researchers on this project.
