This Wednesday Wisdom talk takes a deep dive into Temporal workflow execution. Learn about Temporal, workflows, the Coffee Maker workflow example, and the Temporal Cluster. Discover how Temporal can make your processes more scalable, reliable, and efficient. Understand why Temporal is a go-to solution for reentrant processes, and explore the inner workings of the Temporal Server, Elasticsearch, and the database that power it.
Manav Joshi
Discover the art of structuring Figma files in this Wednesday Wisdom video. Streamline your design-to-development process by learning top-notch techniques for organizing layers, components, and pages, and make collaboration between designers and developers a breeze.
Tanwee Deshpande
Explore iOS app performance optimization with Xcode Instruments in this Wednesday Wisdom video. Master troubleshooting CPU and memory issues for faster, more efficient apps, and pick up practical insights and tips to keep your mobile applications running at their best.
Nazima
Mac: Hey folks, thank you for taking the time to join us in this session, which gives a 360-degree view of running efficient data engineering pipelines with DataOps practices. Let's start with an introduction. I'm Mac, CTO and co-founder at Wednesday Solutions. My team and I build products across various form factors and verticals. We've built apps for top unicorns across the world that handle tens of millions of daily users, APIs that handle billions of requests, and data pipelines that process hundreds of GBs of data across hundreds of millions of records and 500+ data points in near real time. We're heads down and focused on app modernization, data, and applied AI, and we haven't watched from the sidelines: we've actively pushed the envelope with a variety of open-source products and toolkits. A link to them has been added in the comments section. Alright, now that the intros are out of the way, let's get into it. The focus of today's talk is AWS Glue and building an efficient pipeline with it. We'll go over:
- What AWS Glue is, in brief
- Key features that make it a good option, and at what stage you should be using it
- Advanced features and capabilities
- Data quality in AWS Glue
- Data security, compliance, and governance
- Being cost-effective in your data engineering endeavors
- Bringing about efficiency with automation
Let's start with what Glue is. It's a fully managed ETL (Extract, Transform, Load) service. It's your go-to tool for consolidating data from multiple sources, in multiple formats and shapes, into a single consistent format that's useful for analytics and insights. It's particularly useful if your data is scattered across different storage solutions like Amazon S3, databases, traditional on-premises databases, or streaming services like Kafka. Using tools like AWS Direct Connect and AWS DMS alongside Glue can significantly speed up your digital transformation project.
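To make "a single consistent format" concrete, here is a minimal, stdlib-only Python sketch (not Glue code) of what consolidation means: records arriving as CSV and as JSON lines, with different field names, mapped onto one shared schema. The field names, the `FIELD_ALIASES` mapping, and the target schema are hypothetical; in Glue itself this mapping would typically live inside a Spark job.

```python
import csv
import io
import json

# Hypothetical target schema shared by all sources.
TARGET_FIELDS = ("customer_id", "amount", "currency")

# Hypothetical per-source column names, mapped onto the target schema.
FIELD_ALIASES = {
    "csv": {"cust_id": "customer_id", "amt": "amount", "ccy": "currency"},
    "json": {"customerId": "customer_id", "total": "amount", "currency": "currency"},
}


def normalize(raw: dict, source: str) -> dict:
    """Rename source-specific keys and coerce values to the target types."""
    aliases = FIELD_ALIASES[source]
    renamed = {aliases[key]: value for key, value in raw.items() if key in aliases}
    missing = [field for field in TARGET_FIELDS if field not in renamed]
    if missing:
        raise ValueError(f"{source} record is missing {missing}")
    return {
        "customer_id": str(renamed["customer_id"]),
        "amount": float(renamed["amount"]),
        "currency": str(renamed["currency"]).upper(),
    }


def consolidate(csv_text: str, jsonl_text: str) -> list:
    """Read CSV and JSON-lines input and return records in one shared schema."""
    records = [normalize(row, "csv") for row in csv.DictReader(io.StringIO(csv_text))]
    for line in jsonl_text.splitlines():
        if line.strip():  # skip blank lines
            records.append(normalize(json.loads(line), "json"))
    return records


print(consolidate("cust_id,amt,ccy\nc1,10.5,usd\n", '{"customerId": 42, "total": 7, "currency": "eur"}\n'))
```

Failing loudly on a missing field, rather than silently filling a default, is a deliberate choice: a malformed source should surface before it pollutes downstream analytics.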
How does it work? Glue uses a serverless architecture, so you don't have to worry about provisioning or managing servers. It has a visual editor that lets you drag and drop components to build your ETL pipelines. However, I wouldn't recommend using it: it's very basic and makes things difficult to manage later. Glue starts by automatically discovering and cataloging metadata from your data sources into a centralized Glue Data Catalog. It then runs the ETL process in a serverless Apache Spark environment. All of this is set up and just works, so you can focus solely on your data and transformations. So why Glue? Complexities abstracted. To start off with, it automates data discovery, so you don't have to manually sift through data. And it provides an ecosystem that works in unison across:
- resource provisioning and management
- data discovery
- cataloging
- transformation
- data quality
- loading into the target system
This significantly cuts down your development time. What are its limitations? The environment is not customizable. If you want to tweak the environment or use tools that aren't already present, it's not straightforward, and in some cases it's not possible. For example, it supports only a limited set of streaming sources. And at petabyte scale, Glue starts to hit real limitations. When shouldn't you use it? It's not ideal for running complex ML analytics and algorithms as part of the transformation. It supports data lineage, but not as comprehensively as some other tools. And it isn't ideal for extremely low-latency, petabyte-scale, performance-critical real-time use cases. Alright, hopefully this clarifies whether AWS Glue is the right choice for you. Let's get into everything that comes with it and how to make the most of it. With Glue, your ETL jobs are managed. You don't need to worry about whether the infrastructure is sufficient or whether the environment is ready. It just works. It also comes with an array of pre-built connectors.
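Conceptually, a crawler samples records and infers a column-to-type mapping that ends up as a table definition in the Data Catalog. The sketch below mimics that idea in plain Python. It is not how Glue crawlers are implemented, and the type names and the conflict rule (integers and doubles widen to double, any other conflict falls back to string) are simplifying assumptions.

```python
def infer_type(value) -> str:
    """Map one Python value to a simplified, catalog-style type name."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "string"


def merge_types(current: str, new: str) -> str:
    """Combine two observed types for the same column (assumed rule)."""
    if current == new:
        return current
    if {current, new} == {"bigint", "double"}:
        return "double"  # mixed numerics widen to the more general numeric type
    return "string"  # any other conflict falls back to string


def infer_schema(records: list) -> dict:
    """Infer a column -> type mapping from a sample of dict records."""
    schema = {}
    for record in records:
        for column, value in record.items():
            if value is None:
                continue  # nulls carry no type information
            observed = infer_type(value)
            schema[column] = merge_types(schema[column], observed) if column in schema else observed
    return schema


sample = [
    {"order_id": 1, "amount": 10, "paid": True},
    {"order_id": 2, "amount": 12.5, "paid": False, "note": None},
    {"order_id": "A-3", "amount": 8, "paid": True},
]
print(infer_schema(sample))
```

Here `order_id` becomes `string` because one sampled value is not numeric, which is exactly the kind of surprise worth catching in the catalog before queries start failing.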
Whether your data resides in S3, RDS, or even an on-premises SQL database, Glue lets you connect to these sources easily and focus on the transformation that follows. And when we say S3, we mean your data could be in CSV, Parquet, or even JSON files, and Glue will still sift through it effortlessly. Crawlers in AWS Glue are not just simple metadata gatherers; they are intelligent. A crawler scans your data sources and populates the Glue Data Catalog with schema definitions and table structures. Amazon Athena can then use this metadata to let you run SQL queries directly on your data, even raw data, across all the different sources, shapes, and formats. It's interesting to note that Athena uses Presto under the hood, so you get full standard SQL support, including complex joins, window functions, and arrays. This isn't just basic querying; it's advanced data analysis made possible by the rich metadata that Glue crawlers provide. Glue also integrates with a variety of triggers: you can use Amazon EventBridge, S3 event notifications, Lambda functions, API calls, SNS, and SQS to kick off work, which makes it convenient for whatever your use case may be. You can also pass run-specific parameters: a job can have default parameters for its ETL script, and you can override them for a particular run. This level of customization, combined with the range of triggers and the granular control over them, makes it supremely robust. Finally, let's talk about Glue workflows. Glue workflows are full-blown state machines that let you design complex ETL processes involving conditional branching, parallel execution, and error handling. Like Airflow, they use directed acyclic graphs (DAGs) to represent the sequence and dependencies of multiple ETL activities. You can even integrate AWS Lambda functions into your workflows for custom processing steps, which makes them very versatile for complex data engineering tasks.
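A workflow DAG boils down to "run a node once everything it depends on has finished". This sketch uses Python's standard `graphlib` module (Python 3.9+) to group a hypothetical set of crawlers and jobs into stages whose members could run in parallel. The node names are made up, and this only illustrates DAG ordering; it does not use the Glue workflow API.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each node maps to the set of nodes it depends on.
WORKFLOW = {
    "crawl_orders": set(),
    "crawl_customers": set(),
    "clean_orders": {"crawl_orders"},
    "clean_customers": {"crawl_customers"},
    "join_and_load": {"clean_orders", "clean_customers"},
}


def execution_stages(dag: dict) -> list:
    """Return lists of nodes; every node within a stage can run in parallel."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError (a ValueError) if the graph has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # nodes whose dependencies are all done
        stages.append(ready)
        sorter.done(*ready)  # mark the whole stage as finished
    return stages


for number, stage in enumerate(execution_stages(WORKFLOW), start=1):
    print(number, stage)
```

The acyclic requirement is what makes this safe: a cycle would mean two jobs each waiting on the other, and `prepare()` rejects that before anything runs.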
Data quality checks help ensure your data is not just abundant but also accurate, reliable, and actionable. Data health scores: you can set up data quality checks for both input and output data, using predefined rules as well as custom rules written with complex SQL queries. Glue Data Quality is extremely comprehensive and lets you view quality at both the row level and the rule level. Storing these results on S3 helps in two ways. First, you can dig into data quality metrics over time and derive meaningful, actionable insights from them. Second, you can set up triggers on the S3 bucket that stores the results so that stakeholders are proactively alerted when checks fail or scores fall below the required threshold. Compliance starts with identifying sensitive or regulated data. Both Glue and DataBrew offer robust classification capabilities. Once you've tagged this data, you can use DataBrew's transformations to mask or redact it, keeping you in line with regulations like GDPR or HIPAA. But compliance isn't a one-off task; it's ongoing. That's where AWS CloudTrail comes in, providing an audit trail of every action taken. DataBrew also supports data lineage, which helps you trace every piece of data back to its origin, a critical requirement for many regulations. Beyond regulation, lineage also helps you understand the ancestry of your data and act on those insights, which is very useful during migrations, data cleaning, and so on. Compute is expensive, especially for data-intensive applications that involve GPUs, and data engineering pipelines are no exception. While we make sure best practices are followed and we're getting the most out of our data, we also need to make sure costs don't skyrocket. Let's start with something simple: your data engineers, with their already fancy MacBooks, don't need to develop ETL jobs on the AWS console.
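To illustrate the difference between row-level and rule-level results, here is a plain-Python sketch that evaluates a few rules against records, reports a pass rate per rule and a pass/fail per row, and raises an alert flag when the overall score drops below a threshold. The rule names, the 0.9 threshold, and the scoring formula are assumptions for illustration; Glue Data Quality expresses rules in its own DQDL language and computes its own scores.

```python
from typing import Callable, Dict, List

Rule = Callable[[dict], bool]

# Hypothetical rules; each returns True when a row passes.
RULES: Dict[str, Rule] = {
    "email_present": lambda row: bool(row.get("email")),
    "amount_non_negative": lambda row: isinstance(row.get("amount"), (int, float)) and row["amount"] >= 0,
}


def evaluate(rows: List[dict], rules: Dict[str, Rule], threshold: float = 0.9) -> dict:
    """Return rule-level pass rates, row-level outcomes, and an alert flag."""
    row_results = [{name: rule(row) for name, rule in rules.items()} for row in rows]
    # Rule level: fraction of rows passing each rule (1.0 for an empty batch).
    rule_scores = {
        name: sum(result[name] for result in row_results) / len(rows) if rows else 1.0
        for name in rules
    }
    # Row level: a row passes only if it satisfies every rule.
    rows_passed = [all(result.values()) for result in row_results]
    overall = sum(rows_passed) / len(rows) if rows else 1.0
    return {
        "rule_scores": rule_scores,
        "rows_passed": rows_passed,
        "overall_score": overall,
        "alert": overall < threshold,  # e.g. notify stakeholders when True
    }


batch = [
    {"email": "a@example.com", "amount": 5},
    {"email": "", "amount": 3},
    {"email": "b@example.com", "amount": -1},
    {"email": "c@example.com", "amount": 0},
]
print(evaluate(batch, RULES))
```

Both rules pass 75% of rows, yet only half the rows pass everything, which is why looking at both levels matters.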
They can do that locally. AWS provides a Docker image that can be used to set up the Glue environment locally. Take a look at our AWS Glue Jupyter Notebook starter repo to see how it works; the link is in the comments. We've coupled this with some nifty utility functions that make moving between the local development environment and production a breeze. Continuous integration is often overlooked in data engineering projects. Having a CI pipeline that verifies and evaluates code before it's merged is very important. The ideal CI pipeline has a properly configured linter, static code analysis, and tests. Once code is deemed ready to merge, there shouldn't be any manual steps to deploy it: set up a proper CD pipeline that updates the scripts in the right environment when they get merged. We do things slightly differently here. We use dotenv files locally and rely on job parameters in deployed scripts for certain things. We've written a nifty tool that converts env variables into job parameters in the CD pipeline, which enables seamless movement from your local machine to pre-production and production environments. Invariably, scripts tend to share some common transformations. Though Glue has limitations on adding libraries, we use the --additional-python-modules argument on the job to point it to our custom utility library. We build and push the library to S3 in the CD pipeline and reuse it for common transformations, logging, start-up routines, quality checks, and so on. It's helped us save a lot of time. Anything we do on the cloud has to have infrastructure as code (IaC). True to that, we typically use Jinja templating with CloudFormation to produce templatized YAML files that enable one-click infrastructure creation across environments. And though we have real-time data quality checks and methods in place, preventing disaster is better than handling it.
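The env-to-job-parameter tool mentioned above is Wednesday's own; the sketch below is a hypothetical stand-in that shows the idea. It parses simple `KEY=value` dotenv lines and turns them into the `--key` / value pairs that Glue job arguments use, then adds `--additional-python-modules` pointing at a utility library on S3. The S3 path, the wheel file name, and the minimal dotenv parsing (comments, blank lines, and matching surrounding quotes only; no variable expansion) are assumptions.

```python
def parse_dotenv(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]  # strip matching surrounding quotes
        values[key.strip()] = value
    return values


def to_glue_arguments(env: dict, utils_wheel_s3_path: str) -> dict:
    """Build a Glue-style job arguments mapping ("--NAME": value) from env values."""
    arguments = {f"--{key}": value for key, value in env.items()}
    # Point the job at a shared utility library published to S3 by the CD pipeline.
    arguments["--additional-python-modules"] = utils_wheel_s3_path
    return arguments


dotenv_text = """
# local settings
SOURCE_BUCKET=my-raw-bucket
TARGET_TABLE="analytics.orders"
"""
# Hypothetical artifact location for the shared utility library.
print(to_glue_arguments(parse_dotenv(dotenv_text), "s3://example-artifacts/utils-0.1.0-py3-none-any.whl"))
```

The resulting mapping could be supplied as the job's default arguments by your deployment tooling, and the script would then read the values with `getResolvedOptions` from `awsglue.utils`, so the same code runs locally (from `.env`) and in the cloud (from job parameters).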
We've set up frameworks for running end-to-end (e2e) tests before every production release. This involves running transformations against all the data use cases, plus the edge cases we've encountered since the beginning of the project, which ensures there is no regression. We evaluate the transformations by querying the data sink, which also ensures that changes or upgrades at the data-sink level don't break the pipeline. Given the mission-critical nature of most data projects, we've often found that setting up and following the right processes saves a lot of time and effort in the long run, because you avoid acting on incorrect insights from your data. That's a wrap. Key takeaways to go home with: with Glue, you get
- seamless data integration and transformation
- smart metadata management
- built-in scale and reliability
- comprehensive data quality checks
- robust security, compliance, and governance measures under one roof
- the ability to run streamlined DataOps
Need help with your data engineering project? Hit us up at work@wednesday.is!
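A pre-release regression suite can be as simple as a list of fixtures, each pairing an input with the output the pipeline must produce, where every edge case discovered along the way is appended as a new fixture. The sketch below uses a hypothetical `transform` function and in-memory fixtures; the setup described above evaluates results by querying the data sink, which this sketch does not do.

```python
def transform(record: dict) -> dict:
    """Hypothetical transformation under test: normalize phone and country."""
    digits = "".join(ch for ch in (record.get("phone") or "") if ch.isdigit())
    return {
        "phone": digits or None,
        "country": (record.get("country") or "unknown").strip().lower(),
    }


# Each fixture: (description, input record, expected output).
FIXTURES = [
    ("happy path", {"phone": "+91 98765-43210", "country": "IN"}, {"phone": "919876543210", "country": "in"}),
    ("edge case: missing phone", {"country": " US "}, {"phone": None, "country": "us"}),
    ("edge case: null country", {"phone": "123", "country": None}, {"phone": "123", "country": "unknown"}),
]


def run_regression(fixtures) -> list:
    """Return a description of every fixture whose output no longer matches."""
    failures = []
    for description, given, expected in fixtures:
        actual = transform(given)
        if actual != expected:
            failures.append(f"{description}: expected {expected}, got {actual}")
    return failures


failures = run_regression(FIXTURES)
print("release blocked" if failures else "no regressions", failures)
```

Wiring `run_regression` into the release pipeline so that any non-empty result blocks the deploy turns "no regression at all" into an enforced rule rather than a hope.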
Mohammed Ali Chherawalla
Kshitij Gang
Saurabh: Hello folks, thank you for taking the time to join us in this session on the considerations, pitfalls, and applications of generative AI.
Saurabh: Before getting started with the boring stuff, a little about us! I'm Saurabh Suman, a Generative AI Expert at Wednesday Solutions. With a passion for all things AI, I spend my days delving deep into the latest advancements and building a keen understanding of how we can harness this technology to solve complex problems. On weekends, you'll find me looking for similar advancements in potions and food. I've had the pleasure of working with generative models on various projects, which has taught me to navigate the potholes along this journey with ease and precision. So whether you're an AI novice, an expert, or simply curious, I've got you covered. Pay close attention throughout the webinar, and get your questions answered at the end. Joining me on this journey is Mac. Mac, could you please introduce yourself?
Mac: Hey folks, I'm the CTO at Wednesday Solutions, and I spend my days taking credit for other people's work. On a more serious note, I've written Android and web applications that have handled millions of daily users and billions of API calls. I've created and maintained infrastructure that scaled to handle 4x peak load at less than linear cost. I now spend my time on the lookout for advances in technology that increase people productivity and system reliability. The most recent entry, and perhaps the most logical next step given how data-engineering-heavy we are, is generative AI. And we haven't just watched from the sidelines; we've been actively pushing the boundary of what's possible with GenAI. To learn more about the industries we've impacted, keep an eye on our Engineering Case Studies page. The link is in the comments.
Mac: Alright, now that the intros are out of the way, let's dive in.
Here's what we'll be covering today:
- What's needed to kickstart your GenAI journey
- Finding and working within the boundaries of the model
- What it means to validate the results of your GenAI model, and how to do it
- Collecting and using analytics and performance metrics
- Calculating the RoI of your GenAI initiative
Saurabh, it'd be great if you could take us through the ABCs of kickstarting a GenAI journey.
Saurabh: Sure, but before we do that, let me talk about our first GenAI product. I'll use it as an example later on to explain use cases, so pay attention, folks. The goal was to reduce the manual effort spent on sales prospecting. We wanted to create an intelligent system that could identify SQLs (sales qualified leads) by creating offers relevant to them. It had to understand their needs and context. Once a potential customer responded, a real human would step in and take things forward. We called it the Autonomous Sales Agent, and fed it data about Wednesday: case studies, blogs, service offerings, skills, and so on. The bot needed to identify leads based on BANT (Budget, Authority, Need, Timeline) and craft a custom email highlighting a relevant case study or blog based on the target company's news, hiring posts, products, and customers.
Mac: Let me give you an example of how it worked for one prospect. Let's call them "Company A". Disclaimer: we used AutoGPT with GPT-4 and built a plugin on top of it for some more advanced use cases. Company A is in the payment reconciliation business, has really big customers, and is hiring for Java/MySQL roles. Given the traffic we'd expect from its customers, they were likely facing a bunch of issues due to MySQL scaling constraints. The Autonomous Sales Agent crafted a message that touched on this pain point and paired it with the appropriate content from our internal resource pool on ways to solve it.
And as bizarre as it may sound, we got on a call with a leader from the company the same day the message was sent. We just focused on the inputs, the different variables involved in the sales process, and hardened success metrics, and we were able to get multi-tiered thinking abilities from the bot. I think that's enough gloating from me; back to you, Saurabh, for the ABCs of kickstarting a GenAI journey.
Saurabh: Prerequisites. You want to move from point A to point B, so you need to define the characteristics and attributes of both points, i.e.:
- What information, data, tools, and subject matter expertise do you need to start your journey?
- What metrics, performance indicators, and outcomes would mean success?
In the Autonomous Sales Agent example, you can't just say "potential lead" without defining what qualifies as a successful lead. And without identifying what a successful lead is, it's impossible to finalize data sources. Yes, data! Any GenAI endeavor can only be as good as the data it uses. Quoting Mac here: "Your model is as good as the data you feed it." Identifying what counts as good, reliable data, how you handle inconsistencies, and how rapidly your model adapts to data trends are all critical metrics that act as an early warning system for whether you're moving in the right direction.
Mac: And by the way, identifying data sources and what makes data reliable is not an engineering skill. It's a subject matter expert's call: an SME in the field we're disrupting. Any GenAI endeavor without domain expertise will fail, 100%. SMEs are essential at the beginning, while building, and definitely once we're out in production.
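Defining "what qualifies as a successful lead" can start as something as explicit as a BANT checklist. This is a hypothetical rule-based sketch, not how the Autonomous Sales Agent worked (it used AutoGPT with GPT-4); the `Lead` fields, the budget and timeline thresholds, and the "all four criteria must pass" rule are assumptions meant to show what an unambiguous success definition looks like.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lead:
    company: str
    budget_usd: Optional[float]      # Budget: estimated spend, None if unknown
    contact_is_decision_maker: bool  # Authority
    stated_need: Optional[str]       # Need, e.g. "scaling MySQL"
    timeline_months: Optional[int]   # Timeline: months until a decision


def bant_gaps(lead: Lead, min_budget: float = 50_000, max_timeline: int = 6) -> list:
    """Return the BANT criteria the lead fails; an empty list means qualified.
    The default thresholds are hypothetical."""
    gaps = []
    if lead.budget_usd is None or lead.budget_usd < min_budget:
        gaps.append("budget")
    if not lead.contact_is_decision_maker:
        gaps.append("authority")
    if not lead.stated_need:
        gaps.append("need")
    if lead.timeline_months is None or lead.timeline_months > max_timeline:
        gaps.append("timeline")
    return gaps


def is_sales_qualified(lead: Lead) -> bool:
    return not bant_gaps(lead)


print(bant_gaps(Lead("Company A", 100_000, True, "scaling MySQL", 3)))
```

Returning the list of failed criteria, rather than a bare yes/no, also tells you which data source is weakest when many leads fail on the same letter.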
It's only once you've identified how to morph the input data into the shape, form, or state that makes it valuable to this endeavor that engineering comes in: validating, cleaning, and sanitizing inputs, building pipelines to process and transform the data, and using the right models to train, test, validate, and deploy. This is neither easy nor cheap. It's important to understand the pricing behind the entire endeavor: upfront costs, ongoing cloud and infrastructure costs, maintenance, demos, and demo instances. And definitely take into account factors like security, compliance, and integration with existing and new systems. So here's the checklist before you kickstart your GenAI journey (don't bring out your pens, we'll send over links to these):
- Solve exactly one, very well-defined problem.
- Understand the shape, reliability, age, and continuous availability of your data.
- Set success indicators and well-defined success metrics.
- Have money in the bank. AI endeavors are expensive; don't wing this.
Once you've ticked all the boxes and are ready to start, where do you begin?
Saurabh: This one's easy: you need to choose the right tool for the job. No matter how amazing your data is, if you choose the wrong model you'll never get the desired results. If you're running an architecture firm with 100,000 designs and you want to create the perfect layout based on your past designs, OpenAI's GPT models, which are text-based, will never give you satisfactory results. But for the Autonomous Sales Agent, which was text-based, they gave amazing results. You also need to understand when to use open-source models like Falcon and Llama 2, the delta between the state of the art and an open-source model, the impact of that delta on your end results, and the implications for data privacy and governance. Understanding the models at your disposal and their limitations, and applying the right model to your problem, is a must.
Mac: And it doesn't stop there.
Once you've zeroed in on the right model, you need to understand its limitations and create avenues for human intervention at the right steps, with proper feedback cycles and analytics. This increases the chances of meeting your productivity and revenue goals. It's important to understand your model's limitations or misbehavior and be able to attribute them. Misbehavior and unintended consequences result from the input data, the constraints, and the training. We'll come to how you can triage what's causing the skew in a subsequent slide, but you can save a lot of headaches just by thinking through all the different ways in which your model could misbehave. The model will behave the way you ask it to, not the way you intend it to. Clearly stating what you don't want it to do is as important as stating what you want from it. For example, what initially went wrong with the Autonomous Sales Agent was that it concentrated on leads of a particular type, due to biases in the input constraints, and couldn't get us leads in the demographics we were targeting.
Saurabh: So the checklist for defining boundaries is:
- Evaluate the available models and choose the one that best fits your use case.
- Understand the trade-offs between state-of-the-art and open-source models, and consider data privacy and governance before making a choice.
- Create avenues for human intervention, and processes for feedback and analytics.
- Along with listing what you want from the model, make very clear what's not expected from it.
Mac: I'm just going to pause here for a few seconds while you take this in. Just because your use case is textual doesn't automatically make GPT the ideal fit; each of these models has its strengths. To build a defensible product, you'll probably need different models to solve different aspects of the problem. Otherwise, it's just another wrapper on top of GPT.
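One way to act on "state what you don't want" is to write the constraints down as explicit checks, run every generated output through them, and route failures to human review instead of sending them. The sketch below validates a generated sales email; the banned phrases, the word limit, and the required case-study URL shape are hypothetical examples, not the constraints the team actually used.

```python
import re

BANNED_PHRASES = ("guaranteed results", "100% success", "act now")  # hypothetical
MAX_WORDS = 150  # hypothetical length cap
CASE_STUDY_URL = re.compile(r"https://\S+/case-stud(y|ies)/\S+")  # hypothetical URL shape


def violations(email_body: str) -> list:
    """List every explicit 'don't' (or missing 'must') the draft breaks."""
    problems = []
    lowered = email_body.lower()
    for phrase in BANNED_PHRASES:
        if phrase in lowered:
            problems.append(f"banned phrase: {phrase}")
    if len(email_body.split()) > MAX_WORDS:
        problems.append("too long")
    if not CASE_STUDY_URL.search(email_body):
        problems.append("missing case study link")
    return problems


def route(email_body: str) -> str:
    """Send clean drafts; send anything else to a human reviewer."""
    return "send" if not violations(email_body) else "human_review"


print(route("Hi, we helped a payments company scale MySQL: https://example.com/case-studies/mysql-scaling"))
```

Logging the returned violations over time also feeds the feedback and analytics loop: a rule that fires constantly points at a prompt or data problem upstream.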
And assuming that things will just work because you're using XYZ, and XYZ is known for certain capabilities, is risky. So how do you validate the direction and output of your model, and make sure there's intervention before it's too late?
Saurabh: When we're dealing with generative AI, we need a system in place to make sure it's doing its job well. We call this 'validation'. The type of validation we use varies depending on what we're asking the AI to do. For example, if we're using AI to translate text from one language to another, we use BLEU (Bilingual Evaluation Understudy) scores, a method for evaluating the quality of machine-translated text, to measure how well it's doing. On the other hand, if we're asking AI to create images, we could use measures like the Inception Score or Fréchet Inception Distance to gauge the quality and variety of the images it creates.
Mac: That's a great point, Saurabh. It's important to analyze the output, but once you have the results, you need to be able to use them to make the model better. That brings us to 'model transparency'. Foundation models are super complex and seem like a 'black box': we put something in and get something out, but it's not always clear what's happening inside. That's where methods like LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations) come in. These techniques make the workings of the model more transparent. They help us understand why the model made a decision, and which factors or features are responsible for the output and with what weight. For example, LIME fits a simpler model that approximates the original model's behavior around a particular instance and presents it in a way humans can interpret.
Saurabh: Absolutely, Mac. The last big thing we need to think about is making sure our AI model continues to perform well over time.
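For intuition on what a BLEU score measures, here is a compact sentence-level implementation: clipped n-gram precision for n from 1 to 4, combined with a geometric mean and multiplied by a brevity penalty, against a single reference with whitespace tokenization and no smoothing. It's a teaching sketch; for reported numbers you would normally use an established implementation such as sacreBLEU or NLTK, which handle tokenization, multiple references, and smoothing.

```python
import math
from collections import Counter


def ngrams(tokens: list, n: int) -> Counter:
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clipping: a candidate n-gram counts at most as often as it appears in the reference.
        clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if clipped == 0 or total == 0:
            return 0.0  # no smoothing: an empty n-gram order zeroes the score
        log_precision_sum += math.log(clipped / total) / max_n  # uniform weights
    # Brevity penalty: candidates shorter than the reference are penalized.
    brevity_penalty = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * math.exp(log_precision_sum)


print(round(sentence_bleu("the cat sat on the mat", "the cat sat on a mat"), 3))
```

The clipping step is what stops a degenerate output like "the the the the" from scoring well just by repeating a common word.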
AI models learn and change as they're exposed to more and more data, so something that worked well yesterday might not work so well tomorrow. That's why we need to keep checking our model's performance. One way we do this is by holding back some of our data when training the model and then using this held-back data to check its performance. This helps us spot whether the model is starting to 'overfit', the term for when a model is so focused on the data it was trained on that it struggles with new, unseen data. But we can't rely on held-back data alone; we also need to track how well the model is doing in the real world. For example, if we're using AI to come up with email subject lines to get more people to open our emails, we should track the actual open rates to see whether the AI is really helping. So, going back to the checklist: working with generative AI involves a lot of checking and re-checking to make sure it's doing a good job. Make sure you know why your model behaves the way it does, and keep track of the accuracy of the model's output against its relevance to the real world.
Saurabh: So you've validated your results, you're confident you're solving a real problem, and you unleash your solution on the world. You see crazy uptake, and you don't want customer or user complaints to be the only indicator of how well your model is doing. You must put performance analytics in place to measure how the model is doing against self-made benchmarks or industry standards. Break it down into which problems your model is solving really well and which it isn't, and look for correlations between them. For example, in its first iteration the Autonomous Sales Agent did a really bad job with lead sourcing but wrote beautiful, impactful emails. Once we fixed the lead sourcing, we immediately started seeing much better overall results.
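Tracking "whether the AI is really helping" with open rates usually means comparing AI-written subject lines against a baseline. The sketch below computes both open rates, the relative lift, and a two-proportion z-score as a rough signal of whether the difference is more than noise. The A/B setup, the example counts, and the 1.96 cutoff (roughly 95% two-sided) are assumptions for illustration.

```python
import math


def open_rate_comparison(ai_opens: int, ai_sent: int, base_opens: int, base_sent: int) -> dict:
    """Compare AI-written subject lines (variant) against a baseline group."""
    ai_rate = ai_opens / ai_sent
    base_rate = base_opens / base_sent
    # Two-proportion z-test using the pooled open rate under "no difference".
    pooled = (ai_opens + base_opens) / (ai_sent + base_sent)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / ai_sent + 1 / base_sent))
    z = (ai_rate - base_rate) / std_error if std_error > 0 else 0.0
    return {
        "ai_rate": ai_rate,
        "baseline_rate": base_rate,
        "relative_lift": (ai_rate - base_rate) / base_rate if base_rate else float("inf"),
        "z_score": z,
        "significant": abs(z) >= 1.96,  # assumed cutoff, about 95% two-sided
    }


print(open_rate_comparison(ai_opens=260, ai_sent=1000, base_opens=200, base_sent=1000))
```

The same 30% lift on ten emails per group would not clear the cutoff, which is a useful reminder not to celebrate early numbers.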
It's extremely important to analyze these metrics over time, correlate them with changes, and identify trends so you can understand what's going wrong and create a data-driven plan of action. GenAI may seem like magic, the one-size-fits-all solution we've been waiting for our whole lives, and it may very well be, but it's not a mind reader. It needs to be constantly guided, nudged, and shaped into the right solution.
Mac: And introspection, or diagnostics, isn't just about figuring out what isn't working and fixing it. It's also about understanding why something is working really well and finding ways to build that into your process. And while we're on the subject, it's possible that your model is amazing but the market conditions have changed, exactly like current market conditions. What you could do in the pre-GPT era versus what's possible now is completely different. The world has changed, and your system needs to change with it. This could even mean creating an altogether different success metric. That may sound daunting, but it's not nearly as bad as a useless model and success metrics that no longer make sense. Setting up a cadence of regular check-ins to validate or measure the output, product performance over time and across the industry, and success metrics will enable preventive action and course correction before it's too late. The checklist for analytics and performance is:
- Measure performance routinely, i.e., the outcomes of your endeavor: the reduction in cost or man-hours, the growth numbers you're hitting, changes in market conditions and their impact on success metrics, and how quickly your product keeps up with those changes.
- Replicate success indicators, identify failure indicators, and apply those learnings across endeavors.
- Create a plan for how and when to upgrade your model to newer variants, and for validating, checking the availability of, and incorporating new data sources.
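To analyze metrics over time rather than react to complaints, you can compare each period's value with a trailing baseline and flag significant drops. The sketch below does that per stage of the funnel, which is what lets you see, as in the sourcing-versus-email example, which part of the system is actually underperforming. The metric names, weekly values, window size, and 20% drop threshold are all hypothetical.

```python
from statistics import mean


def flag_drops(history: dict, window: int = 3, max_drop: float = 0.2) -> dict:
    """For each metric, list the period indexes where the value fell more than
    `max_drop` (as a fraction) below the mean of the previous `window` periods."""
    flagged = {}
    for metric, values in history.items():
        drops = []
        for i in range(window, len(values)):
            baseline = mean(values[i - window:i])  # trailing baseline
            if baseline > 0 and (baseline - values[i]) / baseline > max_drop:
                drops.append(i)
        flagged[metric] = drops
    return flagged


weekly = {
    "qualified_lead_rate": [0.30, 0.32, 0.31, 0.18, 0.29],
    "email_reply_rate": [0.10, 0.11, 0.10, 0.11, 0.12],
}
print(flag_drops(weekly))
```

Splitting the flags by stage means a drop in lead quality shows up as a sourcing problem rather than being averaged away into one overall success number.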
Alright, this is probably the most awaited section. All of this is for nothing if we can't justify the return on investment. A GenAI endeavor, however amazing and cool it may sound, needs to be backed by solid justification of the expected return. We need indicators to determine
- whether we're moving in the right direction,
- whether we need to pivot, or
- whether we need to cut our losses and stop right here.
The first step is calculating the economic value generated. Sometimes this is straightforward because it's an absolute number or saved man-hours; in other cases it's harder to pin down, because you're relating it to eventual growth and expansion. It's easy to get caught up and be overly optimistic. My advice: be as realistic as possible, and set up actual short-term indicators to check whether this growth or expansion is happening, and whether it holds up in long-term market conditions. The world changes faster than we expect. Next is quantifying the costs incurred: upfront investment in infrastructure, and ongoing costs for maintenance, upgrades, PoCs, R&D, SME time, etc. No matter how insignificant they seem, these costs pile up, and you need them to understand which endeavors to double down on and which to cancel. You then have to define concrete efficiency metrics. Efficiency in the context of an AI model can be viewed in different ways:
- One approach is to measure the efficiency of data usage, quantified by the outcome.
- Another is the time saved through automation, comparing the time taken traditionally with the time taken using the new system.
However, the RoI here isn't simply how much time or money you've saved. It's also the value of the opportunities you've created for subsequent endeavors. GenAI endeavors aren't sprints; they're marathons. Treat them as such. And with that, we've come full circle. Key takeaways!
Saurabh: Alright, so quick key takeaways, and then we'll get to the Q&A!
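The economic value, costs incurred, and efficiency steps can be rolled into a simple calculation. This sketch uses the standard RoI formula, (value - cost) / cost, plus time saved per task as an efficiency metric. Every figure in the example is made up, and the calculation deliberately leaves out the harder-to-quantify opportunity value for subsequent endeavors.

```python
from dataclasses import dataclass


@dataclass
class Costs:
    upfront_infra: float    # one-time infrastructure investment
    monthly_running: float  # cloud, maintenance, upgrades, SME time, etc.
    months: int

    def total(self) -> float:
        return self.upfront_infra + self.monthly_running * self.months


def economic_value(hours_saved: float, hourly_cost: float, extra_revenue: float = 0.0) -> float:
    """Value generated: saved man-hours priced at cost, plus attributable revenue."""
    return hours_saved * hourly_cost + extra_revenue


def roi(value: float, cost: float) -> float:
    """Standard return on investment: net gain divided by cost."""
    return (value - cost) / cost


def time_saved_fraction(minutes_before: float, minutes_after: float) -> float:
    """Efficiency: share of per-task time eliminated by the new system."""
    return (minutes_before - minutes_after) / minutes_before


# Hypothetical figures for a 12-month horizon.
costs = Costs(upfront_infra=20_000, monthly_running=3_000, months=12)
value = economic_value(hours_saved=2_000, hourly_cost=40, extra_revenue=30_000)
print(f"cost={costs.total():.0f} value={value:.0f} roi={roi(value, costs.total()):.1%}")
print(f"time saved per task: {time_saved_fraction(30, 12):.0%}")
```

Keeping value and cost as separate, named inputs makes it easy to rerun the numbers as the short-term indicators come in, rather than defending a single optimistic estimate.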
- Before you start, make sure you have
  - well-defined success metrics for one well-defined problem
  - reliable, good, and available data
  - the skills and the money
- Before you start building, make sure
  - you choose the right model, and understand the limitations of open-source versus state-of-the-art models and their implications
  - there is a process for analytics, feedback, and check-ins
  - you have proper guardrails and the domain expertise to get the best out of the model
- When you're evaluating, make sure you have
  - Model validation: regular check-ins to ensure the right outputs
  - Transparency: an understanding of why your AI models make the decisions they do
  - Continuous checks: an eye on performance
- Once you've built, make sure you
  - set up routine check-ins to measure performance metrics, success indicators, and success metrics
  - replicate success factors, identify failure indicators, and incorporate feedback loops
  - plan for maintenance, upgrades, and new data
- And calculate the RoI by
  - calculating the economic value generated
  - calculating the costs incurred
  - measuring efficiency
The change from analog to digital wasn't just about scale, efficiency, and precision; it was about reimagining what was possible. Get books delivered at the tap of a finger, see and speak to family across the world. This is another internet moment. The world as we know it is changing. Advances in AI have made it part of our everyday lives, whether it's the algorithm that decides what you see on your timeline or the smartwatch that records how well you've slept. Consumer-facing AI is a reality, and it's now a necessity for businesses to accept it and adapt. I hope this session makes the transition to being an AI-powered business easier for you. For an RoI worksheet, reach out to gen-ai@wednesday.is with your business use case and we'll send you your own custom RoI worksheet! The email address is in the comments below.
Saurabh Suman, Mohammed Ali Chherawalla (Mac)
Hey folks, today we'll look at Code Push for Flutter. Code Push lets you release new versions of your application easily, bypassing the need to submit a release to the App Store or Play Store. No more day-long review processes to get a bug fix out. You can now run experiments, fix bugs, and add features instantly and hassle-free. Words don't do justice to this, so let me show you. This is a release version of the application that I've installed from the Play Store. This version has a bug: every time I tap on search, the app crashes. I need to get a fix out immediately before I lose users. Sounds familiar? :) Let me quickly make the fix and create a patch. And just like that, all my users get the updated version and the app no longer crashes. Look at that, it just updated the live running app. This is just the tip of the iceberg. Stay tuned for a lot more.
Shounak Mulay
Hi, this is what living stress-free looks like: a report card of accessibility compliance for your latest release. Look at the severity of these accessibility issues and determine whether each one is a blocker or something you can live with. Get a live view of how your website would feel to someone with accessibility needs. Integrate accessibility audits into your pipeline and release features with confidence. Look how visually appealing this web application is with the right color contrasts, font sizes, and accessible design. This isn't from the distant future in a galaxy far, far away. You no longer need to spend weeks on manual QA for your accessibility needs. No more penalties and user drop-offs associated with it. Do your part in building an inclusive web: reach out and make a change.
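"The right color contrasts" is one of the checks that can run automatically in a pipeline. This sketch implements the WCAG 2.x contrast-ratio formula (relative luminance of sRGB colors, then (L1 + 0.05) / (L2 + 0.05) with L1 the lighter color) and compares the result with the AA thresholds of 4.5:1 for normal text and 3:1 for large text. It illustrates one check an accessibility audit might include; it is not the tool shown in the video.

```python
def relative_luminance(hex_color: str) -> float:
    """WCAG 2.x relative luminance of a 6-digit sRGB hex color such as '#1a2b3c'."""
    hex_color = hex_color.lstrip("#")
    if len(hex_color) != 6:
        raise ValueError("expected a 6-digit hex color like #1a2b3c")
    channels = []
    for i in (0, 2, 4):
        c = int(hex_color[i:i + 2], 16) / 255
        # Undo sRGB gamma encoding to get linear light.
        channels.append(c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4)
    r, g, b = channels
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio from 1:1 (identical) to 21:1 (black on white)."""
    lighter, darker = sorted((relative_luminance(foreground), relative_luminance(background)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """WCAG AA: at least 4.5:1 for normal text, 3:1 for large text."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)


print(round(contrast_ratio("#777777", "#FFFFFF"), 2), passes_aa("#777777", "#FFFFFF"))
```

A mid-grey like #777777 on white looks readable but narrowly fails AA for body text, exactly the kind of issue that slips past a visual review and is cheap to catch in CI.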
Praveen Kumar D