
January 9, 2020 Bruno Kurtic

Can You Tell Debug Data and BI Data Apart?

A few blog posts ago I wrote about new BI for digital companies, and in that post I alluded to the fact that quite a bit of that BI is based on log data. I wanted to follow up on the topic of logs: why they exist and why they contain so much data that is relevant to BI. As I said in that post, logs are an artifact of software development. They are not premeditated; developers generate them almost exclusively for the purpose of debugging pre-production code. So how is it that logs are so valuable for BI? Let's spend a little time examining what developers choose to put in them and why, and then let's take a look at a log line of our own.

The nature of what developers are writing has a lot to do with what's in the logs. Developers building customer-facing, revenue-generating applications have to debug them before they are ready for prime time. Many of these applications are built using modern architectural paradigms such as microservices, which are typically owned by a small team, relatively atomic in purpose, and organized around business capabilities.

As developers write their microservices to achieve that business purpose, they have to test and debug them to ensure the right outcomes. If a developer is, say, writing a credit validation service for a retail application, successful debugging will require their logs to contain the bits of data needed to validate that only creditworthy users are allowed to continue. Hence, the log line is likely to contain user identity, address, and so on, along with transaction amounts, time of transaction, transaction ID, credit score, and likely a lot more related to the technical execution of the code.
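
To make that concrete, here is a minimal sketch of what such a debug log statement might look like. Everything in it is hypothetical: the service name, the field names, and the 650 score threshold are assumptions chosen for illustration, not taken from any real application.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("credit_validation")  # hypothetical service name

MIN_CREDIT_SCORE = 650  # illustrative threshold, an assumption for this sketch


def validate_credit(user_id: str, region: str, amount: float,
                    credit_score: int, transaction_id: str) -> dict:
    """Approve or decline a transaction and log the fields needed to debug the decision."""
    approved = credit_score >= MIN_CREDIT_SCORE
    record = {
        "event": "credit_validation",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "transactionId": transaction_id,
        "userId": user_id,
        "region": region,
        "amount": amount,
        "creditScore": credit_score,
        "approved": approved,
    }
    # One JSON object per log line keeps the debug data machine-readable later.
    logger.info(json.dumps(record))
    return record
```

The developer added these fields only to confirm that the approval logic behaves correctly, yet the same line already carries who bought, where, when, and for how much.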

With this data, a developer will be able to tell how their code is working for their test subjects in pre-production. This same data can also be leveraged in production to provide quite a bit of information about the business outcomes of the application. Looking at it properly, an analyst can understand what mix of customers the business receives, where customers are coming from, how much revenue the app generates per unit of time, at which times of the day and on which days, the average size of a transaction, and so on.
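
As a rough illustration of that kind of analysis, the sketch below computes revenue per hour and average transaction size. It assumes each log line is a JSON object with hypothetical `timestamp`, `amount`, and `approved` fields; a real log would need its own field names and parsing.

```python
import json
from collections import defaultdict
from datetime import datetime


def summarize_transactions(log_lines):
    """Aggregate approved transactions into revenue per hour and average size."""
    revenue_by_hour = defaultdict(float)
    amounts = []
    for line in log_lines:
        record = json.loads(line)
        if not record.get("approved"):
            continue  # declined transactions generate no revenue
        hour = datetime.fromisoformat(record["timestamp"]).strftime("%Y-%m-%d %H:00")
        revenue_by_hour[hour] += record["amount"]
        amounts.append(record["amount"])
    average = sum(amounts) / len(amounts) if amounts else 0.0
    return dict(revenue_by_hour), average


# Made-up sample lines, for illustration only.
sample_lines = [
    '{"timestamp": "2019-12-29T00:57:20+00:00", "amount": 40.0, "approved": true}',
    '{"timestamp": "2019-12-29T00:58:00+00:00", "amount": 60.0, "approved": true}',
    '{"timestamp": "2019-12-29T01:05:00+00:00", "amount": 25.0, "approved": false}',
]
revenue, average_size = summarize_transactions(sample_lines)
```

Grouping by region or by user instead of by hour would follow the same pattern.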

But let me get practical and dissect a log line from our own logs, somewhat redacted for privacy and brevity. Below is one that I particularly like and use to get insights for our product team.

2019-12-29 00:57:20,913 -0800 INFO [hostId=▧▧▧▧▧] [module=STREAM] [localUserName=▧▧▧▧▧] [logger=▧▧▧▧▧.internals.▧▧▧▧▧$] [auth=User:░░░░░░░░░░:▒▒▒▒▒▒▒▒▒▒:███████████:false:5:UNKNOWN] [sessionId=▧▧▧▧▧] [callerModule=autoview] [remotemodule=stream] explainJsonPlan.ETT {"version" : 2.0, "customerId" : "▧▧▧▧▧", "sessionId" : "▧▧▧▧▧", "buildEngineDt" : 108, "parseQueryDt" : 18, "executionDt" : 193, "ett" : 122, "isInteractiveQuery" : false, "exitCode" : 0, "statusMessage" : "Finished successfully", "isAggregateQuery" : true, "isStreamScaledAggregate" : true, "isStreamScaled" : true, "callerSessionId" : "▧▧▧▧▧", "savedSearchIdOpt" : "None", "autoviewId" : "▧▧▧▧▧", "numViews" : 14, "paramCt" : 0, "rangeDt" : 59999, "processDt" : 59999, "kattaDt" : 1417, "mixDt" : 5, "indexRetrievalCriticalDt" : 8, "kattaCountDt" : 0, "kattaSearchDt" : 0, "kattaDetailsDt" : 0, "streamSortDt" : 0, "kattaNonAggregateQueryDt" : 0, "kattaNonAggregateSortDt" : 0, "kattaQueryDt" : 1417, "kattaTimingInfo" : {"totalTime" : 1502, "fetchTime" : 0, "countComputationTime" : 0, "hitsComputationTime" : 0, "docsRetrievalTime" : 146, ……., "searcherCreationTime" : 1, "prepQueueTime" : 0, "ioQueueTime" : 3, ……..}}}, "inputMessageCt" : 36676, "messageCt" : 15, "rawCt" : 55, "queryCt" : 15, "hitCt" : 36676, "cacheCt" : -1, "throttledSleepTime" : 0, "activeSignatureCt" : -1, "totalSignaturesCt" : -1, "initializationDt" : 0, "lookUpInitializationDt" : 1, …………..."indexBatchCt" : 1, "kattaClientException" : [], "streamProcessingDt" : 0, "operatorTime" : 0, "operatorRowCount" : [{"name" : "expression", "rowCount" : 25, "time" : 0}, {"name" : "saveView", "rowCount" : 25, "time" : 1}], "pauseDt" : 0, "gcTime" : 0, ………."numberRegexTimeout" : 0, "bloomfilterTimeTaken" : 3, "viewExtractionTimeTaken" : 11, "kattaSearcherTranslateDt" : 0, "executionStartTime" : 1577609838975, "executionEndTime" : 1577609840908, ………."scanAndRetrieveData" : {"scannedMessages" : 33004, "scannedBytes" : 21222615, "retrievedMessages" : 15, 
"retrievedBytes" : 9645}, "isCompareQuery" : false, "numOfShiftedQueries" : 0, "maxShiftInMilliseconds" : 0, "isBatchlessExecution" : false, "viewCountByType" : {"partitionCt" : 13, …….., "unknownCt" : 0}, "childQueriesSessionIds" : [], "analyticsTiersQueried" : ["Enhanced"]}

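A line with this shape, bracketed `key=value` header fields followed by a JSON payload, can be taken apart with a few lines of code. The sketch below is only an illustration: it assumes the header never contains a `{` character, and it runs against a shortened stand-in for the line above in which `c-001` is a made-up placeholder for the redacted customer ID.

```python
import json
import re

# Matches header fields such as [module=STREAM].
BRACKET_FIELD = re.compile(r"\[(\w+)=([^\]]*)\]")


def parse_log_line(line: str) -> dict:
    """Split a line into bracketed header fields and the trailing JSON payload."""
    json_start = line.index("{")  # assumes the header itself contains no "{"
    header, payload = line[:json_start], line[json_start:]
    fields = dict(BRACKET_FIELD.findall(header))
    fields["payload"] = json.loads(payload)
    return fields


# Shortened stand-in for the real line; "c-001" is a placeholder value.
SAMPLE_LINE = (
    '2019-12-29 00:57:20,913 -0800 INFO [module=STREAM] [callerModule=autoview] '
    '[remotemodule=stream] explainJsonPlan.ETT '
    '{"version" : 2.0, "customerId" : "c-001", "executionDt" : 193, '
    '"isInteractiveQuery" : false, "isAggregateQuery" : true, "exitCode" : 0}'
)

parsed = parse_log_line(SAMPLE_LINE)
```

After parsing, `parsed["module"]` holds the header value and `parsed["payload"]` holds the timing and query-type fields as ordinary dictionary entries.
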
This line is generated by the developers of our search engine. It tells them who ran which search, what type of search it was, how long the internals of the engine took to get things done, and so on, all of which is very useful when developers are working to write a reliable and fast search engine. But once this log line made it to production, many other teams latched onto it. Product managers measure adoption by the type of search being run in order to determine where to focus new development efforts. The customer success team maintains customer health scores by tracking search performance and unique users running searches. The sales team monitors adoption during the early days of the customer lifecycle and during proofs of concept. I will expand on further internal and customer examples in future posts on this topic.
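
To give a flavor of how such metrics might be derived, here is a small sketch that assumes the JSON payloads have already been parsed into dictionaries. It uses field names visible in the line above (`isInteractiveQuery`, `isAggregateQuery`, `customerId`, `executionDt`); the search-type labels and the sample values are invented for illustration and are not how our teams actually compute these numbers.

```python
from collections import Counter, defaultdict


def search_type(payload: dict) -> str:
    """Label a search using flags from the payload; the labels are hypothetical."""
    mode = "interactive" if payload.get("isInteractiveQuery") else "background"
    shape = "aggregate" if payload.get("isAggregateQuery") else "non-aggregate"
    return f"{mode}/{shape}"


def adoption_report(payloads):
    """Count searches by type and compute average executionDt per customer."""
    by_type = Counter(search_type(p) for p in payloads)
    durations = defaultdict(list)
    for p in payloads:
        durations[p["customerId"]].append(p["executionDt"])
    avg_execution = {c: sum(d) / len(d) for c, d in durations.items()}
    return by_type, avg_execution


# Made-up payloads, for illustration only.
payloads = [
    {"customerId": "c-001", "executionDt": 100,
     "isInteractiveQuery": False, "isAggregateQuery": True},
    {"customerId": "c-001", "executionDt": 200,
     "isInteractiveQuery": False, "isAggregateQuery": True},
    {"customerId": "c-002", "executionDt": 50,
     "isInteractiveQuery": True, "isAggregateQuery": False},
]
by_type, avg_execution = adoption_report(payloads)
```

The counts by type speak to the product manager's adoption question, while the per-customer averages are the sort of input a health score could draw on.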

A unique benefit of this type of BI built on top of debug data is that, in the modern world of agile development, the data changes as quickly as new code gets pushed into production. Because this BI does not follow the rigid ETL data warehousing model, new intelligence can be gleaned from the new bits of data developers add as they extend and debug new capabilities of the application. On the other hand, a unique challenge is that systems used to extract this type of business intelligence from debug data must be able to cope with that rate of change by enabling analysis of highly unstructured and often unknown bits of data. Done right, business analysis of debug data enables business intelligence to evolve into continuous intelligence, leveraging new business signals to guide real-time decisions at the rate of change in digital business.
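
One simple way to picture coping with that rate of change is to read the schema from the data itself rather than fixing it up front. The sketch below flattens each parsed record and reports any field it has not seen before. The `FieldCatalog` class is a hypothetical helper written for this illustration, not a description of how any particular product works.

```python
def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted keys, e.g. kattaTimingInfo.totalTime."""
    flat = {}
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat


class FieldCatalog:
    """Tracks which fields have been seen so new ones surface as code ships."""

    def __init__(self):
        self.known = set()

    def observe(self, record: dict) -> set:
        """Return the fields in this record that were never seen before."""
        fields = set(flatten(record))
        new_fields = fields - self.known
        self.known |= fields
        return new_fields


catalog = FieldCatalog()
first = catalog.observe({"executionDt": 193})
# A later release adds nested timing data; it shows up without any schema change.
second = catalog.observe({"executionDt": 100, "kattaTimingInfo": {"totalTime": 1502}})
```

Because nothing is rejected for failing to match a predefined schema, a field added in today's deployment is available for analysis as soon as it appears in the logs.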

Bruno Kurtic

Founding VP of Strategy and Solutions

Bruno leads strategy and solutions for Sumo Logic, pioneering machine-learning technology to address growing volumes of machine data across enterprise networks. Before Sumo Logic, he served as Vice President of Product Management for SIEM and log management products at SenSage. Before joining SenSage, Bruno developed and implemented growth strategies for large high-tech clients at the Boston Consulting Group (BCG). He spent six years at webMethods, where he was a Product Group Director for two product lines, started the West Coast engineering team, and played a key role in the acquisition of Active Software Inc. Bruno also served at Andersen Consulting's Center for Strategic Technology in Palo Alto and founded a software company that developed handwriting and voice recognition software. Bruno holds an MBA from the Massachusetts Institute of Technology (MIT) and a B.A. in Quantitative Methods and Computer Science from the University of St. Thomas, St. Paul, MN.
