Web Data Services in the Real World
Feb 14, 2010 5:00 AM PT
Today, we examine a fascinating use-case for Web data services (WDSs) with Deutsche Borse Group in Frankfurt, Germany. An innovative information service recently created there highlights how real-time content and data assembled from various online sources scattered across the Web provides a valuable analysis service.
The offering supports energy traders seeking to track global fluctuations and micro trends in oil and other related markets. But the need for real-time and precise data affects more than energy traders and financial professionals. More than ever, all sorts of businesses need to know what's going on in and what's being said about their respective markets, products and services.
In this series with Kapow Technologies, we've examined the need for WDS and ways that WDS and related tools can be used broadly to solve these problems. Now, we are going to learn the full story of how Deutsche Borse took Web data resources and not only efficiently assembled knowledge using automated robots, cleansing tools, and analytics management, but also used those capabilities to create high-value, focused WDS offerings of its own.
To learn more about WDS as a business, please welcome our guests, Mario Schultz, director of Energy Facts at Deutsche Borse Group, and Stefan Andreasen, CTO at Kapow Technologies. The discussion is moderated by Dana Gardner, principal analyst at Interarbor Solutions.
Listen to the podcast (40:17 minutes).
Here are some excerpts:
Dana Gardner: It's interesting to me that we've moved beyond a level of static information to dynamic information and yet we still haven't taken full advantage of everything that's being developed and created across the Web.
But today's market turbulence demands that we do that. We have to move into an era where we can take quality data and provide agility into how we can consume and distribute it. We're dealing with more diverse data sources. That means we need to have completeness and we need to be comprehensive, in order to accomplish the business information challenges each business faces.
The need now is for flexible, agile, and mixed sourcing of services and data together. The content is often portable. That means it's ubiquitous across mobile devices and social networks in such a way that real-time analytics becomes extremely important.
The use of data as a business is now coming to the fore. We're beginning to see value, not from just the assimilation of data for use internally, but as more and more businesses are starting to take advantage of the data that they create and have access to. They share that with their partners, create ecosystems of value, and then even perhaps sell outright the information, as well as insights and analysis from that information.
Mario Schultz: Deutsche Borse is the German stock exchange in Frankfurt, Germany, and we offer all kinds of products and services around on-exchange trading and the adjacent processes. For several years now, I've been responsible for developing new products and services around information for on-exchange or off-exchange trading. This is why we've invented and developed the Energy Facts service.
We developed new products and services where we could transform our know-how and this real-time connection, aggregation and dissemination of data to other business lines. This is why we looked into the energy trading sector, mainly focused on the power trading here in Europe.
I began by working on the exchange of information that we have in our own systems. We were proceeding with our ideas of enhancing our services and designing new products and services. We were then looking into the Web and trying to get more information from the data that we gather from websites -- or somewhere else on the global Web -- and to integrate this with our own company's internal information.
Everything we do focuses on the real-time aspect. Our use of Web data services is always focusing on the real-time aspects of this.
At Deutsche Borse, we have something that's called "Xetra," our electronic trading system for cash products. We have Eurex, our derivative business line, which is worldwide, well-known, where you can trade other derivatives on that platform.
We have a main system called "CEF." It is our backbone IT solution for delivering data in real-time with milliseconds optimization. The data is mainly coming from our internal IT systems, like Xetra and Eurex, and we deliver this data to the outside world.
In addition, we calculate all the relevant indices, like the DAX, the flagship index for the German markets, with 30 instruments, and more than 2,000 -- or nearly 3,000 -- indices that are distributed over the well-known data vendors, for example, Bloomberg or Reuters. They are our main distribution networks, where we are delivering all our information.
By talking to well-known players in the market, we quickly recognized that we could build up a very powerful fundamental data model. You have to collect all the relevant information to get an overview and to get an estimate about the price, in this case, where power could develop and in which direction it could develop.
Traders are looking into the fundamental factors that affect the price of the energy or the power that you trade, whether it's oil or whatever. That's how we started with power trading. You have the wind and other weather factors. You have temperature. You have the availability of power plants. So, you try to categorize and summarize these sectors. It's called the supply and the demand side regarding this energy trading.
The main issue and main task in the beginning was to collect the relevant data. Quite quickly, we were able to set up a big list of all relevant data sets or sources, especially for Germany and some adjacent countries. We came up with something around 70, 80 or even 100 different sources on the Web to grab information from. So, the main issue was how to collect and grab all this data in a manageable way into one database. That was the first step.
I wanted to have a responsible product manager for this project or for this new product. From the beginning, I had to have a good technology in place that would be able to handle all these kinds of sources from the Web.
We recognized that there are so many different data formats that we had to grab. There are all these different providers of information in Germany and other European countries. They have their own Web sites. Some give the data in HTML format. Others use XLS, CSV, or even PDFs.
Kapow tells us how to get this information from these different sources in quite different formats. This is a manageable way, with a process-driven or graphical user interface (GUI) driven tool, that would reduce the effort, the personnel and manpower effort, needed to collect and grab the data.
Currently, we have 70 or 80 sources that we're grabbing. It's not only Web sites, but we have some third-party providers that are delivering information, for example, weather, temperature and things like that. We have providers giving data via FTP service, and we even use Kapow for grabbing data from these third-party players. As I said, it's a one-stop shopping solution to get everything via one channel.
The value-add was to grab all this data into one common data format, one database, so we would be able to deliver this data to the vendors via Web tool, Web terminal, or even our existing CEF data feeds. A lot of the players in the market are trying to collect this data by themselves, or even manually, to get an overview of where the power price would develop over the next day, hours, weeks, months, whatever.
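To make the idea of "one common data format" concrete, here is a minimal sketch in Python of normalizing two of the formats mentioned above (CSV and an HTML table) into the same record shape. It is an illustration only and does not show how Kapow or Energy Facts actually works; the field names `region` and `value_mw`, the source labels, and the sample data are all hypothetical.

```python
import csv
import io
from html.parser import HTMLParser


class _TableRows(HTMLParser):
    """Collects the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def _to_record(source, fields):
    # Hypothetical common format: every record has the same three keys.
    return {
        "source": source,
        "region": fields["region"],
        "value_mw": float(fields["value_mw"]),
    }


def records_from_csv(text, source):
    """Parse CSV text with a header row into common-format records."""
    return [_to_record(source, row) for row in csv.DictReader(io.StringIO(text))]


def records_from_html(text, source):
    """Parse a simple HTML table (first row = header) into common-format records."""
    parser = _TableRows()
    parser.feed(text)
    parser.close()
    header, *body = parser.rows
    return [_to_record(source, dict(zip(header, row))) for row in body]


# Made-up sample inputs standing in for two differently formatted sources.
csv_text = "region,value_mw\nDE,1200.5\n"
html_text = (
    "<table><tr><th>region</th><th>value_mw</th></tr>"
    "<tr><td>FR</td><td>980</td></tr></table>"
)

combined = records_from_csv(csv_text, "csv_feed") + records_from_html(html_text, "html_page")
for record in combined:
    print(record)
```

Once every source yields records of the same shape, loading them into a single database or feed becomes a uniform step, whatever the original format was.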
Stefan Andreasen: This is an extremely impressive service that Mario just showed us here, and I'm sure, if you're dealing with buying and selling energy, this is a must for you to be sure you made the right decision.
If these data sources exist somewhere on the Web, we can actually grab them where they are. What you traditionally do with information gathering is that you call every company or every entity that has data and ask them, "Will you please provide the data in this or this format?" But with Kapow Web Data Services, you can just grab the data, wherever it is on the Web, and assemble this valuable data source much easier and much faster.
Businesses are relying more and more on data to make the right decision, and their focus is on quality, completeness, and agility. Let's be more practical here and ask how you actually get this data.
There is a term, data integration, which is about accessing the data and providing it through a standard API (application programming interface), so that you can actually leverage it in your business applications.
Energy Facts is accessing this data at ... 70-80 different data sources, as Mario said, and providing it as a feed that depends on the volatility of the different data sources. Some of the data delivers every minute, and some deliver every four hours, etc., based on how quickly the data source changes. WDS is all about getting access to this data where it resides.
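The idea of refreshing each source at a rate that matches how quickly it changes can be sketched as follows. This is a simplified illustration, not a description of the Energy Facts implementation; the source names, intervals, and timestamps are assumptions chosen to mirror the "every minute" and "every four hours" examples.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Source:
    name: str                # hypothetical source label
    interval: timedelta      # how often this source should be refreshed
    last_fetched: datetime   # when it was last collected


def due_sources(sources, now):
    """Return the names of sources whose refresh interval has elapsed."""
    return [s.name for s in sources if now - s.last_fetched >= s.interval]


last_run = datetime(2010, 2, 14, 12, 0, 0)
sources = [
    Source("spot_prices", timedelta(minutes=1), last_run),
    Source("plant_availability", timedelta(hours=4), last_run),
]

# Five minutes later only the fast-changing source is due again.
print(due_sources(sources, datetime(2010, 2, 14, 12, 5, 0)))
# Four hours later both sources are due.
print(due_sources(sources, datetime(2010, 2, 14, 16, 0, 0)))
```

A collector loop would call `due_sources` periodically, fetch whatever is due, and update `last_fetched`, so that slow-changing sources are not polled more often than they need to be.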
There are really two different kinds of data sources. One kind is more like a real-time data source. Let's say you go to a patent directory, and there are probably millions of patents. In that case you would use Kapow Data Server to wrap that data source into a service layer, and then you would be able to query it in real time and get real-time results back. So, that's real-time access, where you have a vast amount of information.
The other scenario, and I think that's more what we see in the Energy Facts example here, is where you have a more limited data source, and you are actually trying to do a consolidation of the data into a database, and then you use that database to serve different customers or different applications.
With Kapow, you can actually go in and access the data, if you can see them on your browser. That's one thing. The other thing you need to do to make this data available to your business application is to transform and enrich the data, so that it actually matches the format that you want.
For example, on the Web site, it might have the date saying, "2 hours ago" or "3 minutes ago" and so on. That's really not useful. What you really want is a time stamp with the hour, the minute, the second, the day, the month, the year, so you can actually start comparing these. So data cleansing is an extremely important part of data extraction and access.
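The cleansing step described here, turning a relative label such as "2 hours ago" into a comparable timestamp, might look like the following minimal Python sketch. It assumes the time at which the page was collected is known (`scraped_at`) and handles only simple English phrases of the form "N units ago"; the function name and the supported units are illustrative assumptions, not part of any Kapow product.

```python
import re
from datetime import datetime, timedelta, timezone

# Maps the unit word found in the text to the matching timedelta keyword.
_UNITS = {
    "second": "seconds",
    "minute": "minutes",
    "hour": "hours",
    "day": "days",
    "week": "weeks",
}
_PATTERN = re.compile(
    r"^\s*(\d+)\s+(second|minute|hour|day|week)s?\s+ago\s*$", re.IGNORECASE
)


def to_absolute(relative_text, scraped_at):
    """Convert text like '2 hours ago' into an absolute datetime."""
    match = _PATTERN.match(relative_text)
    if match is None:
        raise ValueError(f"unrecognized relative time: {relative_text!r}")
    amount = int(match.group(1))
    unit = _UNITS[match.group(2).lower()]
    return scraped_at - timedelta(**{unit: amount})


# Assumed collection time for the example.
scraped_at = datetime(2010, 2, 14, 13, 0, 0, tzinfo=timezone.utc)
print(to_absolute("2 hours ago", scraped_at).isoformat())    # 2010-02-14T11:00:00+00:00
print(to_absolute("3 minutes ago", scraped_at).isoformat())  # 2010-02-14T12:57:00+00:00
```

Note that the result is only as precise as the label itself: "2 hours ago" could mean anything from two to just under three hours, so a real pipeline would likely record that uncertainty alongside the value.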
The last thing, of course, is serving the data in the format you need. That can be a database, if you're doing consolidation, or it can be as an API, if you are doing more of a federated access to data, and leaving the data where it is.
Actually, all styles exist, but there is a tendency for many companies to actually access the data where it is, rather than trying to consolidate it to a new place.
Go to our Web site and download a white paper from one of our customers, called "Fiserv." It's a large financial services company in the U.S. Fiserv has a lot of business partners. In fact, they have more than 300 banks in more than 10 countries as business partners. Because they're selling services, it's incredibly important for them to also monitor their customers to understand what's happening.
They had a lot of people who logged into these 300 partner banks every day and grabbed some financial information, such as interest rates, etc., into an Excel spreadsheet, put it into a database, and then got it up on a dashboard.
The thing about this is that, first, you have a lot of human labor, which can cause human errors, and so on. You can only do it once a day, and it's a tedious process. So what they did is got Kapow in and automated the extraction of this data from all their business partners -- 300 banks in more than 10 countries.
They can now get that data in near real-time, so they don't have to wait for data. They don't have to go without on the weekend, because people are not working. They get those very business-critical insights into the market and their partners instantly through our product.
Dana Gardner is president and principal analyst at Interarbor Solutions, which tracks trends, delivers forecasts and interprets the competitive landscape of enterprise applications and software infrastructure markets for clients. He also produces BriefingsDirect sponsored podcasts. Disclosure: Kapow Technologies sponsored this podcast.