Technical elements
MyDataBall's two innovations are, on the one hand, its tree-computation techniques and, on the other, multidimensional visualization.
- 1) The trees: a tree is the mathematical tool that builds a hierarchy between variables in order to explain and learn an indicator. The tree solution is not unique, and the number of possible trees is huge, growing combinatorially with n, the number of variables involved. The more variables used for explanation and prediction, the greater the number of possible trees.
The principle of the MyDataBall process is therefore to compute a reduced set of trees with high explanatory and predictive power. This is called an optimized tree forest.
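The source does not say how MyDataBall decides which variable goes at the top of a tree. One standard criterion, swapped in here purely for illustration, is information gain (as in ID3): the variable whose split most reduces the entropy of the indicator is placed highest in the hierarchy. The toy rows and the names `region`, `promo` and `sales` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def information_gain(rows, target, variable):
    """Entropy of `target` minus its weighted entropy after splitting on `variable`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[variable]].append(row[target])
    n = len(rows)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


def rank_variables(rows, target, variables):
    """Order candidate variables from most to least informative about `target`."""
    return sorted(variables, key=lambda v: information_gain(rows, target, v), reverse=True)


# Hypothetical toy data: does a promotion or the region explain the sales indicator?
rows = [
    {"region": "N", "promo": "yes", "sales": "up"},
    {"region": "S", "promo": "yes", "sales": "up"},
    {"region": "N", "promo": "no", "sales": "down"},
    {"region": "S", "promo": "no", "sales": "down"},
]
print(rank_variables(rows, "sales", ["region", "promo"]))  # ['promo', 'region']
```

Here `promo` alone determines `sales` (a gain of 1 bit) while `region` carries no information about it (a gain of 0), so `promo` would sit at the root of the hierarchy.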
The solution seeks the shortest thermodynamic path, the one that minimizes the information needed to describe a subset of data. The data subset is treated as an energy system, and the trees as its entropic result (the level of information that comes closest to the deterministic answer). The simulated annealing optimization process injects a high temperature at the start, then cools the tree populations and crystallizes them into a set with maximum entropy.
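The heat-then-cool principle can be sketched with a generic simulated annealing loop. The source does not specify MyDataBall's state space or objective, so this sketch assumes a hypothetical problem: ordering variables in a hierarchy under an invented cost that rewards placing heavily weighted variables near the root. As is conventional for simulated annealing, it minimizes that cost; the weights, `order_cost` and `swap_two` are illustrative assumptions.

```python
import math
import random


def simulated_annealing(initial, cost, neighbor, t_start=10.0, t_end=1e-3,
                        alpha=0.95, steps_per_t=50, seed=0):
    """Start hot, cool geometrically, and return the best state seen and its cost."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    t = t_start
    while t > t_end:
        for _ in range(steps_per_t):
            candidate = neighbor(current, rng)
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost
            # Always accept improvements; accept worse states with probability
            # exp(-delta / t), which is high while hot and vanishes as t drops.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= alpha  # geometric cooling schedule
    return best, best_cost


# Hypothetical weights: how informative each variable is assumed to be.
WEIGHTS = {"price": 0.9, "region": 0.5, "season": 0.3, "channel": 0.1}


def order_cost(order):
    """Lower when heavily weighted variables appear early (near the root)."""
    return sum(depth * WEIGHTS[v] for depth, v in enumerate(order))


def swap_two(order, rng):
    """Neighbor move: swap two randomly chosen positions."""
    i, j = rng.sample(range(len(order)), 2)
    new = list(order)
    new[i], new[j] = new[j], new[i]
    return new


start = ["channel", "season", "region", "price"]
best, best_cost = simulated_annealing(start, order_cost, swap_two)
print(best, round(best_cost, 2))
```

The high starting temperature lets the search accept worse orderings and escape early choices; as the temperature falls, only improvements survive and the population "freezes" into a stable ordering.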
Take the example of a database with a very large number of rows (say, M billion), all holding exactly the same data (the same information on every row). The energy is then low, and the result is a minimal single-node tree that fully explains the content in one sentence: "there are M billion rows that have the same information". We thus go from terabytes of data to a few bytes, and data compression is optimal.
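The single-node case can be illustrated by measuring the empirical Shannon entropy of the rows, used here as a stand-in for the source's notion of "energy" (an assumption): identical rows carry zero bits of information, and the whole table collapses into one counted pattern. The helper names and the sample table are hypothetical.

```python
import math
from collections import Counter


def row_entropy_bits(rows):
    """Empirical Shannon entropy (bits per row) of the distribution of distinct rows."""
    counts = Counter(rows)
    n = sum(counts.values())
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def one_sentence_summary(rows):
    """If every row is identical, describe the whole table in one sentence; else None."""
    counts = Counter(rows)
    if len(counts) != 1:
        return None
    row, count = next(iter(counts.items()))
    return f"there are {count} rows that have the same information: {row}"


table = [("Paris", "2014", 42)] * 1_000_000  # hypothetical table of identical rows
print(row_entropy_bits(table))  # 0.0 bits: nothing to describe beyond the one pattern
print(one_sentence_summary(table))
```

As soon as the rows differ, the entropy rises above zero and a single sentence no longer suffices, which corresponds to the higher-energy case where larger trees are needed.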
In real cases, the energies depend on the field of investigation and on the content of the databases, and can be very high. The trees are then described as "dense" (bushy). The resulting trees are shaped by the questions asked of the data and by the energy of the database. The tree technique addresses the challenge of understanding why an indicator changes, not just knowing where it changes.
- 2) The spherical visualization: compressing the data into trees makes it possible to keep a very large amount of data available (an answer to Big Data and to the "in-memory" constraint), and also to visualize multidimensional information, i.e. rules with more than 2 dimensions (up to 10). Rotating the sphere increases the usable surface of a normal screen by a factor of π and so makes it possible to explore the depths of the trees.
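The source does not describe how the sphere is rendered. As a hypothetical sketch, the code below spreads items over a unit sphere with a Fibonacci lattice and checks which ones face a viewer on the +z axis before and after a half-turn. Rotation brings the previously hidden hemisphere into view, which is the basic reason a rotating sphere can expose more items than a fixed flat view. The lattice layout, the viewing convention and all names are assumptions.

```python
import math


def fibonacci_sphere(n):
    """Spread n points roughly evenly over the unit sphere (Fibonacci lattice)."""
    golden_angle = math.pi * (3 - math.sqrt(5))
    points = []
    for i in range(n):
        y = 1 - 2 * (i + 0.5) / n   # heights evenly spaced in (-1, 1)
        r = math.sqrt(1 - y * y)    # radius of the horizontal circle at height y
        theta = golden_angle * i
        points.append((r * math.cos(theta), y, r * math.sin(theta)))
    return points


def rotate_y(point, angle):
    """Rotate a 3D point about the vertical (y) axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def facing_viewer(points, angle):
    """Indices of points on the hemisphere facing a viewer on the +z axis."""
    return {i for i, p in enumerate(points) if rotate_y(p, angle)[2] > 0}


items = fibonacci_sphere(200)         # e.g. 200 tree nodes placed on the sphere
front = facing_viewer(items, 0.0)
back = facing_viewer(items, math.pi)  # after a half-turn
print(len(front), len(back), len(front | back))
```

Each view shows roughly half of the items, and the two views are disjoint, so rotating lets the user reach the items that a single flat projection would hide.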
Classic dataviz tools have a depth of 2, which only makes it possible to find correlations. Beyond 2 dimensions, causal relationships appear. MyDataBall allows users to collaboratively validate effective multidimensional decision rules and turn them into causal rules, the holy grail of knowledge discovery.
MyDataBall enhances the BI tools market (Qlik, Blackboard, Power BI, Spotfire, ...) by allowing users to detect and select the dashboards that count.
Among the current generation of machine learning tools considered "black boxes", MyDataBall aims to make visible and understandable what neural network solutions detect, giving users the opportunity to take back ownership of the results.
Together, these two techniques are what make the MyDataBall approach relevant: they make it possible to render gigantic data sets on an ordinary computer and to detect knowledge for users.