Prediction of highly lucrative companies using annual statements: A Data Mining based approach
(Language: English)
Book (paperback)
46.30 €
Product details
Product information for "Prediction of highly lucrative companies using annual statements: A Data Mining based approach"
Blurb for "Prediction of highly lucrative companies using annual statements: A Data Mining based approach"
The aim of this study is to predict, one year in advance, whether a given firm will grow extraordinarily in the following year. This is crucial for private investors and fund managers who need to decide whether to invest in a certain firm. Companies like Apple and Amazon have shown that people who recognized the potential of such companies at the right time earned a lot of money. The applied prediction models can also be used by politicians to identify companies which are eligible for funding, because growing companies often hire many employees.
Since annual reports are often publicly available for free, it is reasonable to take advantage of them for such a prediction. The prediction models are based on classification trees and forests because they have some very substantial advantages over other methods, such as neural networks, which are frequently used in the literature. For instance, they make no distributional assumptions, accept both quantitative and qualitative inputs, and are not sensitive to outliers. Furthermore, they are easy for humans to understand and can deal with missing values, which is crucial for practical applications.
Sample excerpt from "Prediction of highly lucrative companies using annual statements: A Data Mining based approach"
Extract: Chapter 3, The available dataset:
This chapter describes the dataset used in this study and illustrates both its content and its structure. This chapter is part of the Data Understanding phase of CRISP-DM.
3.1, Description of the dataset:
The dataset originates from the company Bureau van Dijk Electronic Publishing GmbH (BvD). BvD obtains digitised data about companies from its information providers, combines this data, and provides it to its customers for analysis and research purposes. BvD also collects some of its data itself.
The dataset contains both companies which are listed on a stock exchange and companies which are not or no longer listed, with an emphasis on non-incorporated firms (Bureau van Dijk Electronic Publishing GmbH 2013). Furthermore, this dataset, which is called Amadeus, encompasses records from eastern and western Europe. In total, Amadeus contains approximately three million different companies. To enable comparisons of international firms, the annual report data in particular was collected in a standardised way (Bureau van Dijk Electronic Publishing GmbH 2013).
Amadeus is stored in five comma-separated values files (csv-files). Its values are enclosed in quotation marks, and consecutive values are separated by tab characters.
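The file layout described above (quoted values separated by tab characters) can be read with Python's standard csv module. The following is only a minimal sketch: the column names and the sample rows are invented for illustration and are not taken from Amadeus.

```python
import csv
import io

# Hypothetical sample in the layout described above: every value is
# enclosed in quotation marks and values are separated by tab characters.
SAMPLE = (
    '"BvD ID number"\t"Company name"\t"Country"\n'
    '"DE0000000001"\t"Example GmbH"\t"DE"\n'
    '"DE0000000002"\t"Muster, Sohn & Co. KG"\t"DE"\n'
)


def read_rows(handle):
    """Yield one dict per data row, keyed by the header line.

    Rows are yielded one at a time, so a large file never has to be
    held in memory as a whole.
    """
    reader = csv.DictReader(handle, delimiter="\t", quotechar='"')
    for row in reader:
        yield row


rows = list(read_rows(io.StringIO(SAMPLE)))
print(rows[1]["Company name"])
```

For a real file, the in-memory sample would be replaced by a handle from open(path, newline="", encoding=...). The encoding of the Amadeus files is not stated in the text and would have to be checked.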
In this analysis, only two of the five csv-files are required: master file data (86 features) and finance data (72 features). Each of these files has a size of approximately nine gigabytes. The master file data contains the names and addresses of the companies in question. It also states in which industry they operate, which important trademarks they possess, and where most of their goods are produced. As can be seen in Illustration 1, there is often more than one row for the same company. This seems to be the case if the corresponding feature is a descriptive feature and, therefore, has more than one value for this company at the same time (Bol 2004, 16). For instance, this is the case if a company has changed its name several times and consequently has more than one former name. In these cases, only the first row is complete, and all the other rows contain just the same BvD ID number, the company name, and the additional feature values. This file structure avoids redundancy and reduces the file size.
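The multi-row structure described above can be collapsed into one record per company by collecting the values of the multi-valued feature in a list. This is a sketch under assumptions: apart from BvD ID number and the company name, the column names ("City", "Former name") and all sample values are hypothetical.

```python
# Hypothetical master-file rows: the first row per company is complete,
# continuation rows repeat only the ID, the name, and one further value
# of the multi-valued feature (here: a former name).
ROWS = [
    {"BvD ID number": "DE01", "Company name": "Example GmbH",
     "City": "Essen", "Former name": "Alpha GmbH"},
    {"BvD ID number": "DE01", "Company name": "Example GmbH",
     "City": "", "Former name": "Beta GmbH"},
    {"BvD ID number": "DE02", "Company name": "Muster AG",
     "City": "Bonn", "Former name": ""},
]


def collapse(rows, id_col="BvD ID number", multi_cols=("Former name",)):
    """Return one record per company, keyed by its ID.

    Single-valued columns are taken from the first (complete) row;
    multi-valued columns become lists of all non-empty values.
    """
    companies = {}
    for row in rows:
        key = row[id_col]
        if key not in companies:
            record = dict(row)
            for col in multi_cols:
                record[col] = [row[col]] if row[col] else []
            companies[key] = record
        else:
            for col in multi_cols:
                if row[col]:
                    companies[key][col].append(row[col])
    return companies


companies = collapse(ROWS)
print(companies["DE01"]["Former name"])
```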
The finance dataset contains the actual annual reports. Every row represents exactly one report, the date of which is stored in the column Account date. Other characteristic features are the gross profit, the number of employees, and the cost of materials.
Another very important column is the already mentioned BvD ID number, which is unique for every company and makes it possible to merge data from several csv-files. If, for instance, the user requires the industry code for a given annual report, they only have to go through the master file data and look for the first row which has the same BvD ID number as the annual report.
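The lookup described above (find the first master-file row with the same BvD ID number as the annual report) can be sketched with a dictionary index, so that the master file is scanned only once rather than once per report. The column name "Industry code", the date format, and all sample values are assumptions made for illustration.

```python
# Hypothetical master-file rows; the second row is a continuation row
# for the same company and carries no industry code.
MASTER = [
    {"BvD ID number": "DE01", "Industry code": "2511"},
    {"BvD ID number": "DE01", "Industry code": ""},
    {"BvD ID number": "DE02", "Industry code": "4711"},
]

# Hypothetical annual reports from the finance file.
REPORTS = [
    {"BvD ID number": "DE02", "Account date": "2010-12-31"},
    {"BvD ID number": "DE01", "Account date": "2009-12-31"},
    {"BvD ID number": "DE03", "Account date": "2009-12-31"},
]


def first_row_index(master_rows, id_col="BvD ID number"):
    """Map each company ID to the first master row that carries it."""
    index = {}
    for row in master_rows:
        # setdefault keeps the first row and ignores continuation rows.
        index.setdefault(row[id_col], row)
    return index


def attach_industry(reports, index, id_col="BvD ID number"):
    """Return copies of the reports with the industry code added.

    Reports whose company is missing from the master file get None.
    """
    enriched = []
    for report in reports:
        master_row = index.get(report[id_col])
        merged = dict(report)
        merged["Industry code"] = (
            master_row["Industry code"] if master_row else None
        )
        enriched.append(merged)
    return enriched


enriched = attach_industry(REPORTS, first_row_index(MASTER))
print(enriched[0]["Industry code"])
```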
3.2, Data clean-up:
As with most databases, the data from Amadeus has to be manipulated, and some records have to be excluded, before it can be analysed. This section presents the manipulations which are carried out to enable data analysis. Further manipulations which are related to key figures are mentioned in chapter 2. Because the data used is distributed over two database tables, it has to be merged. The necessary steps are described in the appendix.
First of all, only German companies are considered because of the setting of the task, which means that all other companies are excluded. Besides that, only annual reports from the years 2007, 2008, 2009, 2010 and 2011 are considered. There are more recent reports in the dataset, too, but far fewer than for these five years. To ensure a certain representativeness of the results, older data is accepted.
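The two exclusion rules above (German companies only, reports from 2007 to 2011 only) amount to a simple row filter. The sketch below assumes a hypothetical "Country" column holding a two-letter code and an Account date in the form YYYY-MM-DD; neither detail is stated in the text.

```python
# Reporting years that are kept: 2007 up to and including 2011.
YEARS = range(2007, 2012)

# Hypothetical annual reports after merging with the master file.
REPORTS = [
    {"Country": "DE", "Account date": "2009-12-31"},
    {"Country": "FR", "Account date": "2009-12-31"},
    {"Country": "DE", "Account date": "2012-12-31"},
    {"Country": "DE", "Account date": "2006-12-31"},
    {"Country": "DE", "Account date": "2011-06-30"},
]


def keep(report, country_col="Country", date_col="Account date"):
    """Return True for German reports from the years 2007 to 2011."""
    if report[country_col] != "DE":
        return False
    # Assumes the date starts with a four-digit year (YYYY-MM-DD).
    year = int(report[date_col][:4])
    return year in YEARS


selected = [report for report in REPORTS if keep(report)]
print(len(selected))
```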
Furthermore, it is ensured that only those annual reports are consi
About the author: Jurij Weinblat
Jurij Weinblat, M.Sc., was born in Charkov, Ukraine, in 1988 and moved to Germany with his family a few years later. He studied Information Systems and finished both his undergraduate and graduate studies with distinction at the University of Duisburg-Essen. During his studies, the author spent one semester in Ireland. Moreover, he has given tutorials in different fields of Computer Science and Management Studies. Currently, he is doing a doctorate on applying Data Mining methods to annual statements.
Bibliographic details
- Author: Jurij Weinblat
- 2014, first edition, 104 pages, 37 illustrations, dimensions: 15.5 x 22 cm, paperback, English
- Publisher: Anchor Academic Publishing
- ISBN-10: 3954893045
- ISBN-13: 9783954893041
Language:
English