Big Data in Finance (Numerical Methods) (2225.YR.000699.1)
General information
Type: OPT
Year: 1
Period: S semester
ECTS Credits: 3 ECTS
Teaching Staff:
Group | Teacher | Department | Language
Year 1 | Irene Unceta Mendieta | Operaciones, Innovación y Data Sciences | ENG
Prerequisites
Some programming background will be required for this course. To make sure everyone is on the same page, a set of pre-course introductory exercises will be published covering all the basic concepts in Python. It is students' responsibility to ensure that they are well prepared by checking their understanding of these concepts before coming to class.
Before the course starts, students should also familiarize themselves with the Jupyter notebook environment. For this purpose, they should install Anaconda on their personal computer.
Previous Knowledge
The course assumes some previous programming background in Python. This means that by the time the course starts you should already be familiar with the general workflow of a program, the primitive data types and most common composite data structures, including but not limited to floats, integers, booleans, strings, lists, dictionaries and sets. A general understanding of routines for control flow, iteration and conditional branching will also be assumed. This includes both definite and indefinite loops and if, elif and else statements. Finally, you should also be familiar with the function environment in Python. You can find tips and references to prepare for the course in the bibliography section below.
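As a rough self-check of the level described above, the short sketch below combines a dictionary, a definite loop, conditional branching, an indefinite loop and function definitions. It is not part of the pre-course exercises; the function names and the sample data are hypothetical and chosen only for illustration.

```python
def classify_returns(returns, threshold=0.0):
    """Count gains, losses and flat days in a list of daily returns."""
    counts = {"gain": 0, "loss": 0, "flat": 0}  # dictionary of counters
    for r in returns:  # definite loop over a list of floats
        if r > threshold:
            counts["gain"] += 1
        elif r < threshold:
            counts["loss"] += 1
        else:
            counts["flat"] += 1
    return counts


def periods_to_double(rate):
    """Number of compounding periods needed to at least double a capital of 1."""
    if rate <= 0:
        raise ValueError("rate must be positive, otherwise the loop never ends")
    value, periods = 1.0, 0
    while value < 2.0:  # indefinite loop: stops once the capital has doubled
        value *= 1.0 + rate
        periods += 1
    return periods


if __name__ == "__main__":
    sample_returns = [0.01, -0.02, 0.0, 0.03]  # hypothetical daily returns
    print(classify_returns(sample_returns))
    print(periods_to_double(0.10))
```

If you can read this code and predict what it prints before running it, you are likely at the expected starting level.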
In addition to programming, knowledge of basic mathematics (calculus, linear algebra, statistics, optimization, etc.) will also be assumed. We won't delve into the hardcore math behind the different methods, yet a general understanding of basic mathematical tools will be helpful.
Finally, a course pre-requisite is motivation to work with Python and a curious mind to explore the things out there.
Workload distribution
This course has a study load of 3 ECTS, which is equivalent to 75 working hours. These hours will be approximately distributed as follows:
25% Lesson hours
25% Independent student work
25% Assignments
25% Final exam
Course Learning Objectives
This course introduces numerical methods and machine learning techniques for extracting information from financial datasets.
The course covers data processing, supervised learning techniques for classification, and complex networks modeling, as well as numerical asset pricing techniques.
All use cases will be implemented in Python. An introduction to Python basics will be provided in class.
At the end of the course, students should:
1. Be familiar with the basic ideas behind Data Science and its application to the financial industry
2. Understand the different elements that come into play when devising a data science solution in a company
3. Show basic programming skills. Throughout the course participants will develop abilities to write and execute basic Python programmes oriented to solving different real-life problems.
4. Acquire a basic understanding of the infrastructure and the tools that are necessary in a company to deliver successful data science products and services to the market.
At the end of the course, students will be equipped with a toolbox to handle data-driven applications in finance and should be confident to apply for junior positions in the field of financial data analytics.
CONTENT
1. Python for Data Processing
This lesson covers the native operations in Python, including conditional statements, loops and functions.
Content:
- Introduction to the course
- Loops and conditionals
- Functions
2. Introduction to Machine Learning
This lesson introduces the notion of libraries. In particular, we will focus on NumPy and pandas. The first is a well-known library for numerical analysis, while the second is the main tool for data management in Python.
Content:
- Python libraries
- Numerical analysis using NumPy
- Data preparation using pandas
3. Machine Learning for Credit Scoring
This lesson discusses different applications of machine learning in practice through a use case. In particular, we will discuss a classification case. We will introduce different model types and discuss how to compare them in practice.
Content:
- Decision trees, logistic regression, random forest
- Prediction metrics
- Cost effective metrics
- Loan default prediction
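To illustrate what comparing classifiers with prediction metrics can look like, here is a minimal sketch in plain Python. It is an assumption-laden example rather than course material: the labels are made up (1 = default, 0 = repaid), the function names are hypothetical, and the cost figures are arbitrary placeholders for one possible reading of a cost-based metric, in which a missed default is more expensive than a wrongly rejected loan.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall(y_true, y_pred):
    """Precision and recall of the positive class (0.0 when undefined)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def misclassification_cost(y_true, y_pred, cost_fp=1.0, cost_fn=5.0):
    """Total cost of errors; both unit costs are hypothetical placeholders."""
    _, fp, fn, _ = confusion_counts(y_true, y_pred)
    return fp * cost_fp + fn * cost_fn


if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # made-up outcomes: 1 = default
    y_pred = [1, 0, 0, 1, 1, 0, 1, 0]  # made-up model predictions
    print(precision_recall(y_true, y_pred))
    print(misclassification_cost(y_true, y_pred))
```

Two models with the same precision and recall can still differ in total cost once false negatives and false positives are priced differently, which is why both kinds of metric are worth looking at.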
4. Natural Language Processing
This lesson discusses several methods related to Natural Language Processing. In particular, it shows how the spaCy library can be used to pre-process text and how specific machine learning models can be trained to conduct sentiment analysis and topic categorization.
Content:
- Introduction to spaCy
- Text preprocessing
- Introduction to word embeddings
5. Financial Networks
This lesson introduces graphs and discusses how to implement them using the igraph library. It also discusses applications of graph theory to the study of financial network dynamics over time.
Content:
- Introduction to graph theory
- Introduction to python-igraph
- Real-world network dynamics
6. Numerical Asset Pricing
This lesson introduces numerical simulations for asset pricing. In particular, it discusses Monte Carlo methods for asset pricing using the Black-Scholes model.
Content:
- Introduction to Monte Carlo simulation
- The Black-Scholes model
- Numerical asset pricing using Python
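To give a flavour of the last lesson, the following is a minimal sketch of Monte Carlo pricing for a European call option under Black-Scholes dynamics. It is not the course's own implementation: the function names, the parameter values and the choice of the standard library over NumPy are assumptions made for illustration. The closed-form Black-Scholes price is included only as a sanity check on the simulation.

```python
import math
import random


def black_scholes_call(s0, k, r, sigma, t):
    """Closed-form Black-Scholes price of a European call option."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)

    def norm_cdf(x):
        # Standard normal CDF written in terms of the error function.
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return s0 * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


def monte_carlo_call(s0, k, r, sigma, t, n_paths=100_000, seed=0):
    """Monte Carlo estimate of the same price under risk-neutral dynamics."""
    rng = random.Random(seed)  # fixed seed so that runs are reproducible
    drift = (r - 0.5 * sigma**2) * t
    vol = sigma * math.sqrt(t)
    payoff_sum = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)                # standard normal draw
        s_t = s0 * math.exp(drift + vol * z)   # terminal price of one path
        payoff_sum += max(s_t - k, 0.0)        # call payoff at maturity
    # Discount the average payoff back to today.
    return math.exp(-r * t) * payoff_sum / n_paths


if __name__ == "__main__":
    # Hypothetical parameters: at-the-money call, one year to maturity.
    params = dict(s0=100.0, k=100.0, r=0.05, sigma=0.2, t=1.0)
    print("closed form:", black_scholes_call(**params))
    print("monte carlo:", monte_carlo_call(**params))
```

The simulated price should land close to the closed-form value, and the gap shrinks roughly with the square root of the number of paths. A vectorized NumPy version would be much faster; the explicit loop is used here only to keep each step visible.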
Relation between Activities and Contents
Activity | 1 | 2 | 3 | 4 | 5 | 6
Class preparation | | | | | |
Jupyter notebooks | | | | | |
Real-life industry applications | | | | | |
Assignments | | | | | |
Class participation | | | | | |
Final Exam | | | | | |
Methodology
A six-week course with a mix of lectures and hands-on programming exercises.
Assessment criteria
The final grade of the course will be computed based on four components:
- Active contribution to the course (10%)
- Weekly quiz (10%)
- Assignments (40%)
- Final exam (40%).
A minimum grade in the final exam of 4 out of 10 is required to pass the course. In case a retake exam is needed, the final course grade will be 100% determined by the retake exam mark.
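The grading rule above can be sketched in a few lines of Python. This is an illustration only: the function names and the sample marks are hypothetical, all marks are assumed to be on a 0 to 10 scale, and the overall pass mark is not modelled because it is not stated here.

```python
# Weights of the four assessment components, as listed above.
WEIGHTS = {
    "participation": 0.10,
    "quiz": 0.10,
    "assignments": 0.40,
    "exam": 0.40,
}
MIN_EXAM_GRADE = 4.0  # minimum final exam grade required to pass


def weighted_grade(marks):
    """Weighted average of the four components (marks on a 0-10 scale)."""
    return sum(WEIGHTS[name] * marks[name] for name in WEIGHTS)


def meets_exam_minimum(marks):
    """True if the final exam grade reaches the required minimum."""
    return marks["exam"] >= MIN_EXAM_GRADE


def final_grade(marks, retake=None):
    """Course grade: the retake mark if a retake was sat, else the weighted grade."""
    if retake is not None:
        return retake  # the retake exam determines 100% of the grade
    return weighted_grade(marks)


if __name__ == "__main__":
    # Hypothetical marks for one student.
    marks = {"participation": 8, "quiz": 7, "assignments": 9, "exam": 6}
    print(final_grade(marks), meets_exam_minimum(marks))
```

Note that a high weighted grade does not compensate for a final exam below 4: the exam condition is checked separately from the average.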
Bibliography
The course won't follow any specific bibliography. Instead, references for the different topics will be introduced on an as-needed basis. Readings for the different sessions will be published together with the rest of the materials.
For a beginner-level introduction to Python, these are some of the books recommended by the Python community:
- Python Crash Course, 2nd Edition: A Hands-On, Project-Based Introduction to Programming - Eric Matthes
- Learning Python, 5th Edition - Mark Lutz
- A Byte of Python - C.H. Swaroop
An alternative to books are online courses. Most of them cover all the basic concepts. Just choose one and stick to it until the end:
- Introduction to Python programming - Coursera - University of Pennsylvania (https://www.coursera.org/learn/python-programming-intro)
- Learn Python 3 - Codecademy (https://www.codecademy.com/learn/learn-python-3)
- Introduction to Python - DataCamp (https://www.datacamp.com/courses/intro-to-python-for-data-science)
- Introduction to Python programming - edX - Georgia Tech (https://www.edx.org/professional-certificate/introduction-to-python-programming)
Once you are familiar with the basics, you can consult the following for a more targeted introduction to the use of Python for statistical learning and data analysis:
- Python for Data Analysis - Wes McKinney (O'Reilly Media)
- Applied Predictive Modeling - Max Kuhn, Kjell Johnson
- The Elements of Statistical Learning - Trevor Hastie, Robert Tibshirani, Jerome Friedman
Timetable and sections
Group | Teacher | Department
Year 1 | Irene Unceta Mendieta | Operaciones, Innovación y Data Sciences
Timetable Year 1
From 2023/5/4 to 2023/6/8:
Each Thursday from 8:00 to 9:30.
Each Thursday from 9:45 to 11:15.
Each Thursday from 11:45 to 13:15. (Except: 2023/5/18, 2023/5/25, 2023/6/1 and 2023/6/8)
From 2023/5/4 to 2023/5/11:
Each Thursday from 13:30 to 15:00.
Thursday 2023/6/15 from 8:00 to 11:15.