Flight Matrix

Proposal

Project description
Author
Affiliation

Lean Mean Learning Machines

School of Information, University of Arizona

import numpy as np
import pandas as pd
import seaborn as sns

Introduction and data

flights_dt = pd.read_csv('data/flights.csv')

The source of the Dataset : TidyTuesday GitHub

This data comes from Eurocontrol, an international organization for air traffic control management throughout Europe. Air Traffic Flow Management (ATFM) collects information on daily flights arriving and departing European airports. This data contains 14 columns and 7305 rows for flights from the years of 2017 to 2022. The columns contain information such as the date of the flight, airport designator, airport name, country of the airport, and the numbers of departures and arrivals to the airport. Our focus will be on Belgian airports and our main objective is to explore flight patterns, trends, and relationships across different airports in Belgium. There are no ethical concerns towards the use and analysis of this data.

Research Question

Research Question: How is the total number monthly flight departures ad arrivals at Brussels Airport comparatively to other major airports in Belgium, and what are the seasonal trends over the year?

Variables Involved

Quantative Variables:

  • FLT_DEP_1 - Contains total number of departing flights

  • FLT_ARR_1 - Contains total number of arriving flights

Categorical Variables:

  • APT_NAME - Contains airport names

  • STATE_NAME - Contains name of the country

  • MONTH_MON - Contains the month in which the records of departures and arrivals took place

Target population

The target population for this study consists of flight details for major Belgian airports, with comparing Brussels Airport to other airports in the dataset.

Importance of this research question

This research question is important because the Brussels Airport is one of the most busiest Airport in Europe. So, the findings could assist airport management to plan ahead of the seasonal changes which could improve the Airports Efficeincy.

Glimpse of data

flights_dt.head()
YEAR MONTH_NUM MONTH_MON FLT_DATE APT_ICAO APT_NAME STATE_NAME FLT_DEP_1 FLT_ARR_1 FLT_TOT_1 FLT_DEP_IFR_2 FLT_ARR_IFR_2 FLT_TOT_IFR_2 Pivot Label
0 2016 1 JAN 2016-01-01T00:00:00Z EBAW Antwerp Belgium 4 3 7 NaN NaN NaN Antwerp (EBAW)
1 2016 1 JAN 2016-01-01T00:00:00Z EBBR Brussels Belgium 174 171 345 174.0 161.0 335.0 Brussels (EBBR)
2 2016 1 JAN 2016-01-01T00:00:00Z EBCI Charleroi Belgium 45 47 92 45.0 45.0 90.0 Charleroi (EBCI)
3 2016 1 JAN 2016-01-01T00:00:00Z EBLG Liège Belgium 6 7 13 NaN NaN NaN Liège (EBLG)
4 2016 1 JAN 2016-01-01T00:00:00Z EBOS Ostend-Bruges Belgium 7 7 14 NaN NaN NaN Ostend-Bruges (EBOS)
flights_dt.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7305 entries, 0 to 7304
Data columns (total 14 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   YEAR           7305 non-null   int64  
 1   MONTH_NUM      7305 non-null   int64  
 2   MONTH_MON      7305 non-null   object 
 3   FLT_DATE       7305 non-null   object 
 4   APT_ICAO       7305 non-null   object 
 5   APT_NAME       7305 non-null   object 
 6   STATE_NAME     7305 non-null   object 
 7   FLT_DEP_1      7305 non-null   int64  
 8   FLT_ARR_1      7305 non-null   int64  
 9   FLT_TOT_1      7305 non-null   int64  
 10  FLT_DEP_IFR_2  2922 non-null   float64
 11  FLT_ARR_IFR_2  2922 non-null   float64
 12  FLT_TOT_IFR_2  2922 non-null   float64
 13  Pivot Label    7305 non-null   object 
dtypes: float64(3), int64(5), object(6)
memory usage: 799.1+ KB

Analysis plan

Data Exploration:

  1. Initial Exploration:

    • Use .head() to display first few rows of the dataset.

    • Use .info() to check the data types of each column and any null column values.

    • Filter the data to only display data where the airport is located in Belgium.

    • Determine the unique cities in Belgium to set up for future comparisons.

Variable Preprocessing:

  1. Handling Missing Values:
    • We filter out any data that appears to be missing information on number of flights for arrivals, departures, and totals.
  2. Data Type Conversion:
    • Ensuring the columns are in the desired format which is required if not, convert them. Columns that represented date time have been appropriately converted.
  3. Selecting data that is pre-pandemic:
    • The reason we had chose data from 2016-2019 is because we want to see the trends of flight distribution without the disturbances of significant events such as pandemic. However, it is important to note that our data does contain an outlier that is not as extreme as pandemic effects which will be discussed in the discussion section.

Statistical Analysis:

  1. Comparative Analysis by Airport:
    • APT_NAME and MONTH_MON to calculate the summary statistics for flight departures and arrivals.

    • Create a time series graphs for total departures FLT_DEPT_1 and arrivals FLT_ARR_1 to identify monthly and yearly trends at Brussels Airport compared to other Airports.

  2. Seasonal Analysis:
    • Create bar plots and line graph to visualize seasonal trends in departures and arrivals , comparing different airports and months.

Hypothesis Testing:

  • Using Chi-Square Test or H - Test depending on the output.