Flight Matrix
Proposal
Introduction and data
Source of the dataset: TidyTuesday GitHub
This data comes from Eurocontrol, an international organization for air traffic management throughout Europe. Its Air Traffic Flow Management (ATFM) records capture daily flights arriving at and departing from European airports. The data contains 14 columns and 7305 rows of flights from 2016 to 2022. The columns include the date of the flight, the airport designator, the airport name, the country of the airport, and the number of departures from and arrivals at the airport. Our focus is on Belgian airports, and our main objective is to explore flight patterns, trends, and relationships across different airports in Belgium. There are no ethical concerns regarding the use and analysis of this data.
Research Question
Research Question: How do the total monthly flight departures and arrivals at Brussels Airport compare with those at other major airports in Belgium, and what are the seasonal trends over the year?
Variables Involved
Quantitative Variables:
FLT_DEP_1
- Contains total number of departing flights
FLT_ARR_1
- Contains total number of arriving flights
Categorical Variables:
APT_NAME
- Contains airport names
STATE_NAME
- Contains name of the country
MONTH_MON
- Contains the month in which the records of departures and arrivals took place
Target population
The target population for this study consists of flight records for major Belgian airports, with Brussels Airport compared against the other airports in the dataset.
Importance of this research question
This research question is important because Brussels Airport is one of the busiest airports in Europe. The findings could help airport management plan ahead for seasonal changes, which could improve the airport's efficiency.
Glimpse of data
| | YEAR | MONTH_NUM | MONTH_MON | FLT_DATE | APT_ICAO | APT_NAME | STATE_NAME | FLT_DEP_1 | FLT_ARR_1 | FLT_TOT_1 | FLT_DEP_IFR_2 | FLT_ARR_IFR_2 | FLT_TOT_IFR_2 | Pivot Label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2016 | 1 | JAN | 2016-01-01T00:00:00Z | EBAW | Antwerp | Belgium | 4 | 3 | 7 | NaN | NaN | NaN | Antwerp (EBAW) |
| 1 | 2016 | 1 | JAN | 2016-01-01T00:00:00Z | EBBR | Brussels | Belgium | 174 | 171 | 345 | 174.0 | 161.0 | 335.0 | Brussels (EBBR) |
| 2 | 2016 | 1 | JAN | 2016-01-01T00:00:00Z | EBCI | Charleroi | Belgium | 45 | 47 | 92 | 45.0 | 45.0 | 90.0 | Charleroi (EBCI) |
| 3 | 2016 | 1 | JAN | 2016-01-01T00:00:00Z | EBLG | Liège | Belgium | 6 | 7 | 13 | NaN | NaN | NaN | Liège (EBLG) |
| 4 | 2016 | 1 | JAN | 2016-01-01T00:00:00Z | EBOS | Ostend-Bruges | Belgium | 7 | 7 | 14 | NaN | NaN | NaN | Ostend-Bruges (EBOS) |
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7305 entries, 0 to 7304
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 YEAR 7305 non-null int64
1 MONTH_NUM 7305 non-null int64
2 MONTH_MON 7305 non-null object
3 FLT_DATE 7305 non-null object
4 APT_ICAO 7305 non-null object
5 APT_NAME 7305 non-null object
6 STATE_NAME 7305 non-null object
7 FLT_DEP_1 7305 non-null int64
8 FLT_ARR_1 7305 non-null int64
9 FLT_TOT_1 7305 non-null int64
10 FLT_DEP_IFR_2 2922 non-null float64
11 FLT_ARR_IFR_2 2922 non-null float64
12 FLT_TOT_IFR_2 2922 non-null float64
13 Pivot Label 7305 non-null object
dtypes: float64(3), int64(5), object(6)
memory usage: 799.1+ KB
Analysis plan
Data Exploration:
Initial Exploration:
- Use .head() to display the first few rows of the dataset.
- Use .info() to check the data type of each column and any null column values.
- Filter the data to only display data where the airport is located in Belgium.
- Determine the unique cities in Belgium to set up for future comparisons.
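The exploration steps above can be sketched as follows. This is only an illustration under stated assumptions: the DataFrame is built by hand from the five rows shown in the data glimpse (with a subset of the columns), and the names flights, belgium, and airports are placeholders rather than the names used in the project. In practice the full dataset would be loaded from the TidyTuesday file instead.

```python
import pandas as pd

# Five rows copied from the data glimpse (subset of columns),
# standing in for the full dataset.
flights = pd.DataFrame({
    "YEAR": [2016] * 5,
    "MONTH_MON": ["JAN"] * 5,
    "FLT_DATE": ["2016-01-01T00:00:00Z"] * 5,
    "APT_ICAO": ["EBAW", "EBBR", "EBCI", "EBLG", "EBOS"],
    "APT_NAME": ["Antwerp", "Brussels", "Charleroi", "Liège", "Ostend-Bruges"],
    "STATE_NAME": ["Belgium"] * 5,
    "FLT_DEP_1": [4, 174, 45, 6, 7],
    "FLT_ARR_1": [3, 171, 47, 7, 7],
    "FLT_TOT_1": [7, 345, 92, 13, 14],
})

print(flights.head())  # first few rows
flights.info()         # data types and non-null counts per column

# Keep only the rows where the airport is located in Belgium.
belgium = flights[flights["STATE_NAME"] == "Belgium"]

# Unique airport names, to set up later comparisons.
airports = sorted(belgium["APT_NAME"].unique())
print(airports)
```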
Variable Preprocessing:
- Handling Missing Values:
- We filter out any rows that are missing the number of flights for arrivals, departures, or totals.
- Data Type Conversion:
- Ensure each column is in the required format and convert it if it is not. Columns that represent dates and times have been converted accordingly.
- Selecting data that is pre-pandemic:
- We chose data from 2016-2019 because we want to see trends in flight distribution without the disruption of significant events such as the pandemic. However, our data does contain an outlier, less extreme than the pandemic effects, which will be discussed in the discussion section.
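A minimal sketch of these preprocessing steps is shown below, using three rows copied from the data glimpse. Which columns count as "missing" is an assumption here: the sketch checks FLT_DEP_1, FLT_ARR_1, and FLT_TOT_1, which have no nulls in the .info() output, so on the real data this step would drop nothing unless the IFR columns are checked instead. The names count_cols, clean, and pre_pandemic are placeholders.

```python
import pandas as pd

# Rows copied from the data glimpse; NaN marks a missing IFR count.
flights = pd.DataFrame({
    "YEAR": [2016, 2016, 2016],
    "FLT_DATE": ["2016-01-01T00:00:00Z"] * 3,
    "APT_NAME": ["Antwerp", "Brussels", "Charleroi"],
    "FLT_DEP_1": [4, 174, 45],
    "FLT_ARR_1": [3, 171, 47],
    "FLT_TOT_1": [7, 345, 92],
    "FLT_TOT_IFR_2": [float("nan"), 335.0, 90.0],
})

# Handling missing values: drop rows lacking any of the flight counts
# (assumed column choice; swap in the IFR columns if those are meant).
count_cols = ["FLT_DEP_1", "FLT_ARR_1", "FLT_TOT_1"]
clean = flights.dropna(subset=count_cols).copy()

# Data type conversion: parse the ISO 8601 strings into UTC datetimes.
clean["FLT_DATE"] = pd.to_datetime(clean["FLT_DATE"], utc=True)

# Pre-pandemic selection: keep 2016 through 2019, both ends inclusive.
pre_pandemic = clean[clean["YEAR"].between(2016, 2019)]
print(pre_pandemic.dtypes)
```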
Statistical Analysis:
- Comparative Analysis by Airport:
- Group the data by APT_NAME and MONTH_MON to calculate summary statistics for flight departures and arrivals.
- Create time series graphs of total departures (FLT_DEP_1) and arrivals (FLT_ARR_1) to identify monthly and yearly trends at Brussels Airport compared to other airports.
- Seasonal Analysis:
- Create bar plots and line graphs to visualize seasonal trends in departures and arrivals, comparing different airports and months.
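The grouping behind the comparative and seasonal analysis could look roughly like the sketch below. It uses the five rows from the data glimpse, so every group holds a single day and the statistics are trivial; the point is the shape of the computation. The month labels other than "JAN" are an assumption about how MONTH_MON is coded, and the names month_order, monthly, and dep_by_month are placeholders.

```python
import pandas as pd

# Five rows copied from the data glimpse (one day per airport).
flights = pd.DataFrame({
    "MONTH_MON": ["JAN"] * 5,
    "APT_NAME": ["Antwerp", "Brussels", "Charleroi", "Liège", "Ostend-Bruges"],
    "FLT_DEP_1": [4, 174, 45, 6, 7],
    "FLT_ARR_1": [3, 171, 47, 7, 7],
})

# Assumed month coding: only "JAN" appears in the glimpse.
# An ordered categorical keeps months in calendar order, not alphabetical.
month_order = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
               "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]
flights["MONTH_MON"] = pd.Categorical(
    flights["MONTH_MON"], categories=month_order, ordered=True
)

# Summary statistics per airport and month.
monthly = (
    flights.groupby(["APT_NAME", "MONTH_MON"], observed=True)
    .agg(
        dep_total=("FLT_DEP_1", "sum"),
        dep_mean=("FLT_DEP_1", "mean"),
        arr_total=("FLT_ARR_1", "sum"),
        arr_mean=("FLT_ARR_1", "mean"),
    )
    .reset_index()
)

# Months as rows and airports as columns: a convenient layout for plotting.
dep_by_month = monthly.pivot(
    index="MONTH_MON", columns="APT_NAME", values="dep_total"
)
print(dep_by_month)
```

With matplotlib installed, dep_by_month.plot(kind="bar") or dep_by_month.plot() would draw a bar plot or line graph with one series per airport.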
Hypothesis Testing:
- Use a Chi-Square test or an H-test, depending on the output.