Introduction
In this project, we are going to analyze a dataset to determine how many layoffs companies experienced throughout the COVID-19 pandemic. The dataset contains companies’ names, types of industry, total layoffs, percentage of layoffs, and more.
With this data, I would like to answer the next questions:
· What companies had more layoffs during the pandemic?
· What countries had high numbers of layoffs?
· Is there a relation between the funds raised and the layoffs?
· What industry was the most affected by the pandemic?
Data Source
I got the dataset from Kaggle website, because i was taking a course from an analyst’s. He uploaded it for practice, so the information in this dataset may not be accurate, But, I have followed this analyst by months, and he use reliable databases. Even so, this works for my practice purposes, cleaning data in SQL and analyzing it in Power BI.
https://www.kaggle.com/datasets/swaptr/layoffs-2022
Uploading the data
The first step was to upload the raw data into SQL. The raw data was in Excel CSV format, and I had to insert NULL values into blank cells to avoid issues when uploading the data to SQL.
Once I modified those cells, I uploaded the data into MySQL, and then i verified that the data had been uploaded correctly.
The second step was to create a new table to ensure data security, since the original dataset should never be modified.
Then, I verified that the new table was created. The new table will be empty until we insert the data.
With these instructions, I did the data insertion for the new table.
Cleaning Data
Deleting repeated rows
The next step was to identify duplicate rows, so I used a window function “ROW_NUMBER”, with this we can add a numbering, but we need to add extra functions, because we need to do a numbering that increase if there are duplicates, to do that, I used PARTITION BY inside OVER(), and I added all columns names. Doing that, ROW_NUMBER will create the numbering column and numbering will change if all columns values are repeated.
I created a CTE to make queries easier and search which rows are repeated.
We can see with this query all repeated rows, so, I checked each result.
After I corroborated the 5 companies with repeated rows, I created another table that allowed me delete rows using the ROW_NUMBER column as reference.
With this query I deleted duplicate rows in the new table easily.
Then I corroborated that rows were deleted.
Standardized data
After I deleted repeated rows, I standardized the data where it was needed. Some companies names had spaces, so I used TRIM to correct it.
Then I searched all the industries on data, to check if those values are ok and they are not repeated with a different name. I found that Company Crypto had 2 different names “Crypto” and “Crypto Currency”.
I updated values to keep only one name, “Crypto”.
I verified the changes.
Then I verified country column to see if it had a format issue.
I Found that United States had two different values, one with “.” at the end and other without it.
I removed that “.” with the next query.
I updated the Table with the previous Query.
I verified that the changes worked.
The next step was to change the Date data type, because it was uploaded as a string type instead of a date type. But first, I formatted it as I preferred.
I verified the change.
I proceeded changing the Data type. To do that, I had to use ALTER TABLE.
At the left bottom, we can see the change.
Nulls and blanks
The next step was to verify Null and blank values and determined whether I should delete them or not. I checked industry columns first.
Once that I saw which companies had null or blank values, I queried by companies obtained from the chart above, because doing that, whether that company has two or more rows, I could see the industry type for rows with null and blank values.We can see that there is another row, and it has the industry type, so I was able to fix the blank value.
I did the same process for the other companies.
It has only one row, so I had to delete it because there is no way to know the industry type.
After I checked all the companies with null or blank values in industry column, I could update the chart one by one, but I chose to use another method.
First, I changed the blank values to NULL values, because it’s easier to work with one kind of value than 2.
Then, I used a JOIN function but with the same table, I stablished like t1 and t2 to differenced them, because t1 will be the table with the null values, and t2 will be the table without null values. With ON I’m saying that I will use the company column as a primary key, to make values match. With SET I’m telling that industry column from table t1(with null values) will be equal to industry column from table t2 (without null values), so, in this step is where the change happens. With WHERE I’m stablishing that table t1 will have only null values on industry column, and table t2 will have values.
It is a little confusing, but with easy words, I copied existent values to industry column where cells are NULL values.
Then, I verified that the values were correct now. And as we can see it worked.
The next step was to delete rows with NULL values on total_laidoff and percentage_laid_off, because with out those values, those rows are useless.
And finally, I deleted column row_num, that is the column that I created to do numbering, but now, it is useless.
Visualization
For this visualization, I didn't use Power Query a lot because I already cleaned data on SQL, I just hide some unnecessary rows for this analysis. It’s important to remember that this dataset it’s from 2020 to 2023, so it could change with more data from other years, or even obtaining data completely , because remember that I deleted some rows with missing values.
I will explain each chart selected:
Table with Company and Total Layoffs
I chose this visualization, because it makes easier to see how many layoffs each company had, and we can see which company had more layoffs or less layoffs. As we can see, Amazon was the company with more layoffs, followed by Google, Meta and Microsoft.
% of Layoffs by country
I used map visualization because I wanted to show what countries had more layoffs during the COVID-19 pandemic.
Layoff vs. Funds raised
Scatter plot visualization shows us the relation between layoffs and fund raised. As we can see, most of the companies with good funds raised base didn’t lay off their employees (There are exceptions like everything).
Total layoffs by industry
With the Bar visualization , we can see what industry had more layoffs, and as it shows us, the consumer industry is the one with more layoffs.
Answering questions
As I mentioned before, it is a data analysis for practice purposes, so it means that the results for this analysis may be not accurate because the data was obtained from an analyst’s website which uploads datasets for practice purposes. I used this dataset because I want to share with you my skills using analysis tools and my analytical skills.
Despite that, I think the dataset it’s ok and it only needs more records from other missing years and recover some values that are not available to be accurate, because we only have data from 2020 to 2023.
Let’s start answering the questions from the beginning:
· What companies had more layoffs during the pandemic?
Amazon, Google, Meta and Microsoft. At the beginning I got surprised when I saw that amazon had a lot off layoffs, because a company like it, had good profit during the pandemic, but there is a reason that and I investigated it.
Amazon had record profits in 2020–2021, and it hired too quickly. When demand normalized and global costs rose, it was forced to cut staff to scale and focus on more profitable businesses.
So, indeed amazon had a lot of layoffs, but that doesn’t mean that the company faced a difficult period.
· What countries had high numbers of layoffs?
North America and India. I’m not surprised about it because North American companies have a lot of international offices and plants around the world, so it means that they have more people working for them in other countries, making layoffs more visible.
About India I’m not sure about it, and I will need to investigate more about this country to understand why it had a lot of layoffs. But, as USA, could be a point of view case because the population number in this country is big, and they are in the top of countries with a lot of population in the world.
· Is there a relation between the funds raised and the layoffs?
Yes, we can see that most of the companies with a low fund raised had more layoffs than companies with a high fund raised (with some exceptions).
· What industry was the most affected by the pandemic?
Consumer industry. It makes sense because during the pandemic many people did not leave their homes as a preventive measure, and because of that, the companies lost sales, and as a result, they laid off employees.
Conclusion
We can conclude that the pandemic affected companies a lot, some more than others, and we found two things in common among those companies. The first is that they belong to industries where physical or close contact is important for sales or processes, and during the pandemic, this became almost impossible because people around the world took countermeasures to avoid contracting COVID.
The second observation is that companies with well-raised funds were better able to resist layoffs. This is expected, because companies with strong funds tend to have better profitability, making layoffs less likely. However, there are exceptions; in Amazon’s case, despite having substantial funds, they still had many layoffs. As explained before, there was a specific reason for this that might not apply to other companies.
With this situation, we could gather information about hiring dates and the number of people hired and compare it with layoffs dates and the number of people laid off. With this information, we can assess whether other companies experienced situations similar to Amazon.