Wrangling Data: HW2

Aminata Gadio
Feb 4, 2021


1) One takeaway from each of these articles, and something that surprised you:
  • This is an article I never knew I needed until today. Because I am interested in pursuing a career in public health and data science, this article taught me a great deal about the importance of data through real-life examples. Specifically, I learned that data is often messy and needs to be cleaned up before further analysis. In my current job at The New School’s dean’s office, I work with hundreds of data sets, organizing them into spreadsheets and folders. It often takes hours or even days to organize the data, which can be overwhelming. This article taught me that a clean data set makes the data easier to understand and makes collecting data faster and less tedious. What surprised me the most, however, was the judge and crime example. It states that judges use data to help decide on a verdict through data dictionaries, analyzed numbers, and rates of punishment. It states, “But for every judge, about 1–2 percent of the cases showed no prison time… So the chart showing the sentencing patterns for each judge included a tiny amount of cases as ‘No punishment.’” Though it was surprising to see that data can be used to decide verdicts in the criminal justice system, I wonder whether the data can ever be misused or incorrect. Especially in the criminal justice system (with a majority of minorities and African Americans in prisons and jails), many rulings can favor one race over another. How can we use data to fix this issue?

2) One takeaway from each of these articles, and something that surprised you:

  • As stated in the previous question, I work at The New School’s dean’s office tracking thousands of data sets in spreadsheets and other software. Despite this, there is always something new to learn about spreadsheets. Specifically, in this article, I learned how to clean spreadsheets using variables and formatting. When working with data in the dean’s office, I often struggle with formatting. Most of the time, my data sets are messy, with different fonts and sheets, which makes the data harder to understand. This article taught me how to organize my data using colors, using abbreviations instead of nicknames, and using dashes or underscores in headers to avoid formatting issues. Moreover, as a visual learner, I greatly benefited from the images, as I was able to practice the steps on my own spreadsheet. One thing that surprised me was the number of different ways to format dates in spreadsheets. It was interesting to see how the same date can be written in several different ways and still mean the same thing. I also wonder whether spreadsheets allow data to be sorted in alphabetical order. That would be interesting to see, given the thousands of entries that can be in one spreadsheet.
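To make these cleaning ideas concrete, here is a minimal Python sketch. It is not taken from the article. The function names, the list of date formats, and the sample rows are all made up for illustration, and it assumes English month names. It shows three things mentioned above: headers written with underscores, differently formatted dates converted to one format, and rows sorted alphabetically.

```python
from datetime import datetime

# Assumed date formats that might appear in one messy column (illustrative only).
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%b %d, %Y", "%d %B %Y")


def clean_header(header: str) -> str:
    """Lowercase a column header and join its words with underscores."""
    return "_".join(header.strip().lower().split())


def normalize_date(text: str) -> str:
    """Return the date as YYYY-MM-DD, trying each assumed format in turn."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # This format did not match, so try the next one.
    raise ValueError(f"Unrecognized date format: {text!r}")


def sort_rows(rows, column):
    """Sort rows (dictionaries) alphabetically by one column, ignoring case."""
    return sorted(rows, key=lambda row: row[column].lower())


# Made-up rows, not real data from the dean's office.
rows = [
    {"name": "Zola", "start_date": "Feb 4, 2021"},
    {"name": "amy", "start_date": "02/04/2021"},
    {"name": "Moussa", "start_date": "4 February 2021"},
]

cleaned = [
    {"name": row["name"], "start_date": normalize_date(row["start_date"])}
    for row in sort_rows(rows, "name")
]
```

After this runs, `cleaned` holds the same three rows in alphabetical order by name, with every date written the same way. Spreadsheet programs such as Excel offer a similar sort through their built-in Sort feature, so the alphabetical ordering does not require code.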

3)

For this assignment, I examined a data set that hit close to home (literally). Because I am from Senegal, I wanted to find a data set related to my country’s social and economic development. For this reason, I chose a data set showing infant mortality rates in Senegal from 2010–2020. I found that infant death rates decreased over those years. It was interesting to see how the other data sets are closely related to the decrease in infant death rates, as health care enrollments are increasing as well as teen pregnancies (in Senegal). This is a factor in the downward trend of infant death rates, as there is increased awareness of preventing teen pregnancies so that women and girls can go to school. This is also a leading reason why employment for women between the ages of 14 and 21 is increasing. I love how open-ended data can be, as you can discover so many things from one data set. This is similar to my analysis of the graph above.

Excel:
