Categories
Uncategorized

Quantified Self

Pondering our ongoing conversations about the “quantified self” and its relation to wellness, I decided to focus on tracking a damaging habit of mine. I’ve been biting my nails fervently since junior year of high school, likely spurred by an increase in academic stress and standardized exams. With my nails rarely reaching past my nail bed, I’ve tried everything from no-bite nail polish, to dipping my fingers in salt, and, in a particularly desperate ploy, rubbing jalapenos on my nail beds. For this project, I tracked my nail biting habits from October 17th to November 3rd by documenting each occurrence with the date and time via a note-taking application on my cell phone.

My research question is as follows: which dates, times, and days of the week during the two-week recording period have the highest number of bite occurrences, and why? Internally, I also wanted to consider the limitations of “quantifying the self” without a third-party health tracking application. It is human nature to want to present the absolute best version of yourself. When kicking a bad habit, it’s a natural reflex to want to fudge the data to make yourself feel as though you are improving, even if you are not. There were several instances when I considered not recording nail biting occurrences for this exact reason, which may present some issues related to internal biases and data quality if I were to continue working on this project in the future.
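The tallies behind this research question can be sketched in a few lines of Python. This is only a minimal illustration, not the workflow used for the project (the charts were built in Tableau). The timestamps, the year, and the `tally` helper are all hypothetical placeholders rather than the recorded data.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical log entries ("YYYY-MM-DD HH:MM"); placeholders, not real data.
log = [
    "2022-10-17 11:05",
    "2022-10-17 13:40",
    "2022-10-19 11:30",
    "2022-10-19 21:15",
    "2022-10-21 12:10",
]

def tally(entries):
    """Count occurrences per calendar date, weekday name, and hour of day."""
    by_date, by_weekday, by_hour = Counter(), Counter(), Counter()
    for entry in entries:
        ts = datetime.strptime(entry, "%Y-%m-%d %H:%M")
        by_date[ts.date().isoformat()] += 1
        by_weekday[WEEKDAYS[ts.weekday()]] += 1  # weekday(): Monday is 0
        by_hour[ts.hour] += 1
    return by_date, by_weekday, by_hour

by_date, by_weekday, by_hour = tally(log)
print(by_date.most_common(1))
print(by_weekday.most_common(2))
print(sorted(by_hour.items()))
```

Each counter corresponds to one of the three questions (date, day of week, hour), so the same log can feed all three views.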

Visualization 1: Occurrences/Day

This first visualization shows the number of bite occurrences from October 17th to November 3rd by day. I chose a line chart to be able to clearly visualize the pattern of my nail biting habit from the start to the end of recording. Overall, the number of occurrences seems to trend downwards from October 17th to November 3rd, with the highest bite count occurring on October 21st. I also included annotations for particularly high and low bite occurrences to provide some situational context for the data. For instance, the records of high bite counts seemed to occur on days of particularly high academic stress, and the lower bite counts seemed to occur on days when I was attending social gatherings. I also noticed that I seemed to bite my nails more frequently on days when I slept poorly, namely October 24th, October 26th, and November 2nd.

Visualization 2: Occurrences/Day of Week

My second visualization shows the number of bite occurrences summarized by the day of the week. I decided to use a bar chart to be able to clearly visualize the total number of bite counts by each individual day of the week. This visualization clearly shows that Monday and Wednesday had the highest number of bite counts, while Sunday had the lowest number of bite counts. The high bite count on Wednesdays may be explained by the fact that I have two classes Thursday evening, and usually reserve Wednesday as the day to complete assignments for both classes. The stress of procrastination may have led me to bite my nails at a higher frequency. On Sundays, I usually attend a group potluck with friends. The combination of social enjoyment and participating in activities that involve both hands and all ten fingers (cooking, serving, and making drinks) may explain the non-existent bite count.

Visualization 3: Occurrences/Hour

This final visualization shows each bite count by the hour of occurrence. Initially, I wanted to visualize the data in a clock-like manner, with each bite occurrence visualized at the time of occurrence in a radial bar chart. However, I am still learning about Tableau’s time functions and was unable to create the visualization the way I initially planned to. Instead, I chose a circle chart indicating each occurrence of nail biting by the time of occurrence. The highest counts of nail biting occurred in the late morning to early afternoon, or 11:00 am-2:00 pm, with the remaining hours exhibiting a relatively steady stream of occurrences.

Dashboard

Final Thoughts and Next Steps

Overall, I think this was a solid start to kicking a bad habit, and it provided some interesting insights on the merits of the quantified self. One major obstacle for me during this project was my overall unwillingness to document instances of an embarrassing habit. As I was recording this data myself without the help of a health-tracking application, I had to make a significant effort to hold myself responsible. If Apple decides to one day include a feature that automatically documents when you bite your nails, similar to the step count in the Health application, then I will be immediately signing up.

One adaptation that I would consider moving forward is including separate data from a health-tracking application to see if external factors are influencing my nail biting. For example, while creating the first visualization, I noticed that I slept poorly on some of the days with higher bite counts. A more impactful future project may include data collected via a sleep-tracking application, combined with the self-quantified bite count data, to see if there is indeed a relationship between sleep quality and nail biting.
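One way such a follow-up could test the sleep idea is with a Pearson correlation between nightly sleep and daily bite counts. The sketch below is only an illustration under stated assumptions: the numbers are hypothetical placeholders (no sleep data was collected for this project), and `pearson_r` is a helper written for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)

# Hypothetical placeholder values: hours slept and bites on the same days.
sleep_hours = [8.0, 7.0, 6.0, 5.0]
bite_counts = [1, 2, 3, 4]

r = pearson_r(sleep_hours, bite_counts)
print(round(r, 2))
```

A value near -1 would suggest that less sleep goes with more biting, though with only a couple of weeks of data any such result would be tentative, and a correlation alone would not show that poor sleep causes the biting.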

Labs 8+9 Dashboard and Story

The above dashboard shows a general overview of the popularity of baby names in the last 100 years, including gender ratio and prevalence and baby name popularity by gender. I built this dashboard from a dispersion plot of the ratio of female and male names as compared to the total population, as well as two tree maps, one for each gender. The visualizations show that gendered names (such as “Mary” and “John”) have a higher popularity over time than more androgynous names (such as “Jaime” and “Adison”).

The above dashboard shows a more specified look at the popularity of baby names in the past 100 years. This dashboard allows the user to select a name of their choosing to see the breakdown of the name by gender, the number of occurrences of the name by year, the number of occurrences of the name over the last 100 years, as well as a comparison of the name to the three most popular male and female names. I created this dashboard by adding a “Search name” parameter to all the included visualizations.

Lastly, I added both dashboards to the storyboard for a more complete final visualization. The user starts with the more generalized dashboard, then moves on to the more specified dashboard. I created this storyboard by dragging both dashboards into view, then creating captions to guide the user through the story.

Lab 6 Visualization

This visualization shows the count of net migrations of populations in the United States of America, Brazil, the Russian Federation, China, and India from 1951-2022. The data is originally sourced from the UN Population Division. I created this data visualization by downloading two data sets: “WPP2022_Demographic_Indicators_Medium” (in lieu of medium period indicator) and “WPP2022_Population1JanuaryByAge5GroupSex_Medium”. I created two inner data joins: a join between the two Loc Ids and a join between the two time indicators. I then added the Time dimension from the demographic indicators dataset to “Columns” and Location and SUM(Net Migrations) to “Rows”. I filtered the visualization to only include America and BRICS countries, and changed the dimension name from “Time(WPP2022 Demographic Indicators Medium.csv)” to “Year”.
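For readers who have not used joins before, the join step can be sketched in Python. This assumes the two join clauses act together as one inner join on both the location ID and the time field. The field names and values are hypothetical placeholders, not rows from the WPP2022 files, and `inner_join` is a helper written for this example.

```python
def inner_join(left, right, keys):
    """Return merged rows whose key fields match in both tables."""
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    joined = []
    for row in left:
        for match in index.get(tuple(row[k] for k in keys), []):
            joined.append({**row, **match})
    return joined

# Hypothetical placeholder rows standing in for the two datasets.
indicators = [
    {"loc_id": 1, "time": 2020, "net_migrants": 10},
    {"loc_id": 1, "time": 2021, "net_migrants": 12},
    {"loc_id": 2, "time": 2020, "net_migrants": -5},
]
population = [
    {"loc_id": 1, "time": 2020, "pop_total": 1000},
    {"loc_id": 2, "time": 2020, "pop_total": 500},
    {"loc_id": 3, "time": 2020, "pop_total": 700},
]

joined = inner_join(indicators, population, keys=("loc_id", "time"))
for row in joined:
    print(row)
```

Because the join is an inner join, rows that appear in only one table (location 1 in 2021, location 3 in 2020) are dropped from the result.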

Lab 7 Visualization

This visualization shows the outliers of both male and female baby names from 1900-2000. The data was originally sourced from the Social Security Name Database.

To create the above visualization, I imported the Social Security csv file into Tableau and began creating calculated fields for counts, percentages, fixed sum occurrences, and outliers for both male and female names. For some reason I had trouble viewing these calculated fields in Tableau after creation, so I worked mostly blindly. I then put the female and male outlier values into the “columns” part of Tableau along with year. To make sure only “True” values were showing up for both, I filtered the female and male outliers to only show “True” values in the visualization. I then put “Names” into the rows field for the final visualization.

Lab 5 Visualization

Population data from the Department of Economic and Social Affairs, Population Division.

This visualization shows the change in global populations in thousands from 1950-2022. I used Excel to clean the dataset, and Tableau Public to transform the data into the above visualization.

Project 1 Visualizations

Research Question:

While incredible strides have been made in the realm of climate justice and environmentalism, marginalized communities have long been disproportionately impacted by air pollution and climate change. I originally became interested in this realm of research while taking environmental justice classes during my undergraduate years, and later while working for environmental justice and research non-profits in Washington, DC. Looking further into environmental justice issues in NYC, I found that, according to non-profit WEACT for Environmental Justice, children living in East Harlem are hospitalized for asthma at more than three times the city-wide rate. Verified air quality complaints from the Department of Environmental Protection, whereby DEP city officials observed a violation of the New York City Air Code and issued a code of violation, may be able to provide insight on the distribution of air pollution in New York City’s boroughs.

My research question for this project is as follows: which neighborhoods in New York City have the most DEP verified air quality complaints from 2017-2022? This question matters as it can provide the information needed to equitably serve the city’s residents with regard to air pollution and exposure.

Audience

My intended audience for this project includes non-profit organizations like WEACT for Environmental Justice and New York City’s Environmental Justice Alliance, community leaders, and local governmental representatives.

Description of Data and Methodology

I first downloaded all air pollution complaints from the 311 Complaint Data Database from NYC Open Data from 2017-2022. I wanted to showcase a defined timeline to illustrate more recent air pollution trends in New York City. The initial downloaded database contained a multitude of complaints that were either never resolved or determined to not be a violation of New York City Air Code, thus misrepresenting the data. I then narrowed the dataset to only include air pollution complaints from 311 Complaint Data that are indicated as an “observed violation” from the DEP, made by NYC residents in all boroughs from 2017-2022. 
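The narrowing step described above can be sketched in Python as a simple filter. This is an illustration rather than the exact procedure used: the field names (`unique_key`, `created`, `resolution`) and the sample rows are hypothetical, and matching on the phrase “observed violation” is an assumption about how the resolution text is worded.

```python
from datetime import date

# Hypothetical placeholder rows; not taken from the 311 dataset.
complaints = [
    {"unique_key": "a1", "created": date(2016, 12, 30),
     "resolution": "observed violation"},
    {"unique_key": "a2", "created": date(2018, 5, 2),
     "resolution": "observed violation"},
    {"unique_key": "a3", "created": date(2019, 7, 9),
     "resolution": "no violation found"},
    {"unique_key": "a4", "created": date(2022, 12, 31),
     "resolution": "Observed Violation issued"},
    {"unique_key": "a5", "created": date(2023, 1, 1),
     "resolution": "observed violation"},
]

def verified_in_range(rows, first_year=2017, last_year=2022):
    """Keep complaints within the year range that record an observed violation."""
    return [
        row for row in rows
        if first_year <= row["created"].year <= last_year
        and "observed violation" in row["resolution"].lower()
    ]

verified = verified_in_range(complaints)
print([row["unique_key"] for row in verified])
```

Only complaints that fall inside 2017-2022 and carry the violation wording survive, which is why unresolved or dismissed complaints no longer inflate the counts.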

Major Frustrations and Workarounds

I created these visualizations in Tableau Public Desktop. Because I had to add a lot of additional connections to create the time lapse visualizations and work around tabulation issues, by the end I had too many rows to publish the entire worksheet to Tableau Public. As a workaround, I published each visualization separately and recreated the visualizations in Tableau Web Authoring. There were some limitations to working in Tableau Web Authoring, as I could not import custom maps or style the color palette in as much detail as I would have liked.

Visualizations

Visualization 1: Time Lapse of Air Code Violations by Neighborhood

My first visualization shows the number of verified air code violations by neighborhood from 2017-2022. I decided to go with a choropleth map because I felt it was the best way to show the differences in air code violations from neighborhood to neighborhood. I used a distinct count function for the Unique Key variables to count the number of air code violations. To fully illustrate the changing pollution trends, I added a time lapse element to show the number of air pollution violations from 2017 to 2022. I pulled the zip code and neighborhood name data from the United Hospital Fund NYC Neighborhood Index website. I also designed a background map in Mapbox and used the “Background Map” feature in Tableau to integrate the map. My design choice was to keep the background map dark and faded, with highly contrasting green lines to illustrate major roadways. To keep parts of the data from disappearing during the time lapse, I added a Marks tab of just the outlines of the neighborhoods by zip code. To do this, I had to add a separate <> connection of NYC zip codes to the original sheet. I decided to use white borders for the neighborhood boundaries to contrast against the dark map background and put a focus on the city. For the shading, I tried to match the colors to the colors of the map background, with bright green for higher numbers of violations. I wanted the overall visualization to feel cohesive, but also have the NYC boundaries feel distinct against the map background. From the visualization, you can see that 2018 was a record-high year for air code violations, with areas of Manhattan including Chelsea, Upper East Side, and Harlem containing a high number of air code violations. Neighborhoods in Staten Island seemed to have lower numbers of air code violations, with neighborhoods in the borough not having any air code violations recorded from 2020-2022.
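The distinct count by neighborhood and year can be sketched in Python as follows. This is a loose analogue of the Tableau calculation, not the calculation itself. The zip codes, neighborhood labels, field names, and rows are hypothetical placeholders, and `distinct_counts` is a helper written for this example.

```python
from collections import defaultdict

# Hypothetical zip-to-neighborhood lookup; placeholder labels only.
zip_to_neighborhood = {
    "00001": "Neighborhood A",
    "00002": "Neighborhood A",
    "00003": "Neighborhood B",
}

# Hypothetical violation rows; "k1" appears twice on purpose.
violations = [
    {"unique_key": "k1", "zip": "00001", "year": 2018},
    {"unique_key": "k1", "zip": "00001", "year": 2018},
    {"unique_key": "k2", "zip": "00002", "year": 2018},
    {"unique_key": "k3", "zip": "00003", "year": 2018},
    {"unique_key": "k4", "zip": "00003", "year": 2019},
    {"unique_key": "k5", "zip": "99999", "year": 2019},  # zip not in lookup
]

def distinct_counts(rows, lookup):
    """Count distinct unique keys per (neighborhood, year) pair."""
    keys_seen = defaultdict(set)
    for row in rows:
        neighborhood = lookup.get(row["zip"])
        if neighborhood is None:
            continue  # skip rows whose zip code has no neighborhood
        keys_seen[(neighborhood, row["year"])].add(row["unique_key"])
    return {group: len(keys) for group, keys in keys_seen.items()}

counts = distinct_counts(violations, zip_to_neighborhood)
print(counts)
```

Using a set per group is what makes the count distinct: the duplicated key contributes once, so duplicate rows created by extra joins do not inflate a neighborhood's total.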

Visualization 2: Time Lapse of Air Code Violations by Borough

My second visualization shows the number of verified air code violations by borough from 2017-2022. To fully illustrate the changing pollution trends, I added a time lapse element to show the number of air pollution violations in each borough from 2017 to 2022. I pulled a publicly available map from Mapbox using Tableau’s “Background Maps” feature. I decided to go with a simple black-and-white style to add a bit of a retro spin to the visualization, whilst still being informative. During the initial time lapse, outlines of boroughs would disappear due to a lack of violation count data for that particular year. To keep the outlines intact, I had to add a separate <> connection of NYC boroughs to the original sheet, and then add the boroughs as a separate Marks tab to apply a layer to the overall map. Similar to Visualization 1, I wanted to match the shading of the air code violation density to the color palette of the background map. This visualization shows that Manhattan has the highest number of air code violations, with the borough showing an all-time high of 116 violations in 2018.
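The disappearing-outline problem comes from borough and year combinations that have no rows at all. Outside Tableau, one common remedy is to start from the full list of boroughs and years and fill the gaps with zeros. The sketch below shows that idea in Python. It is a different technique from the extra map layer described above, and apart from the Manhattan 2018 figure quoted in the text, the counts are hypothetical placeholders.

```python
BOROUGHS = ["Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"]
YEARS = range(2017, 2023)  # 2017 through 2022 inclusive

def fill_missing(counts, boroughs, years):
    """Return a count for every (borough, year) pair, using 0 where absent."""
    return {
        (borough, year): counts.get((borough, year), 0)
        for borough in boroughs
        for year in years
    }

# Sparse counts: only the Manhattan 2018 value comes from the text above;
# the Brooklyn value is a hypothetical placeholder.
sparse = {("Manhattan", 2018): 116, ("Brooklyn", 2018): 40}

filled = fill_missing(sparse, BOROUGHS, YEARS)
print(filled[("Manhattan", 2018)], filled[("Staten Island", 2021)])
```

With every pair present, each borough has a value to draw in every frame of a time lapse, even when that value is zero.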

Visualization 3: Bar Chart of Air Code Violations By Borough

For this visualization, I wanted to compare the total number of air code violations from 2017-2022 between the boroughs, instead of year by year like the previous two visualizations. To best compare the total number of air code violations in each borough side by side, I decided to create a bar chart with labels showing the number of air code violations for each borough.

You can see clearly here that Staten Island has the lowest number of total air code violations (8 between 2017-2022), and Manhattan has the highest number of total air code violations (327 between 2017-2022).

Next Steps

In moving forward with this project, there are different dimensions to consider to give a more holistic presentation of the available data. In the air quality dataset, there was a category indicating the different types of pollution complaints, including odor/fumes, smoke, vehicle idling, and construction and demolition. It would be interesting to include these metrics in future visualizations and analyses, as they may provide additional context for environmental action. I was also unable to create a comparative graph for the total number of air code violations between 2017-2022 for each NYC neighborhood, as the total list of neighborhoods was too long for a readable presentation of information. Moving forward, I would like to consider alternative visualizations for the neighborhood data such as radial circles.

Lab 3 Visualization

This visualization dashboard shows the title of the dashboard, the description of the dashboard, the average restaurant rating per student, the restaurant recommendation contributors by name, a map of the restaurant recommendations, and the source cited. I created this dashboard by using the “Dashboard” feature in Tableau and following along with Professor McSweeney’s instructions on YouTube, including the suggested design edits. In the embedded link on this page, you have to scroll down to see each visualization. During the next lab I will look into ways to show the entire dashboard via an embedded link.

Lab 1+2 Visualizations

These visualizations contain data sourced from a Google survey on NYC restaurant reviews by DATA 73000 students. The survey contained questions regarding student name, restaurant name, restaurant location, Yelp star rating, service rating, food rating, borough, food type, ambiance rating, and additional comments. I imported the data directly from Google Drive and created the below visualizations in Tableau. To clean the data, I used the SPLIT function in Tableau to separate delimited values in the “Food Type” and “Name” categories, and then corrected typos directly in Google Sheets.
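The splitting step can be sketched in Python as a rough analogue. Tableau's SPLIT returns one token at a time, while this sketch returns every part at once. The sample values, the choice of delimiters, and the `split_field` helper are hypothetical and only for illustration.

```python
def split_field(value, delimiter=","):
    """Split a delimited string into trimmed, non-empty parts."""
    return [part.strip() for part in value.split(delimiter) if part.strip()]

# Hypothetical survey responses; placeholders, not real answers.
food_types = split_field("Thai, Vegetarian ,Noodles")
first_name = split_field("Ada Lovelace", " ")[0]

print(food_types)
print(first_name)
```

Trimming whitespace and dropping empty parts handles the stray spaces and trailing delimiters that free-text survey answers tend to contain.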

This scatterplot shows the ambiance rating as compared to the Yelp star rating, the service rating, and the food rating.
This map shows the count of restaurants in the survey by borough. To get here, I had to convert the geographic role of the “Borough Name” latitude and longitude attribute to “county” to generate the interactive polygon map.
This pie chart shows the restaurant recommendation contributors by first name.
This bar chart shows the average restaurant rating (1-5) by contributors. To get here, I created a simple calculation in Tableau adding the ambiance, food, and service rating and dividing by 3 to create an “Average Rating” attribute. Tableau automatically used a sum function instead of an average function, so after changing the function the values showed up as expected.
This line graph shows the responses by survey participants by day. To get here, I changed the timestamp from YEAR to DAY to accurately reflect when the students submitted their responses.
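
The “Average Rating” calculation for the bar chart, and the difference between summing and averaging it per contributor, can be sketched in Python. The names and ratings below are hypothetical placeholders rather than survey responses, and `average_rating` is a helper written for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical placeholder reviews; not taken from the survey.
reviews = [
    {"contributor": "Ana", "ambiance": 4, "food": 5, "service": 3},
    {"contributor": "Ana", "ambiance": 2, "food": 3, "service": 4},
    {"contributor": "Ben", "ambiance": 5, "food": 5, "service": 5},
]

def average_rating(row):
    """Row-level average of the three ratings, as in the calculated field."""
    return (row["ambiance"] + row["food"] + row["service"]) / 3

per_contributor = defaultdict(list)
for row in reviews:
    per_contributor[row["contributor"]].append(average_rating(row))

# Summing the row-level averages grows with the number of reviews.
summed = {name: sum(values) for name, values in per_contributor.items()}
# Averaging them keeps the result on the 1-5 scale.
averaged = {name: mean(values) for name, values in per_contributor.items()}

print(summed)
print(averaged)
```

A contributor with two reviews ends up with a summed value above 5, which is why the default sum aggregation gave unexpected values until it was switched to an average.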

Lab 0 Visualization

This visualization shows population estimates by country from 1985-2015. The data is originally sourced from the UN. I imported the data from Excel into Tableau and created a line chart plotting the population (in billions) of each country by year.
