Last week I completed both the Desktop II: Intermediate and Desktop III: Advanced e-learning courses on Tableau. I am currently taking full advantage of Tableau's 90-day free access to its online training courses. There are several learning paths to choose from; so far I've taken the Designer, Analyst, and Data Scientist paths. Unfortunately, the visual analytics training that is part of all three paths is only offered as classroom training, which costs $1400.
So with all the learning I've been doing, I wanted to play with a data set on first names. I am part of a mom group on Facebook, and there is always someone asking for name recommendations for their upcoming baby. The data set contains the relative frequency of given names among U.S. births where the individual has a Social Security number, covering 1910 to March 3, 2019.
Some questions I wanted answered were:
- What is the most popular male/female name of all time?
- What is the most popular male/female name per year?
- What are the 10 most popular names in each state?
- When did each name first appear in the United States?
- What is the number of occurrences of each name per year?
After combining all the text files (the data is split by state) in JupyterLab using pandas, I had a dataframe of 6,028,151 rows x 5 columns. I then exported it to a CSV file of about 155 MB.
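For anyone curious, the combining step can be sketched roughly like this. It is a minimal sketch, not my exact notebook: the column names, the `*.TXT` file pattern, and the helper name `combine_state_files` are all assumptions made for illustration, and it assumes the state files are comma-separated with no header row.

```python
from pathlib import Path

import pandas as pd

# Assumed column layout: the post only establishes that there are 5 columns.
COLUMNS = ["state", "sex", "year", "name", "count"]


def combine_state_files(folder, pattern="*.TXT"):
    """Read every per-state text file in `folder` and stack them into one dataframe.

    Hypothetical helper: assumes comma-separated files without a header row.
    """
    files = sorted(Path(folder).glob(pattern))
    # One dataframe per state file, all sharing the same column names.
    frames = [pd.read_csv(f, header=None, names=COLUMNS) for f in files]
    # ignore_index=True gives the combined frame a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)
```

Calling `combine_state_files("some_folder").to_csv("names.csv", index=False)` would then write the combined data to a single CSV; both the folder and the file name here are placeholders.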
I still had one day left in my Tableau Desktop trial, so I was excited to start creating some visualizations. I connected to the CSV file and was ready to create a simple crosstab. Unfortunately, my Tableau fun came to a halt. The query ran for a good 4 minutes before I had to cancel it. I tried several times but hit the same problem each time. Was my file too big? It shouldn't be, considering it's only 6 million rows. Or am I wrong to assume that Tableau can handle large data sets? What was I doing wrong? I now have new questions unrelated to my name project and will have to play around with Tableau Public. Hopefully I'll find my answers. For now, I'm going back to Python to answer my initial questions.
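Here is a rough sketch of how those initial questions might be answered with pandas `groupby`. The column names (`state`, `sex`, `year`, `name`, `count`) and the function names are assumptions for illustration, and ties between names are not handled in any special way.

```python
import pandas as pd


def top_name_all_time(df):
    """Most popular name per sex, summed over all years and states."""
    totals = df.groupby(["sex", "name"], as_index=False)["count"].sum()
    # Sort descending, then keep the first (largest) row for each sex.
    return totals.sort_values("count", ascending=False).drop_duplicates("sex")


def top_name_per_year(df):
    """Most popular name per sex for each year."""
    yearly = df.groupby(["year", "sex", "name"], as_index=False)["count"].sum()
    top = yearly.sort_values("count", ascending=False).drop_duplicates(["year", "sex"])
    return top.sort_values(["year", "sex"]).reset_index(drop=True)


def top_n_per_state(df, n=10):
    """The n most popular names in each state, both sexes combined."""
    totals = df.groupby(["state", "name"], as_index=False)["count"].sum()
    totals = totals.sort_values(["state", "count"], ascending=[True, False])
    # head(n) on the sorted groups keeps the n largest rows per state.
    return totals.groupby("state").head(n)


def first_appearance(df):
    """Earliest year each name shows up in the data (not necessarily in history)."""
    return df.groupby("name")["year"].min()


def occurrences_per_year(df):
    """Total count of each name per year, across all states."""
    return df.groupby(["name", "year"])["count"].sum()
```

Since the data only starts in 1910, `first_appearance` can only tell when a name first shows up in this data set, which may be later than its real first use in the United States.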