Data scientists do things like data munging, analysis, and implementing machine learning algorithms for production. But what defines data scientists is experimentation, which distinguishes them from data analysts and machine learning engineers.
So data scientists have to be great at running experiments.
There is no guaranteed shortcut to being fast and clever as a data scientist. You become an efficient data scientist by running more and more experiments and getting better and faster at them.
Here are some things that many data scientists lack:
Standard Error of the Mean
A standard way to measure uncertainty statistically is the standard error of the mean. Your results are usually means of something: for example, the mean F1 score over 10 cross-validation folds, or the mean precision of the top-10 rankings over 1000 different queries.
You don't need to run tests of statistical significance, but you must know how uncertain your results are. The standard error of the mean tells you this: it estimates how much a mean computed from separate measurements (such as separate folds or queries) would vary, and it equals the sample standard deviation divided by the square root of the number of measurements.
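As a minimal sketch, assuming you already have one score per fold, the standard error of the mean is the sample standard deviation divided by the square root of the number of scores. The fold scores below are made-up numbers for illustration, and `standard_error_of_mean` is a hypothetical helper name, not a library function.

```python
import math
import statistics


def standard_error_of_mean(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate the standard error")
    # statistics.stdev uses the n - 1 denominator (sample standard deviation).
    return statistics.stdev(values) / math.sqrt(n)


# Made-up F1 scores for 10 cross-validation folds, for illustration only.
fold_f1 = [0.81, 0.79, 0.84, 0.80, 0.83, 0.78, 0.82, 0.85, 0.80, 0.81]

mean_f1 = statistics.mean(fold_f1)
sem_f1 = standard_error_of_mean(fold_f1)
print(f"F1 = {mean_f1:.3f} ± {sem_f1:.3f} (mean ± SEM over {len(fold_f1)} folds)")
```

If SciPy is available, `scipy.stats.sem` computes the same quantity (it uses `ddof=1` by default).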
Big Data is Slow
Big data is slow, and you don't want your experiments to be slow, so use small data. Most of the time big data is not needed; if you think it is, double-check that it really is that important.
You want the dataset to be only big enough that the uncertainty in your results is small enough to tell them apart. If more data doesn't help you tell results apart, big data is just a waste of time. Big data is the main cause of slow experiments, so avoid it when you can.
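Because the standard error shrinks roughly as one over the square root of the number of examples, you can estimate how much data you actually need before reaching for more. The sketch below assumes you have measured the standard deviation of a per-example score on a small pilot sample; `examples_needed`, the pilot standard deviation of 0.4, and the target SEM of 0.01 are all hypothetical.

```python
import math


def examples_needed(per_example_std, target_sem):
    """Approximate smallest n with per_example_std / sqrt(n) <= target_sem."""
    if per_example_std <= 0 or target_sem <= 0:
        raise ValueError("per_example_std and target_sem must both be > 0")
    # Solve per_example_std / sqrt(n) = target_sem for n and round up.
    return math.ceil((per_example_std / target_sem) ** 2)


# Hypothetical pilot numbers: per-example scores vary with a standard deviation
# of 0.4, and we want the SEM to be about 0.01, well below the gap we care about.
print(examples_needed(0.4, 0.01))
```

Quadrupling the data only halves the SEM, so once the estimate says you have enough examples, adding much more mostly costs time.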
Avoid Big Data Tools
Since you are working with small data (see the previous point), don't use big data tools like Spark; on small data their overhead makes experiments terribly slow. Use Pandas or Scikit-Learn instead.
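A minimal sketch of a small, in-memory experiment with Pandas and Scikit-Learn, assuming both are installed. It uses scikit-learn's built-in breast cancer dataset as a stand-in for your own data, takes a random subsample of 300 rows with `DataFrame.sample`, and reports the mean F1 score with its standard error over 10 folds. The dataset, sample size, and model are illustrative choices, not recommendations from the original text.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Built-in dataset as a stand-in for your own data, loaded as a pandas DataFrame
# (features plus a "target" column).
df = load_breast_cancer(as_frame=True).frame

# Work with a small random subsample; a fixed random_state keeps it reproducible.
small = df.sample(n=300, random_state=0)
X = small.drop(columns="target")
y = small["target"]

# Scaling before logistic regression helps the solver converge quickly.
model = make_pipeline(StandardScaler(), LogisticRegression())

# One F1 score per fold (10-fold cross-validation).
scores = cross_val_score(model, X, y, cv=10, scoring="f1")

# Standard error of the mean across folds.
sem = scores.std(ddof=1) / np.sqrt(len(scores))
print(f"F1 = {scores.mean():.3f} ± {sem:.3f} over {len(scores)} folds")
```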
Use a Good IDE
IDE stands for Integrated Development Environment. Use a decent IDE such as PyCharm and learn how to use it properly. Some of PyCharm's most useful features are:
- Viewing the parameters of classes and methods
- Quick Search
Code Optimisation
If an experiment is still too slow after you have reduced the dataset size, you can benefit from code optimisation. Try to balance running experiments with optimising code, for example by working on optimisations while experiments are running.
Data scientists must know the basics of code optimisation, and one of the basics is using a profiler. A profiler tells you which parts of your code are slow, so you can change those parts until they are fast enough. Then repeat the process in a loop: profile again, find the next slowest part, and fix it.
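A minimal sketch of the profile, fix, re-profile loop using Python's built-in `cProfile` and `pstats` modules. `run_experiment`, `slow_sum_of_squares`, and `fast_sum_of_squares` are hypothetical stand-ins for your own code; the point is the workflow, not the particular function.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    # Deliberately slow: builds an intermediate list one element at a time.
    squares = []
    for i in range(n):
        squares.append(i * i)
    return sum(squares)


def fast_sum_of_squares(n):
    # Optimised version: closed form for 0**2 + 1**2 + ... + (n - 1)**2.
    return (n - 1) * n * (2 * n - 1) // 6


def run_experiment(sum_of_squares):
    # Hypothetical stand-in for one experiment step that calls the hot function.
    return [sum_of_squares(50_000) for _ in range(5)]


def profile(func, *args):
    """Run func(*args) under cProfile; return the result and a top-5 report."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


# 1. Profile: the report lists slow_sum_of_squares near the top by cumulative time.
slow_result, slow_report = profile(run_experiment, slow_sum_of_squares)
print(slow_report)

# 2. Fix the slow part, check the result is unchanged, then profile again.
fast_result, fast_report = profile(run_experiment, fast_sum_of_squares)
assert fast_result == slow_result
print(fast_report)
```

In an IPython shell or Jupyter notebook, the `%prun` magic produces a similar report without the wrapper function.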