Inspired by the map at nirvacana.com, we are going to create an online book for learning data mining and becoming a data scientist. Below the map you can find links to the articles describing each topic. If a topic has not been described yet, it is shown in strikethrough. Additional topics that the author of the map did not mention, but that we think are worth writing about, are marked in bold and blue.
Click a topic to learn about that field:
- Fundamentals
  - Matrices & Linear Algebra Fundamentals
  - Hash Functions, Binary Tree, O(n)
  - Relational Algebra, DB Basics
  - Inner, Outer, Cross, Theta Join
  - CAP Theorem
  - Tabular Data
  - Entropy
  - Data Frames & Series
  - Sharding
  - OLAP
  - Multidimensional Data Model
  - ETL
  - Reporting vs BI vs Analytics
  - JSON & XML
  - NoSQL
  - Regex
  - Vendor Landscape
  - Env Setup
- Statistics
  - Pick a Dataset (UCI Repo)
  - Descriptive Statistics (mean, median, range, SD, Var)
  - Exploratory Data Analysis
  - Histograms
  - Percentiles & Outliers
  - Probability Theory
  - Bayes Theorem
  - Random Variables
  - Cumul Dist Fn (CDF)
  - Continuous Distributions (Normal, Poisson, Gaussian)
  - Skewness
  - ANOVA
  - Prob Den Fn (PDF)
  - Central Limit Theorem
  - Monte Carlo Method
  - Hypothesis Testing
  - p-Value
  - Chi2 Test
  - Estimation
  - Confid Int (CI)
  - MLE
  - Kernel Density Estimate
  - Regression
  - Covariance
  - Correlation
  - Pearson Coeff
  - Causation
  - Least2 Fit
  - Euclidean Distance
  - Econometrics
- Programming
  - Python Basics
  - Working in Excel
  - RapidMiner
  - IBM SPSS
  - R Setup & RStudio
  - R Basics
  - Expressions
  - Variables
  - Vectors
  - Matrices
  - Arrays
  - Factors
  - Lists
  - Data Frames
  - Reading CSV Data
  - Reading Raw Data
  - Subsetting Data
  - Manipulate Data Frames
  - Functions
  - Factor Analysis
  - Install Pkgs
- Machine Learning
  - What is ML?
  - Numerical Var
  - Categorical Var
  - Supervised Learning
  - Unsupervised Learning
  - Concepts, Inputs & Attributes
  - Training & Test Data
  - Classifier
  - Prediction
  - Lift
  - Overfitting
  - Bias & Variance
  - Trees & Classification
  - Classification Rate
  - Decision Rate
  - Boosting
  - Naive Bayes Classifiers
  - K-Nearest Neighbor
  - Regression
  - Logistic Regression
  - Ranking
  - Linear Regression
  - Perceptron
  - Clustering
  - Hierarchical Clustering
  - K-means Clustering
  - Neural Networks
  - Sentiment Analysis
  - Collaborative Filtering
- Text Mining / Natural Language Processing
  - Tagging
  - Vocabulary Mapping
  - Classify Text
  - Using NLTK
  - Using Weka
  - Using Mahout
  - Feature Extraction
  - Market Basket Analysis
  - Association Rules
  - Support Vector Machines
  - Term Frequency & Weight
  - Term Document Matrix
  - UIMA
  - Text Analysis
  - Named Entity Recognition
  - Corpus
- Data Visualization
  - Tableau
  - IBM ManyEyes
  - InfoVis
  - D3.js
  - Decision Tree
  - Timeline
  - Survey Plot
  - Spatial Charts
  - Line Charts (Bi)
  - Scatter Plot (Bi)
  - Tree & Tree Map
  - Histogram & Pie (Uni)
  - ggplot2
  - Uni, Bi & Multivariate Viz
  - Data Exploration in R (Hist, Boxplot, etc)
- Big Data
  - Map Reduce Fundamentals
  - Hadoop Components
  - HDFS
  - Data Replication Principles
  - Setup Hadoop (IBM/Cloudera/HortonWorks)
  - Name & Data Nodes
  - Job & Task Tracker
  - M/R Programming
  - Sqoop: Loading Data in HDFS
  - Flume, Scribe: For Unstruct Data
  - SQL with Pig
  - DWH with Hive
  - Scribe, Chukwa For Weblog
  - Using Mahout
  - Zookeeper Avro
  - Storm: Hadoop Realtime
  - Rhadoop, RHIPE
  - rmr
  - Cassandra
  - MongoDB, Neo4j
- Data Ingestion
  - Summary of Data Formats
  - Data Discovery
  - Data Source & Acquisition
  - Data Integration
  - Data Fusion
  - Transformation & Enrichment
  - Data Survey
  - Google OpenRefine
  - How much Data?
  - Using ETL
- Data Munging
  - Principal Component Analysis
  - Stratified Sampling
  - Sampling
  - Denoising
  - Feature Extraction
  - Binning Sparse Values
  - Unbiased Estimators
  - Handling Missing Values
  - Data Scrubbing
  - Normalization
  - Dimensionality & Numerosity Reduction
- Toolbox
  - MS Excel and Analysis ToolPak
  - Java, Python
  - R, RStudio, Rattle
  - Weka, Knime, RapidMiner
  - Hadoop Dist of Choice
  - Spark, Storm
  - Flume, Scribe, Chukwa
  - Nutch, Talend, Scraperwiki
  - Webscraper, Flume, Sqoop
  - tm, RWeka, NLTK
  - RHIPE
  - D3.js, ggplot2, Shiny
  - IBM Languageware
  - Cassandra, MongoDB