Julio Sotelo

Social Network Analysis

For the following analysis I work with the “ingredients” network, a fairly large network of 1,106 ingredients mined from recipes on allrecipes.com. It is a one-mode projection of the ingredient-recipe bipartite network: two ingredients are linked if they appear in the same recipe, and each edge carries a weight that reflects how often that pairing occurs. There are more than 39,000 edges, which corresponds to a density of around 3%.
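To make the projection concrete, here is a minimal sketch in plain Python. It is not the R code used for the analysis, and the three recipes are made up for illustration. Every pair of ingredients that shares a recipe gets an edge, and the weight counts the recipes in which the pair appears together.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy recipes, NOT the allrecipes.com data.
recipes = {
    "ravioli bake": ["cheese ravioli", "spaghetti sauce", "mozzarella"],
    "quick pasta": ["spaghetti sauce", "mozzarella", "basil"],
    "light omelet": ["egg substitute", "skim milk", "mozzarella"],
}


def project_ingredients(recipes):
    """Return {(ingredient_a, ingredient_b): weight} for co-occurring pairs."""
    weights = Counter()
    for ingredients in recipes.values():
        # Deduplicate and sort so each unordered pair is counted once per recipe
        # and always appears with the same key order.
        for a, b in combinations(sorted(set(ingredients)), 2):
            weights[(a, b)] += 1
    return weights


edges = project_ingredients(recipes)
for (a, b), w in sorted(edges.items()):
    print(f"{a} -- {b}: {w}")
```

In this toy data only mozzarella and spaghetti sauce share two recipes, so that edge has weight 2 and every other edge has weight 1.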

In this analysis you will see how to work with R to create and export data for visualization in Gephi. I have included filtering, community detection, and subgraph creation using degree as weight. In terms of the number of groups, only two algorithms produce similar numbers of clusters: betweenness and fastgreedy. The betweenness algorithm focuses on finding the connections between different communities. In contrast, the fastgreedy algorithm identifies communities at the point where modularity stops increasing. Nodes with high betweenness should not contribute positively to the fastgreedy algorithm, yet they are the key to identifying communities in the first algorithm. These two algorithms seem to be counterparts, providing similar results in terms of modularity and number of groups. In addition, these algorithms have the highest modularity. Modularity is similar across all the algorithms, so the communities found apparently have significant connections among their vertices.
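The idea behind the betweenness approach can be illustrated with a small Python sketch. I am assuming here that the algorithm is the edge-betweenness (Girvan-Newman style) method, and the six-node graph is a made-up example, not the ingredients network. The code counts, for every edge, how many shortest paths run through it; the edge that bridges two tight groups collects the highest score, which is why removing such edges exposes communities.

```python
from collections import defaultdict, deque

# Hypothetical toy graph: two triangles joined by a single bridge c-d.
edge_list = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]

adj = defaultdict(list)
for u, v in edge_list:
    adj[u].append(v)
    adj[v].append(u)


def edge_betweenness(adj):
    """Shortest-path edge betweenness for an unweighted, undirected graph."""
    scores = defaultdict(float)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = defaultdict(float)
        sigma[s] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path shares per edge.
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                scores[frozenset((v, w))] += share
                delta[v] += share
    # Each pair of endpoints was visited from both sides, so halve the totals.
    return {edge: total / 2.0 for edge, total in scores.items()}


scores = edge_betweenness(adj)
bridge = max(scores, key=scores.get)
print(sorted(bridge), scores[bridge])
```

All nine shortest paths between the two triangles pass through the c-d edge, so it ranks first.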

The first visualization belongs to the community identified with the Fastgreedy algorithm. This community has two local networks, or types of food. One contains the ingredient I chose, which is cheese ravioli, and is related to Italian cuisine. The second local network consists of foods or ingredients derived from milk. Two nodes within the Italian food group have strong edges to skim milk and egg substitute, which is why these local networks end up together in the community detected by Fastgreedy.

The second visualization is all about Italian food. There are no evident local networks within it. Two things came as a surprise. First, the ingredient had only one edge, to spaghetti sauce; I would have expected to see edges to types of cheese and to spinach, which are common in ravioli. Second, even though I selected a node with low degree, the community detection delivers all the related vertices, so I was able to see its network. The following visualizations focus on the chosen ingredient in both communities.

For this analysis, the objective of the Fastgreedy algorithm is to maximize modularity. Since there are nodes with strong ties between Italian cuisine and egg substitute and skim milk, the algorithm was able to keep improving modularity without recognizing that it had moved into another local network. Walktrap, on the other hand, tends to stay within the same communities. The result is that Italian cuisine is identified as a single community, while the other is an ingredient-related network.
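The effect of a strong tie on a greedy modularity merge can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the R code from the analysis: the ingredient groups, the weights, and the helper names `build_edges` and `merge_gain` are all hypothetical. It uses the standard change in modularity when two communities are merged, ΔQ = w_ab / m − s_a · s_b / (2 m²), where w_ab is the total weight between the two communities, s_a and s_b are their total weighted degrees, and m is the total edge weight.

```python
from collections import defaultdict

# Hypothetical community labels for a toy ingredient graph.
membership = {
    "cheese ravioli": "italian", "spaghetti sauce": "italian", "parmesan": "italian",
    "skim milk": "dairy", "egg substitute": "dairy", "butter": "dairy",
    "flour": "baking", "sugar": "baking", "vanilla": "baking",
}


def build_edges(cross_weight):
    """Toy weighted edge list with one italian-dairy tie of the given weight."""
    return [
        ("cheese ravioli", "spaghetti sauce", 5), ("spaghetti sauce", "parmesan", 5),
        ("cheese ravioli", "parmesan", 5),
        ("skim milk", "egg substitute", 5), ("egg substitute", "butter", 5),
        ("skim milk", "butter", 5),
        ("flour", "sugar", 20), ("sugar", "vanilla", 20), ("flour", "vanilla", 20),
        ("parmesan", "skim milk", cross_weight),
    ]


def merge_gain(edges, membership, a, b):
    """Change in modularity if communities a and b were merged into one."""
    m = sum(w for _, _, w in edges)   # total edge weight
    strength = defaultdict(float)     # total weighted degree per community
    between = 0.0                     # weight of edges running between a and b
    for u, v, w in edges:
        cu, cv = membership[u], membership[v]
        strength[cu] += w
        strength[cv] += w
        if {cu, cv} == {a, b}:
            between += w
    return between / m - strength[a] * strength[b] / (2 * m * m)


weak = merge_gain(build_edges(1), membership, "italian", "dairy")
strong = merge_gain(build_edges(10), membership, "italian", "dairy")
print(f"weak tie:   {weak:+.4f}")
print(f"strong tie: {strong:+.4f}")
```

With the weak tie the gain is negative, so a greedy merge would keep the two groups apart. With the strong tie the gain turns positive and the groups would be joined, which matches the behavior described above.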

 

Here you can see the documented R code for the social network analysis.

By Julio
Classification, R, Social Network