The models are first estimated for the Sociology department and then, with the same specification, for the Data Science department.
#dependent
snet <- sienaDependent(soc_net_array)
### Step 1: define data
#gender
gender <- as.numeric(socdef_df$gender=="female")
gender <- coCovar(gender)
#Kardashian Index
ki <- as.numeric(socdef_df$ki)
ki <- coCovar(ki)
#Ethnicity
dutch <- as.numeric(socdef_df$dutch)
dutch <- coCovar(dutch)
#Twitter dummy as control variable
twitter_dum <- (socdef_df$twitter_dum)
twitter_dum <- coCovar(twitter_dum)
#Twitter followercount
#followers <- as.numeric(soc_twitterinfo$twfollowercounts)
#followers <- coCovar(followers)
#year first pub
# soc_staff_cit %>% group_by(gs_id) %>%
# mutate(pub_first = min(year)) %>%
# select(c("gs_id", "pub_first")) %>%
# distinct(gs_id, pub_first, .keep_all = TRUE) -> firstpub_df
#
# socdef_df <- left_join(socdef_df, firstpub_df)
#
# #if no publication yet, set pub_first to 2023
# socdef_df %>% mutate(pub_first = replace_na(pub_first, 2023)) -> socdef_df
pub_first <- coCovar(socdef_df$pub_first)
mydata <- sienaDataCreate(snet, gender, ki, dutch, pub_first, twitter_dum)
### Step 2: create effects structure
myeffs <- getEffects(mydata)
effectsDocumentation(myeffs)
### Step 3: get initial description
print01Report(mydata, modelname = "/Users/anuschka/Documents/labjournal/results/soc_report")
Besides the descriptives of the included variables, the report above gives the Jaccard index. This index measures stability and change between consecutive waves: it is the number of stable ties divided by the sum of stable, new, and dissolved ties. For the first period (wave 1 to wave 2), the Jaccard index is 0.359; for the second period, it is 0.327. Both values are above 0.3, which is considered sufficient stability for estimation (Ripley et al. 2022).
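The Jaccard index reported above can also be computed by hand. The following is a minimal sketch in base R, assuming binary adjacency matrices; the function name `jaccard_index` and the two toy waves are hypothetical and are not the department networks.

```r
# Jaccard index between two binary adjacency matrices of consecutive waves.
# For symmetric (undirected) networks every tie is counted twice in the
# numerator and the denominator, so the ratio is unaffected.
jaccard_index <- function(net1, net2) {
  diag(net1) <- NA  # ignore self-ties
  diag(net2) <- NA
  n11 <- sum(net1 == 1 & net2 == 1, na.rm = TRUE)  # stable ties
  n10 <- sum(net1 == 1 & net2 == 0, na.rm = TRUE)  # dissolved ties
  n01 <- sum(net1 == 0 & net2 == 1, na.rm = TRUE)  # new ties
  n11 / (n11 + n10 + n01)
}

# Hypothetical toy waves with four actors (illustration only)
wave1 <- matrix(c(0, 1, 1, 0,
                  1, 0, 0, 0,
                  1, 0, 0, 1,
                  0, 0, 1, 0), nrow = 4, byrow = TRUE)
wave2 <- matrix(c(0, 1, 0, 0,
                  1, 0, 1, 0,
                  0, 1, 0, 1,
                  0, 0, 1, 0), nrow = 4, byrow = TRUE)

jaccard_toy <- jaccard_index(wave1, wave2)
print(jaccard_toy)
```

Applied to two consecutive slices of the network array, the same function should match the value in the report, provided missing ties are handled the same way.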
### Step 4: specify model with structural effects
myeffs <- includeEffects(myeffs, degPlus) #some publish a lot, some not. (interpretation: talent/luck? )
myeffs <- includeEffects(myeffs, transTriads)
### Step 5: estimate
myAlgorithm <- sienaAlgorithmCreate(projname = "soc_report")
(ans <- siena07(myAlgorithm, data = mydata, effects = myeffs))
# (the outer parentheses print the obtained result on the screen)
# if necessary, estimate further
(ans <- siena07(myAlgorithm, data = mydata, effects = myeffs, prevAns = ans))
#> Estimates, standard errors and convergence t-ratios
#>
#> Estimate Standard Convergence
#> Error t-ratio
#>
#> Rate parameters:
#> 0.1 Rate parameter period 1 1.6043 ( 0.4525 )
#> 0.2 Rate parameter period 2 2.5903 ( 0.7431 )
#>
#> Other parameters:
#> 1. eval degree (density) -2.6873 ( 0.3472 ) -0.0180
#> 2. eval transitive triads 0.5994 ( 0.2402 ) -0.0189
#> 3. eval degree act+pop 0.0895 ( 0.0360 ) -0.0151
#>
#> Overall maximum convergence ratio: 0.0244
#>
#>
#> Total of 2367 iteration steps.
#>
#> Covariance matrix of estimates (correlations below diagonal)
#>
#> 0.121 0.009 -0.011
#> 0.113 0.058 -0.004
#> -0.848 -0.516 0.001
#>
#> Derivative matrix of expected statistics X by parameters:
#>
#> 78.824 57.771 2396.000
#> 57.868 72.891 2075.721
#> 862.977 756.690 29859.828
#>
#> Covariance matrix of X (correlations below diagonal):
#>
#> 106.212 87.995 3395.242
#> 0.808 111.756 3269.719
#> 0.948 0.890 120891.596
The first model, with only the structural network effects, shows a strongly negative and significant density effect (b=-2.687; se=0.347). A density parameter of 0 would mean that creating a tie and not creating one are equally attractive, so that about 50% of all possible ties would be observed; as the observed networks are far sparser, it is logical that this parameter is below zero. Furthermore, there is a significant and positive effect of transitive triads (b=0.599; se=0.240), meaning that Sociology staff prefer ties that close a triad, that is, co-publishing with co-authors of their co-authors. Lastly, the degree activity plus popularity effect (b=0.090; se=0.036) is also significant, signalling that scientists at this department prefer to co-publish with staff members who have already co-published with many others.
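The significance statements above rest on the ratio of each estimate to its standard error. A minimal sketch of that check in base R, using the estimates printed in the output above and assuming a normal approximation for the two-sided p-value (the variable names are illustrative):

```r
# Estimates and standard errors copied from the structural model output
est <- c(density = -2.6873, transTriads = 0.5994, degPlus = 0.0895)
se  <- c(density =  0.3472, transTriads = 0.2402, degPlus = 0.0360)

# Wald-type t-ratio and two-sided p-value (normal approximation, assumed)
t_ratio <- est / se
p_value <- 2 * pnorm(-abs(t_ratio))

print(round(cbind(est, se, t_ratio, p_value), 3))
```

An absolute t-ratio above roughly 2 corresponds to significance at the 5% level under this approximation.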
myeffs1a <- getEffects(mydata)
myeffs1a <- includeEffects(myeffs1a, degPlus) #some publish a lot, some not. (interpretation: talent/luck? )
myeffs1a <- includeEffects(myeffs1a, transTriads)
myeffs1a <- includeEffects(myeffs1a, absDiffX, interaction1 = "ki")
(ans1a <- siena07(myAlgorithm, data = mydata, effects = myeffs1a, prevAns = ans))
#> Estimates, standard errors and convergence t-ratios
#>
#> Estimate Standard Convergence
#> Error t-ratio
#>
#> Rate parameters:
#> 0.1 Rate parameter period 1 1.5909 ( 0.4567 )
#> 0.2 Rate parameter period 2 2.5832 ( 0.7844 )
#>
#> Other parameters:
#> 1. eval degree (density) -2.6353 ( 0.3646 ) -0.0140
#> 2. eval transitive triads 0.5990 ( 0.2475 ) -0.0236
#> 3. eval degree act+pop 0.0875 ( 0.0353 ) -0.0133
#> 4. eval ki abs. difference -0.0231 ( 0.0623 ) -0.0289
#>
#> Overall maximum convergence ratio: 0.0417
#>
#>
#> Total of 2190 iteration steps.
#>
#> Covariance matrix of estimates (correlations below diagonal)
#>
#> 0.133 0.006 -0.011 -0.008
#> 0.064 0.061 -0.004 0.000
#> -0.823 -0.487 0.001 0.000
#> -0.346 0.031 0.120 0.004
#>
#> Derivative matrix of expected statistics X by parameters:
#>
#> 76.690 57.157 2375.885 161.922
#> 55.405 67.917 1970.718 96.302
#> 847.409 744.072 29779.093 1641.010
#> 92.583 55.645 2560.508 715.250
#>
#> Covariance matrix of X (correlations below diagonal):
#>
#> 105.301 89.206 3441.255 215.354
#> 0.838 107.707 3272.897 151.211
#> 0.954 0.897 123676.216 6491.007
#> 0.549 0.381 0.482 1463.767
When the k-index is added to the model, the structural network effects remain significant. The effect of the absolute difference in k-index is negative but small and not significant (b=-0.023; se=0.062). Because this effect concerns the absolute difference, a negative sign would point towards a preference for co-authors with a similar k-index, that is, homophily. The estimate is, however, much smaller than its standard error, so this model offers no clear support for homophily with regard to the k-index.
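To make the meaning of an absolute-difference effect concrete, the sketch below computes what a single tie would add to ego's evaluation function for alters at different distances in k-index. It is a simplification that ignores all other effects; the function name and the k-index values are hypothetical, and the parameter is the estimate printed in the output above.

```r
# Contribution of one tie under an absolute-difference effect:
# beta * |v_ego - v_alter|
abs_diff_contribution <- function(beta, v_ego, v_alter) {
  beta * abs(v_ego - v_alter)
}

beta_ki  <- -0.0231      # "ki abs. difference" estimate from the output above
ego_ki   <- 2            # hypothetical k-index of ego
alter_ki <- c(2, 3, 7)   # hypothetical k-index values of three possible alters

contrib <- abs_diff_contribution(beta_ki, ego_ki, alter_ki)
print(data.frame(alter_ki, abs_diff = abs(ego_ki - alter_ki), contrib))
```

With a negative parameter the contribution shrinks as the difference grows, which is why a negative absolute-difference effect reads as a preference for similar alters.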
myeffs1 <- getEffects(mydata)
myeffs1 <- includeEffects(myeffs1, degPlus) #some publish a lot, some not. (interpretation: talent/luck? )
myeffs1 <- includeEffects(myeffs1, transTriads)
myeffs1 <- includeEffects(myeffs1, absDiffX, interaction1 = "ki")
myeffs1 <- includeEffects(myeffs1, sameX, interaction1 = "dutch")
myeffs1 <- includeEffects(myeffs1, absDiffX, interaction1 = "pub_first")
myeffs1 <- includeEffects(myeffs1, sameX, interaction1 = "twitter_dum")
myeffs1 <- includeEffects(myeffs1, sameX, interaction1 = "gender")
(ans1 <- siena07(myAlgorithm, data = mydata, effects = myeffs1, prevAns = ans))
#Save the last model since it has the lowest maximum convergence ratio.
save(ans1, file="/Users/anuschka/Documents/labjournal/results/soc_model_cov1")
#> Estimates, standard errors and convergence t-ratios
#>
#> Estimate Standard Convergence
#> Error t-ratio
#>
#> Rate parameters:
#> 0.1 Rate parameter period 1 1.6244 ( 0.4465 )
#> 0.2 Rate parameter period 2 2.6565 ( 0.7562 )
#>
#> Other parameters:
#> 1. eval degree (density) -3.6601 ( 0.7510 ) 0.0645
#> 2. eval transitive triads 0.6179 ( 0.2764 ) 0.0374
#> 3. eval degree act+pop 0.1006 ( 0.0465 ) 0.0557
#> 4. eval same gender 0.0002 ( 0.2662 ) 0.0676
#> 5. eval ki abs. difference -0.0173 ( 0.0646 ) 0.0489
#> 6. eval same dutch 0.2773 ( 0.3659 ) 0.0525
#> 7. eval pub_first abs. difference 0.0068 ( 0.0191 ) 0.0484
#> 8. eval same twitter_dum 0.9293 ( 0.2940 ) 0.0641
#>
#> Overall maximum convergence ratio: 0.0864
#>
#>
#> Total of 2300 iteration steps.
#>
#> Covariance matrix of estimates (correlations below diagonal)
#>
#> 0.564 0.026 -0.025 -0.046 -0.017 -0.193 -0.003 -0.120
#> 0.124 0.076 -0.008 0.001 0.001 -0.018 0.001 0.008
#> -0.716 -0.600 0.002 0.000 0.001 0.008 0.000 0.003
#> -0.228 0.010 -0.030 0.071 0.000 0.001 0.001 0.009
#> -0.357 0.040 0.191 -0.028 0.004 0.006 0.000 0.002
#> -0.703 -0.180 0.486 0.006 0.255 0.134 0.000 0.016
#> -0.219 0.237 -0.192 0.208 0.025 0.000 0.000 0.000
#> -0.542 0.096 0.243 0.117 0.082 0.152 -0.001 0.086
#>
#> Derivative matrix of expected statistics X by parameters:
#>
#> 86.121 65.575 2690.897 86.390 177.960 117.736 1756.606 120.565
#> 65.788 79.563 2398.852 70.565 105.947 89.670 1270.255 82.864
#> 978.516 881.305 34358.295 997.380 1843.283 1250.602 20344.725 1306.369
#> 39.944 30.772 1237.573 66.723 83.872 54.549 728.972 53.266
#> 89.858 49.873 2466.488 95.118 690.995 108.987 1841.731 133.796
#> 61.032 45.211 1772.873 61.335 110.885 104.119 1199.138 85.692
#> 864.231 642.825 27932.140 798.635 1808.844 1133.433 22934.434 1219.038
#> 59.564 40.834 1782.868 56.914 131.262 81.452 1228.605 106.884
#>
#> Covariance matrix of X (correlations below diagonal):
#>
#> 132.987 114.298 4423.940 135.375 241.936 177.087 2738.889 183.204
#> 0.856 133.981 4255.730 119.367 172.188 153.823 2320.285 151.026
#> 0.959 0.919 160135.472 4494.943 7476.879 5639.393 93119.970 5999.723
#> 0.862 0.758 0.825 185.299 253.092 181.417 2655.247 183.303
#> 0.562 0.398 0.500 0.498 1395.915 295.294 5128.668 357.919
#> 0.922 0.798 0.846 0.800 0.474 277.590 3519.510 244.130
#> 0.927 0.782 0.908 0.761 0.536 0.825 65629.461 3796.850
#> 0.929 0.763 0.877 0.787 0.560 0.857 0.867 292.395
When the control variables are added, the effect of the k-index is not significant (b=-0.017; se=0.065). Thus, when taking into account not only structural network effects but also other covariates, there is no evidence of homophily in k-index. Therefore, the hypothesis on similarity with regard to the k-index is not supported: scientists at this department do not seem to consider the k-index of their possible co-authors. The same applies to ethnicity (b=0.277; se=0.366), age, measured as the year of first publication (b=0.007; se=0.019), and gender (b=0.000; se=0.266), whose effects are all insignificant. Interestingly, there is a significant effect of similarity in having Twitter or not (b=0.929; se=0.294): sociologists at this department prefer to work together with someone who is similar in terms of (not) having Twitter.
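The size of the Twitter similarity effect is easier to read on the odds scale. A minimal sketch, assuming the usual multinomial-logit reading of the evaluation function (all other effects held equal) and a normal-approximation confidence interval; the variable names are illustrative:

```r
# "same twitter_dum" estimate and standard error from the output above
beta_same_twitter <- 0.9293
se_same_twitter   <- 0.2940

# Ratio of choice probabilities for an alter with the same Twitter status
# versus an otherwise identical alter with a different status
odds_ratio <- exp(beta_same_twitter)

# Approximate 95% confidence interval on the odds scale
ci <- exp(beta_same_twitter + c(-1, 1) * qnorm(0.975) * se_same_twitter)

print(round(c(odds_ratio = odds_ratio, lower = ci[1], upper = ci[2]), 2))
```

An interval that excludes 1 agrees with the significance reported in the text.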
myeffs2a <- getEffects(mydata)
myeffs2a <- includeEffects(myeffs2a, degPlus) #some publish a lot, some not. (interpretation: talent/luck? )
myeffs2a <- includeEffects(myeffs2a, transTriads)
myeffs2a <- includeEffects(myeffs2a, altX, interaction1 = "ki")
(ans2a <- siena07(myAlgorithm, data = mydata, effects = myeffs2a, prevAns = ans1a))
#Save the last model since it has the lowest maximum convergence ratio.
save(ans2a, file="/Users/anuschka/Documents/labjournal/results/soc_model_cov2a")
#> Estimates, standard errors and convergence t-ratios
#>
#> Estimate Standard Convergence
#> Error t-ratio
#>
#> Rate parameters:
#> 0.1 Rate parameter period 1 1.5895 ( 0.4282 )
#> 0.2 Rate parameter period 2 2.5321 ( 0.7308 )
#>
#> Other parameters:
#> 1. eval degree (density) -2.6744 ( 0.3610 ) -0.0425
#> 2. eval transitive triads 0.6153 ( 0.2548 ) -0.0358
#> 3. eval degree act+pop 0.0877 ( 0.0400 ) -0.0341
#> 4. eval ki alter 0.0045 ( 0.0964 ) -0.0314
#>
#> Overall maximum convergence ratio: 0.0803
#>
#>
#> Total of 2030 iteration steps.
#>
#> Covariance matrix of estimates (correlations below diagonal)
#>
#> 0.130 0.021 -0.012 -0.007
#> 0.232 0.065 -0.006 -0.001
#> -0.865 -0.577 0.002 0.001
#> -0.211 -0.035 0.303 0.009
#>
#> Derivative matrix of expected statistics X by parameters:
#>
#> 76.097 55.440 2354.715 -83.410
#> 56.493 69.397 2043.598 -80.169
#> 847.742 742.393 29737.993 -1096.967
#> -42.126 -41.760 -1659.366 257.927
#>
#> Covariance matrix of X (correlations below diagonal):
#>
#> 104.424 88.758 3420.276 -126.052
#> 0.837 107.579 3277.367 -125.420
#> 0.953 0.900 123272.471 -4552.025
#> -0.540 -0.530 -0.568 521.361
In the model above, which includes the effect of the alter’s k-index, this effect turns out to be insignificant (b=0.005; se=0.096). The k-index of the alter (regardless of one’s own k-index) is thus not taken into account when choosing co-authors within the department.
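An alter effect depends only on the covariate value of the potential co-author. The sketch below illustrates this under the assumption that the covariate is mean-centered, which is the default behaviour of `coCovar`; the k-index values are hypothetical and the parameter is the estimate printed above.

```r
# Hypothetical raw k-index values of four possible alters
ki_raw <- c(0.5, 1.2, 3.0, 7.3)

# Mean-centering, as assumed for a constant covariate
ki_centered <- ki_raw - mean(ki_raw)

beta_alt <- 0.0045   # "ki alter" estimate from the output above

# Contribution of a tie to each alter, independent of ego's own k-index
alter_contrib <- beta_alt * ki_centered
print(data.frame(ki_raw, ki_centered, alter_contrib))
```

With an estimate this close to zero, the contributions barely differ between alters with a low and a high k-index.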
myeffs2 <- getEffects(mydata)
myeffs2 <- includeEffects(myeffs2, degPlus) #some publish a lot, some not. (interpretation: talent/luck? )
myeffs2 <- includeEffects(myeffs2, transTriads)
myeffs2 <- includeEffects(myeffs2, altX, interaction1 = "ki")
myeffs2 <- includeEffects(myeffs2, sameX, interaction1 = "dutch")
myeffs2 <- includeEffects(myeffs2, absDiffX, interaction1 = "pub_first")
myeffs2 <- includeEffects(myeffs2, sameX, interaction1 = "twitter_dum")
myeffs2 <- includeEffects(myeffs2, sameX, interaction1 = "gender")
(ans2 <- siena07(myAlgorithm, data = mydata, effects = myeffs2, prevAns = ans1))
#Save the last model since it has the lowest maximum convergence ratio.
save(ans2, file="/Users/anuschka/Documents/labjournal/results/soc_model_cov2")
#> Estimates, standard errors and convergence t-ratios
#>
#> Estimate Standard Convergence
#> Error t-ratio
#>
#> Rate parameters:
#> 0.1 Rate parameter period 1 1.6040 ( 0.4419 )
#> 0.2 Rate parameter period 2 2.6400 ( 0.7610 )
#>
#> Other parameters:
#> 1. eval degree (density) -3.7091 ( 0.6989 ) 0.0239
#> 2. eval transitive triads 0.6456 ( 0.2674 ) 0.0123
#> 3. eval degree act+pop 0.1003 ( 0.0452 ) 0.0101
#> 4. eval same gender -0.0057 ( 0.2741 ) 0.0469
#> 5. eval ki alter 0.0172 ( 0.1001 ) 0.0343
#> 6. eval same dutch 0.2850 ( 0.3621 ) 0.0006
#> 7. eval pub_first abs. difference 0.0075 ( 0.0205 ) 0.0200
#> 8. eval same twitter_dum 0.9433 ( 0.3010 ) 0.0441
#>
#> Overall maximum convergence ratio: 0.1236
#>
#>
#> Total of 2622 iteration steps.
#>
#> Covariance matrix of estimates (correlations below diagonal)
#>
#> 0.488 0.030 -0.022 -0.040 -0.013 -0.176 -0.004 -0.095
#> 0.159 0.072 -0.007 -0.001 -0.001 -0.019 0.001 0.001
#> -0.693 -0.575 0.002 -0.001 0.001 0.007 0.000 0.003
#> -0.209 -0.011 -0.061 0.075 -0.002 0.002 0.001 -0.003
#> -0.180 -0.042 0.271 -0.085 0.010 0.010 0.000 -0.002
#> -0.695 -0.191 0.435 0.023 0.276 0.131 0.001 0.009
#> -0.314 0.192 -0.147 0.231 0.024 0.129 0.000 0.000
#> -0.450 0.013 0.186 -0.038 -0.078 0.084 -0.059 0.091
#>
#> Derivative matrix of expected statistics X by parameters:
#>
#> 82.075 61.224 2523.112 85.129 -85.839 113.217 1694.258 113.753
#> 62.790 76.407 2272.422 68.269 -81.233 87.192 1249.051 79.720
#> 938.990 837.980 32686.004 984.881 -1098.824 1220.864 19811.310 1239.628
#> 40.906 31.730 1262.912 68.962 -37.131 55.741 770.196 56.925
#> -46.691 -47.280 -1740.995 -44.419 255.857 -75.357 -1014.215 -53.914
#> 59.899 44.221 1738.572 62.136 -75.378 103.039 1180.077 82.979
#> 839.243 601.209 26374.154 785.189 -901.311 1097.164 22327.860 1170.331
#> 57.699 40.069 1705.901 59.504 -52.496 79.491 1197.874 102.558
#>
#> Covariance matrix of X (correlations below diagonal):
#>
#> 133.537 114.617 4417.847 144.007 -149.870 177.671 2804.000 182.542
#> 0.847 137.142 4318.841 126.040 -151.349 153.756 2378.712 148.630
#> 0.955 0.921 160420.643 4763.993 -5376.264 5673.897 94216.400 5869.856
#> 0.871 0.753 0.832 204.504 -154.816 191.741 2882.556 195.811
#> -0.560 -0.558 -0.580 -0.468 535.636 -224.554 -3229.068 -191.531
#> 0.923 0.789 0.851 0.805 -0.583 277.209 3626.967 243.660
#> 0.930 0.779 0.902 0.773 -0.535 0.835 68008.330 3830.213
#> 0.924 0.743 0.857 0.801 -0.484 0.856 0.859 292.120
When the control variables are included, the alter effect of the k-index remains insignificant (b=0.017; se=0.100). The hypothesis on co-publication with scientists with a lower or higher k-index is therefore not supported: for the scientists at the Sociology department, the k-index of their co-authors does not matter. Furthermore, the effects of gender (b=-0.006; se=0.274), age (b=0.008; se=0.021), and ethnicity (b=0.285; se=0.362) are again not significant. The effect of similarity in having a Twitter account (b=0.943; se=0.301) is significant. As concluded before, scientists at this department prefer to co-publish with other scientists who are similar in terms of having a Twitter account.
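Each output above also reports convergence t-ratios and an overall maximum convergence ratio. A minimal sketch of checking these against the commonly used guidelines (absolute t-ratios below 0.1 and an overall maximum below 0.25), with the values copied from the last Sociology model; the variable names are illustrative:

```r
# Convergence t-ratios and overall maximum from the last Sociology output.
# On a fitted siena07 object these are stored as ans$tconv and ans$tconv.max.
tconv     <- c(0.0239, 0.0123, 0.0101, 0.0469, 0.0343, 0.0006, 0.0200, 0.0441)
tconv_max <- 0.1236

# Guideline thresholds (assumed here): |t| < 0.10 and overall maximum < 0.25
converged <- all(abs(tconv) < 0.10) && tconv_max < 0.25
print(converged)
```

If either check fails, the model can be re-estimated with `prevAns` set to the previous result, as done earlier in this section.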
rm(list=ls())
#dependent
dnet <- sienaDependent(dnet_array)
### Step 1: define data
#gender
gender <- as.numeric(datadef_df$gender=="female")
gender <- coCovar(gender)
#Kardashian Index
ki <- as.numeric(datadef_df$ki)
ki <- coCovar(ki)
#Ethnicity
dutch <- as.numeric(datadef_df$dutch)
dutch <- coCovar(dutch)
#Twitter dummy as control variable
twitter_dum <- (datadef_df$twitter_dum)
twitter_dum <- coCovar(twitter_dum)
# #year first pub
# data_staff_cit %>% group_by(gs_id) %>%
# mutate(pub_first = min(year)) %>%
# select(c("gs_id", "pub_first")) %>%
# distinct(gs_id, pub_first, .keep_all = TRUE) -> firstpub_df1
#
# datadef_df <- left_join(datadef_df, firstpub_df1)
#
# #if no publication yet, set pub_first to 2023
# datadef_df %>% mutate(pub_first = replace_na(pub_first, 2023)) -> datadef_df
pub_first <- coCovar(datadef_df$pub_first)
mydata <- sienaDataCreate(dnet, gender, ki, dutch, pub_first, twitter_dum)
### Step 2: create effects structure
myeff <- getEffects(mydata)
effectsDocumentation(myeff)
### Step 3: get initial description
print01Report(mydata, modelname = "/Users/anuschka/Documents/labjournal/results/data_report")
For Data Science, the Jaccard index is 0.304 for the first period and 0.286 for the second. These values are lower than at the Sociology department, and the second falls slightly below the 0.3 guideline, but both are well above 0.2, the level below which estimation may become difficult (Ripley et al. 2022).
### Step 4: specify model
myeff <- includeEffects(myeff, degPlus)
myeff <- includeEffects(myeff, transTriads)
### Step 5: estimate
myAlgorithm <- sienaAlgorithmCreate(projname = "data_report")
(ans <- siena07(myAlgorithm, data = mydata, effects = myeff))
# (the outer parentheses print the obtained result on the screen)
# if necessary, estimate further
#(ans <- siena07(myAlgorithm, data = mydata, effects = myeff, prevAns = ans))
save(ans, file="/Users/anuschka/Documents/labjournal/results/data_model_struc")
#> Estimates, standard errors and convergence t-ratios
#>
#> Estimate Standard Convergence
#> Error t-ratio
#>
#> Rate parameters:
#> 0.1 Rate parameter period 1 1.5820 ( 0.4415 )
#> 0.2 Rate parameter period 2 2.9902 ( 0.6460 )
#>
#> Other parameters:
#> 1. eval degree (density) -2.3586 ( 0.2841 ) 0.0107
#> 2. eval transitive triads 1.2539 ( 0.2120 ) -0.0054
#> 3. eval degree act+pop 0.0339 ( 0.0307 ) 0.0061
#>
#> Overall maximum convergence ratio: 0.0268
#>
#>
#> Total of 1920 iteration steps.
#>
#> Covariance matrix of estimates (correlations below diagonal)
#>
#> 0.081 0.004 -0.007
#> 0.066 0.045 -0.003
#> -0.854 -0.411 0.001
#>
#> Derivative matrix of expected statistics X by parameters:
#>
#> 81.660 49.627 2350.436
#> 68.076 132.684 2990.848
#> 895.133 908.951 34158.543
#>
#> Covariance matrix of X (correlations below diagonal):
#>
#> 131.208 189.773 5249.323
#> 0.738 504.437 10424.327
#> 0.896 0.908 261456.111
In the first model for Data Science, with only structural network effects, the effects are similar to those at the Sociology department. There is again a negative density effect (b=-2.359; se=0.284), albeit less strong. The transitivity effect (b=1.254; se=0.212) is larger for this network than for the Sociology department: data scientists appear to prefer co-publishing with co-authors of their co-authors more than sociologists do, which is in line with the transitivity observed in the descriptive statistics. The effect of activity and popularity (b=0.034; se=0.031) is insignificant: data scientists apparently have no preference for co-publishing with department members who have already co-published many times.
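Whether the transitivity estimates of the two departments really differ can be gauged with a simple z-test on the difference. This is a rough sketch and not part of the original analysis: it assumes the two estimates are independent because they come from separately estimated networks, and it uses a normal approximation.

```r
# Transitive triads estimates and standard errors from the structural models
b_data <- 1.2539; se_data <- 0.2120   # Data Science
b_soc  <- 0.5994; se_soc  <- 0.2402   # Sociology

# z-statistic for the difference of two independent estimates (assumption)
z_diff <- (b_data - b_soc) / sqrt(se_data^2 + se_soc^2)
p_diff <- 2 * pnorm(-abs(z_diff))

print(round(c(z = z_diff, p = p_diff), 3))
```

A formal comparison would need a joint analysis of both departments; the sketch only indicates how large the gap is relative to its uncertainty.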
myeffd1a <- getEffects(mydata)
myeffd1a <- includeEffects(myeffd1a, degPlus) #some publish a lot, some not. (interpretation: talent/luck? )
myeffd1a <- includeEffects(myeffd1a, transTriads)
myeffd1a <- includeEffects(myeffd1a, absDiffX, interaction1 = "ki")
(ansd1a <- siena07(myAlgorithm, data = mydata, effects = myeffd1a, prevAns = ans))
#Save the last model since it has the lowest maximum convergence ratio.
save(ansd1a, file="/Users/anuschka/Documents/labjournal/results/data_model_cov1a")
#> Estimates, standard errors and convergence t-ratios
#>
#> Estimate Standard Convergence
#> Error t-ratio
#>
#> Rate parameters:
#> 0.1 Rate parameter period 1 1.5857 ( 0.4509 )
#> 0.2 Rate parameter period 2 3.1264 ( 0.7009 )
#>
#> Other parameters:
#> 1. eval degree (density) -2.2254 ( 0.2976 ) -0.0670
#> 2. eval transitive triads 1.2295 ( 0.2042 ) -0.1021
#> 3. eval degree act+pop 0.0278 ( 0.0320 ) -0.0983
#> 4. eval ki abs. difference -0.0957 ( 0.0865 ) 0.0139
#>
#> Overall maximum convergence ratio: 0.1133
#>
#>
#> Total of 2209 iteration steps.
#>
#> Covariance matrix of estimates (correlations below diagonal)
#>
#> 0.089 0.002 -0.008 -0.006
#> 0.027 0.042 -0.002 0.001
#> -0.847 -0.364 0.001 0.000
#> -0.233 0.030 0.058 0.007
#>
#> Derivative matrix of expected statistics X by parameters:
#>
#> 81.829 46.706 2277.465 92.249
#> 70.602 134.006 2984.585 54.816
#> 877.102 845.448 31968.293 836.507
#> 30.884 -10.710 437.718 356.676
#>
#> Covariance matrix of X (correlations below diagonal):
#>
#> 142.422 203.732 5589.594 97.584
#> 0.748 520.289 10723.104 63.818
#> 0.908 0.912 265800.848 2692.774
#> 0.285 0.098 0.182 821.068
At the Data Science department, the effect of the absolute difference in k-index is not significant (b=-0.096; se=0.087). Even in this model without control variables, there is thus no evidence that data scientists compare their k-index with that of possible co-authors or attach value to this index.
myeffd1 <- getEffects(mydata)
myeffd1 <- includeEffects(myeffd1, degPlus) #some publish a lot, some not. (interpretation: talent/luck? )
myeffd1 <- includeEffects(myeffd1, transTriads)
myeffd1 <- includeEffects(myeffd1, absDiffX, interaction1 = "ki")
myeffd1 <- includeEffects(myeffd1, sameX, interaction1 = "dutch")
myeffd1 <- includeEffects(myeffd1, absDiffX, interaction1 = "pub_first")
myeffd1 <- includeEffects(myeffd1, sameX, interaction1 = "twitter_dum")
myeffd1 <- includeEffects(myeffd1, sameX, interaction1 = "gender")
(ansd1 <- siena07(myAlgorithm, data = mydata, effects = myeffd1, prevAns = ans))
#Save the last model since it has the lowest maximum convergence ratio.
save(ansd1, file="/Users/anuschka/Documents/labjournal/results/data_model_cov1")
#> Estimates, standard errors and convergence t-ratios
#>
#> Estimate Standard Convergence
#> Error t-ratio
#>
#> Rate parameters:
#> 0.1 Rate parameter period 1 1.5988 ( 0.4227 )
#> 0.2 Rate parameter period 2 3.0587 ( 0.7459 )
#>
#> Other parameters:
#> 1. eval degree (density) -2.3007 ( 0.3926 ) 0.0406
#> 2. eval transitive triads 1.2459 ( 0.2103 ) 0.0595
#> 3. eval degree act+pop 0.0287 ( 0.0332 ) 0.0443
#> 4. eval same gender -0.0625 ( 0.2210 ) 0.0056
#> 5. eval ki abs. difference -0.1176 ( 0.0928 ) -0.0090
#> 6. eval same dutch -0.0116 ( 0.2053 ) 0.0510
#> 7. eval pub_first abs. difference -0.0138 ( 0.0129 ) 0.0365
#> 8. eval same twitter_dum 0.4269 ( 0.2082 ) 0.0019
#>
#> Overall maximum convergence ratio: 0.1087
#>
#>
#> Total of 2598 iteration steps.
#>
#> Covariance matrix of estimates (correlations below diagonal)
#>
#> 0.154 0.014 -0.009 -0.025 -0.009 -0.025 -0.001 -0.030
#> 0.164 0.044 -0.003 -0.001 0.000 -0.001 0.000 0.000
#> -0.663 -0.480 0.001 0.000 0.000 0.000 0.000 0.000
#> -0.290 -0.032 -0.037 0.049 0.001 -0.003 0.000 0.000
#> -0.235 -0.003 0.152 0.038 0.009 -0.001 0.000 0.000
#> -0.309 -0.017 0.035 -0.058 -0.038 0.042 0.000 0.000
#> -0.285 -0.073 -0.052 0.032 -0.027 -0.015 0.000 0.000
#> -0.370 -0.002 0.028 0.001 -0.018 -0.001 0.050 0.043
#>
#> Derivative matrix of expected statistics X by parameters:
#>
#> 82.684 45.123 2288.516 100.109 80.365 93.760 1642.218 98.551
#> 73.678 141.231 3220.486 93.439 54.573 83.780 1574.858 87.874
#> 891.897 869.796 32893.785 1086.705 720.172 994.546 18500.325 1072.581
#> 49.807 28.789 1410.045 105.543 35.287 61.121 992.038 57.508
#> 36.791 -8.727 526.555 40.190 314.567 50.425 663.205 49.869
#> 48.386 26.310 1332.954 63.166 41.805 98.623 967.956 53.192
#> 840.573 505.625 23694.907 993.753 808.147 955.130 28635.455 945.481
#> 52.857 27.281 1461.060 60.236 61.037 56.501 987.674 113.797
#>
#> Covariance matrix of X (correlations below diagonal):
#>
#> 137.367 197.604 5345.680 170.119 106.110 152.824 2804.625 167.227
#> 0.738 522.475 10643.141 247.322 94.868 221.704 4141.972 239.771
#> 0.895 0.914 259450.222 6597.409 3233.144 5930.113 110880.375 6613.630
#> 0.824 0.614 0.735 310.625 102.568 203.797 3467.115 194.211
#> 0.334 0.153 0.234 0.215 733.585 116.462 1954.605 161.659
#> 0.823 0.612 0.734 0.729 0.271 251.262 3122.375 170.051
#> 0.843 0.638 0.767 0.693 0.254 0.694 80607.050 3252.736
#> 0.802 0.590 0.730 0.619 0.335 0.603 0.644 316.567
As expected, the k-index effect remains insignificant (b=-0.118; se=0.093) when the control variables are added to the model. As at the Sociology department, the effects of age (b=-0.014; se=0.013), gender (b=-0.063; se=0.221), and ethnicity (b=-0.012; se=0.205) are not significant. Similarity in age, gender, and ethnicity thus does not play a role in the selection of co-authors. Similarity in (not) having Twitter is significant (b=0.427; se=0.208): data scientists tend to co-publish with scientists who have the same Twitter status as themselves.
myeffd2a <- getEffects(mydata)
myeffd2a <- includeEffects(myeffd2a, degPlus)
myeffd2a <- includeEffects(myeffd2a, transTriads)
myeffd2a <- includeEffects(myeffd2a, altX, interaction1 = "ki")
(ansd2a <- siena07(myAlgorithm, data = mydata, effects = myeffd2a, prevAns = ansd1))
In the above model, the effect of the alter’s k-index (regardless of the k-index of ego) is also not significant (b=-0.109; se=0.177), as at the Sociology department.
myeffd2 <- getEffects(mydata)
myeffd2 <- includeEffects(myeffd2, degPlus)
myeffd2 <- includeEffects(myeffd2, transTriads)
myeffd2 <- includeEffects(myeffd2, altX, interaction1 = "ki")
myeffd2 <- includeEffects(myeffd2, sameX, interaction1 = "dutch")
myeffd2 <- includeEffects(myeffd2, absDiffX, interaction1 = "pub_first")
myeffd2 <- includeEffects(myeffd2, sameX, interaction1 = "twitter_dum")
myeffd2 <- includeEffects(myeffd2, sameX, interaction1 = "gender")
(ansd2 <- siena07(myAlgorithm, data = mydata, effects = myeffd2, prevAns = ansd1))
#Save the last model since it has the lowest maximum convergence ratio.
save(ansd2, file="/Users/anuschka/Documents/labjournal/results/data_model_cov2")
#> Estimates, standard errors and convergence t-ratios
#>
#> Estimate Standard Convergence
#> Error t-ratio
#>
#> Rate parameters:
#> 0.1 Rate parameter period 1 1.6061 ( 0.4450 )
#> 0.2 Rate parameter period 2 3.0052 ( 0.6512 )
#>
#> Other parameters:
#> 1. eval degree (density) -2.4120 ( 0.3738 ) 0.0499
#> 2. eval transitive triads 1.2635 ( 0.2273 ) 0.0970
#> 3. eval degree act+pop 0.0292 ( 0.0347 ) 0.0940
#> 4. eval same gender -0.1252 ( 0.2229 ) 0.0314
#> 5. eval ki alter -0.1825 ( 0.1949 ) -0.0780
#> 6. eval same dutch -0.0259 ( 0.2130 ) 0.0720
#> 7. eval pub_first abs. difference -0.0155 ( 0.0122 ) 0.0397
#> 8. eval same twitter_dum 0.4449 ( 0.2155 ) 0.0413
#>
#> Overall maximum convergence ratio: 0.1464
#>
#>
#> Total of 2523 iteration steps.
#>
#> Covariance matrix of estimates (correlations below diagonal)
#>
#> 0.140 0.017 -0.009 -0.017 0.004 -0.025 -0.001 -0.031
#> 0.196 0.052 -0.004 -0.008 0.003 -0.002 0.000 0.001
#> -0.681 -0.478 0.001 0.000 0.000 0.000 0.000 0.000
#> -0.204 -0.152 -0.018 0.050 0.008 -0.001 0.000 -0.002
#> 0.055 0.062 0.047 0.180 0.038 0.000 0.000 0.000
#> -0.316 -0.048 0.011 -0.031 -0.009 0.045 0.000 0.000
#> -0.126 -0.052 -0.166 0.010 0.087 -0.011 0.000 0.000
#> -0.382 0.014 0.053 -0.039 0.002 0.004 -0.045 0.046
#>
#> Derivative matrix of expected statistics X by parameters:
#>
#> 82.867 54.734 2436.839 94.031 -66.536 95.261 1645.538 97.204
#> 73.246 143.629 3250.033 90.917 -73.476 85.324 1616.128 82.382
#> 916.788 991.093 35035.922 1060.651 -835.160 1045.718 19731.843 1064.627
#> 49.562 42.657 1549.834 107.048 -47.970 60.078 1044.855 52.364
#> -40.297 -43.931 -1419.631 -55.220 122.510 -45.919 -925.309 -43.095
#> 47.762 33.915 1386.495 57.501 -38.548 98.735 965.427 52.762
#> 871.115 698.277 27854.681 1036.796 -784.491 989.963 30210.400 1016.390
#> 54.759 39.268 1661.664 55.376 -37.027 58.040 1079.015 112.646
#>
#> Covariance matrix of X (correlations below diagonal):
#>
#> 137.927 204.379 5515.734 159.453 -139.704 154.245 2871.896 168.417
#> 0.737 557.835 11315.110 237.273 -248.185 228.249 4473.599 249.441
#> 0.896 0.914 274708.549 6248.888 -6227.934 6029.118 119622.675 6846.045
#> 0.775 0.573 0.680 307.153 -173.417 190.427 3470.226 162.017
#> -0.573 -0.506 -0.572 -0.477 431.195 -156.119 -3133.904 -142.681
#> 0.814 0.599 0.713 0.673 -0.466 260.602 3207.117 170.972
#> 0.848 0.657 0.792 0.687 -0.524 0.689 83110.098 3414.815
#> 0.805 0.593 0.734 0.519 -0.386 0.595 0.665 316.991
In the model with the control variables added, the effect of alter’s k-index remains insignificant (b=-0.183; se=0.195). The hypotheses on co-publication with scientists with a lower or higher k-index are thus also not supported for data scientists. As in the other models for Data Science, age (b=-0.016; se=0.012), gender (b=-0.125; se=0.223), and ethnicity (b=-0.026; se=0.213) are not significant for selecting co-authors. Similarity in (not) having a Twitter account remains significant (b=0.445; se=0.216): in this model too, data scientists prefer to co-publish with another scientist who is similar with regard to having a Twitter account.
It is possible that the k-index shows no robust effect because scientists do not consider the ratio between Twitter activity and scientific publications, but care mainly about Twitter activity itself. In Appendix A, the same models are run as above, but with the number of Twitter followers instead of the k-index. For both Sociology and Data Science, the effects of the number of Twitter followers are insignificant. This applies to the absolute difference as well as to the alter’s number of followers. These results are thus very similar to those for the k-index: in the models with control variables, neither the k-index nor the number of followers is significant, so the same conclusions would be drawn from the models with the k-index and those with the number of Twitter followers.
In Appendix B, the same models are run without the dummy for having a Twitter account. The outcomes are again similar, meaning that including the Twitter dummy does not mask a possible effect of the k-index. In these models without the Twitter dummy, the k-index remains insignificant, so the same conclusions would be drawn. The results are therefore rather robust.