This approximation of actual friendships by social interaction strengths allows us to track the effective network evolution between students with much higher precision (see S1 Fig). The network of university students (seniors) in March 2016 is shown in Fig 2. Previous studies of Facebook (or its Russian analog VK) have focused on the (relatively static) friendship marking options that are provided by the sites [50–52]. This dataset not only allows us to quantify the extent of academic homophily among students but also to see its detailed evolution over time. In particular, we are able to clarify the mechanism behind the emergence of academic homophily from an initially homogeneous population across several years.

We use two datasets of academic performance records measured as grade point averages (GPA), one with 655 students from the 5th to 11th grades (age from 11 to 18) of a Russian public high school in Moscow (for reasons of anonymity we do not state the name of the school), the other with 5,925 bachelor students of the Higher School of Economics in Moscow.
High school students receive their grades at the end of each trimester; their GPAs for the last 5 trimesters were available. Starting with the academic year 2014/15, the Higher School of Economics began publishing a public ranking of its students. It contains information about their GPAs for the current semester along with the aggregated average GPA across the whole period of their studies. We collect the temporal GPA data into a vector $G_i^{HS/U}(t)$ that represents the GPA of student i at time t and corresponds to a student’s performance within the time period from t − 1 to time t. Here t = 1 indicates the end of the first trimester/semester, and t = T is the end of the last trimester/semester for a given group of students. HS indicates “high school”, U is “university”. For detailed information about the time points corresponding to GPA collection see S6 Fig.

PLOS ONE | https://doi.org/10.1371/journal.pone.0183473 August 30, 2017
Formation of homophily in academic performance

Fig 2. Snapshot of the friendship network of university students. The network is reconstructed from students’ interactions on the social network site VK, the Russian variant of Facebook. Nodes represent students; links exist if one student gave a “like” to another at least once in March 2016. Colors represent the performance (GPA) of students across the whole period of studies. There is visible clustering of students with similar GPA.
https://doi.org/10.1371/journal.pone.0183473.g002
Note that the grading scales differ between high school and university: high school grades range from 2 (worst) to 5 (best), university grades from 4 (worst) to 10 (best). The average GPA of student i across the entire available time period we denote by $G_i^{HS/U}$. For university students we follow 4 cohorts that are labeled by X in the following way: $G_i^{U,X}$, where X = 1, 2, 3, 4 stands for freshmen, sophomores, juniors, and seniors, respectively. The average GPAs for high school students and the cohorts of university students are presented in S1 Table.

To generate a proxy for the temporal friendship interaction network between students we use the popular SNS VK, whose main component is a user-generated news feed.
This feed contains all content that was generated (posted) by users and is generally visible to friends only. If users like the content that was posted by their friends they can indicate this by an instant feedback called a “like”. “Likes” may mean different things to different people [53]; however, “likes” can, in general, be seen as an indication of active friendship contacts between users.

VK provides an application programming interface (API) that allows information to be downloaded systematically in an open JSON format.
In particular, it is possible to download user profiles from particular educational institutions and within selected age ranges. For each user, it is possible to obtain the list of their friends and the content that was published by them along with the VK identifiers of users that liked this content. Posting times are known with a time resolution of one second.
“Likes” for specific content are almost always placed within 1-2 days after the content was posted. Using specially developed software, the profiles of students of a given institution were downloaded and automatically matched by their first and last names with the available data on students’ performance. 88% of all high school students and 95% of university students could be identified on VK (see S1 Text). The matching procedure was performed by authorized representatives of the high school and the Higher School of Economics, respectively. After the matching procedure, all names and VK identifiers were irrevocably deleted. The “likes” of all users were collected with corresponding timestamps; those from users outside the educational institutions were removed.
“Likes” were then aggregated into intervals of 3 months. For each group of students, we obtain an N × N adjacency matrix $A(t)$, where $A_{ij}(t) = 1$ if student i gives at least one “like” to student j from time t − 1 to time t. For detailed information about the time periods corresponding to the collected network data see S6 Fig. The subsequent deletion of all information on individual “likes” and respective timestamps prevents the possibility of any de-anonymization. The resulting datasets were transferred to the Institute of Education, which made them available for research in fully anonymized form.

Results

We first demonstrate the existence of academic homophily and then try to understand its origin.
For all groups of students we find strong homophily. To make it comparable with other homophily studies such as [14], we quantify it in a standard way by the conditional probability increase, $I_X$, that a student belongs to the top Xth percentile of performers, given that his/her friends also belong to the same percentile (see S1 Text). $I_X(t) = 0$ means that grades and the friendship network are uncorrelated; $I_X(t) = 100\%$ means that the probability to be in the top Xth percentile is 2 times higher if the student’s friends are also in the top Xth percentile, compared to the situation when they are not.
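As a rough illustration of the two steps described above, the following Python sketch builds an adjacency matrix $A(t)$ from a list of “like” events and then computes a conditional probability increase on it. The exact definition of $I_X$ is given in S1 Text and is not reproduced here; the ratio used below (the probability that a student is in the top group given that the liked friend is, divided by the same probability given that the friend is not, minus 1) is an assumption read off the verbal description. The function names, the `(liker, author, timestamp)` event format, the half-open time window, and the exclusion of self-likes are likewise hypothetical choices made for the example.

```python
import numpy as np

def adjacency_from_likes(likes, n_students, t_start, t_end):
    """Return A with A[i, j] = 1 if student i liked student j at least once
    in the window [t_start, t_end). Event format and window are assumptions."""
    A = np.zeros((n_students, n_students), dtype=int)
    for liker, author, timestamp in likes:
        if t_start <= timestamp < t_end and liker != author:
            A[liker, author] = 1  # repeated likes still give a single link
    return A

def probability_increase(A, gpa, percentile):
    """Assumed form of I_X: P(ego top | alter top) / P(ego top | alter not top) - 1,
    evaluated over all links i -> j (ego = i, alter = j)."""
    gpa = np.asarray(gpa, dtype=float)
    top = gpa >= np.percentile(gpa, percentile)  # students above the Xth percentile
    ego, alter = np.nonzero(A)
    ego_top, alter_top = top[ego], top[alter]
    if alter_top.all() or not alter_top.any():
        raise ValueError("need links to both top and non-top students")
    p_given_top = ego_top[alter_top].mean()
    p_given_not = ego_top[~alter_top].mean()
    if p_given_not == 0:
        raise ValueError("ratio undefined: no top student likes a non-top student")
    return p_given_top / p_given_not - 1.0

# Toy data (invented for illustration): six students, one 100-unit time window.
toy_gpa = [9.0, 8.0, 9.5, 5.0, 4.0, 6.0]
toy_likes = [(0, 1, 10), (1, 2, 20), (2, 0, 30), (3, 4, 40),
             (4, 5, 50), (3, 0, 60), (0, 3, 70)]
toy_A = adjacency_from_likes(toy_likes, n_students=6, t_start=0, t_end=100)
toy_I50 = probability_increase(toy_A, toy_gpa, percentile=50)
```

A value of 0 from `probability_increase` corresponds to no association between grades and links, and a value of 1.0 corresponds to the 100% case described in the text. Extending the sketch to social distances greater than 1 would require replacing `A` by a matrix of shortest-path distances, which is not shown here.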
$I_X(t)$ can be computed not only for friends (social distance 1) but also for friends of friends (social distance 2), friends of friends of friends (social distance 3), etc. In Fig 3 we fix X to be the 50th percentile (above-average students) in panels (a) and (b), and the 80th percentile (excellent students) in panels (c) and (d). For social distances up to 2 we observe significant homophily for all student groups at the last time point, T = 6 for high school and T = 14 for university. We find $I^{HS}_{50\%}(6) = 23\%$, $I^{HS}_{80\%}(6) = 57\%$, $I^{U,4}_{50\%}(14) = 30\%$, $I^{U,4}_{80\%}(14) = 49\%$, p-value < $10^{-4}$. Significance was tested with a permutation test (10,000 permutations), see Methods.
Note that the corresponding values at the first time point are smaller, $I^{HS}_{50\%}(1) = 22\%$, $I^{HS}_{80\%}(1) = 34\%$, $I^{U,4}_{50\%}(1) = 16\%$, $I^{U,4}_{80\%}(1) = 28\%$.

Fig 3. Homophily of students with good (a) and (b), and excellent grades (c) and (d), as a function of social distance. Observed increase in probability $I_X$ that a student is in the top Xth percentile of students, given that their friends are also in the top Xth percentile. Results for the high school are shown in (a) and (c), for university in (b) and (d). Vertical lines indicate 95% confidence intervals computed with the permutation test. A social distance of 1 means friends, a social distance of 2 means friends of friends, and a social distance of 3 means friends of friends of friends.
https://doi.org/10.1371/journal.pone.0183473.g003

This result holds independently of the method used. Following an alternative approach for scalar variables we compute the assortativity coefficient r [54] and again find highly significant homophily at the last time point, $r^{HS}(6) = 0.20$ (p-value < $10^{-4}$) and $r^{U,4}(14) = 0.21$ (p-value < $10^{-4}$).
At the first time point homophily is much smaller, $r^{HS}(1) = 0.12$ and $r^{U,4}(1) = 0.12$.

In Fig 4 we show the time evolution of homophily over 1.5 years for high school students (a) and over 3.5 years for university students (b). We employ a transparent definition of a Homophily Index, H (see Methods). We see a clear increase of H from the first to the last trimester from about H = 0.20 to H = 0.41 for the high school (a) (circles), and from H = 0.24 to H = 0.40 for university (b) (crosses).
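The assortativity coefficient r and the permutation test mentioned above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure documented in the Methods: the directed “like” network is symmetrized so that r reduces to the Pearson correlation of GPAs across the two ends of each link, and the p-value is one-sided with the usual add-one correction. The function names and the toy parameters are hypothetical.

```python
import numpy as np

def assortativity(A, gpa):
    """Scalar assortativity on the symmetrized network: Pearson correlation
    of GPA at the two ends of every link (each link counted in both directions)."""
    A = np.asarray(A)
    gpa = np.asarray(gpa, dtype=float)
    S = (A + A.T) > 0                      # assumption: ignore link direction
    src, dst = np.nonzero(S)
    return np.corrcoef(gpa[src], gpa[dst])[0, 1]

def permutation_p_value(A, gpa, n_permutations=10_000, seed=0):
    """Reshuffle GPAs between nodes, keep the network fixed, and report the
    share of shuffles with r at least as large as the observed value."""
    rng = np.random.default_rng(seed)
    gpa = np.asarray(gpa, dtype=float)
    observed = assortativity(A, gpa)
    null = np.array([assortativity(A, rng.permutation(gpa))
                     for _ in range(n_permutations)])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_permutations)
    return observed, p_value
```

Keeping the network fixed and shuffling only the grades preserves the degree sequence, so the null distribution reflects what r would look like if GPAs were unrelated to who interacts with whom.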
We next show in a series of three arguments that the increase in homophily over time cannot be explained by the socialization/adaptation mechanism, i.e. by changes in GPAs over time.

Fig 4. Evolution of homophily (Homophily Index) in friendship networks of high school (a) and university students (b). Homophily increases with time by almost a factor of 2 (circles). The significance of the observed effect is measured with a randomization test (triangles), where grades were reshuffled randomly between the nodes in the network. Remarkably, when the GPAs of individual students are fixed to their temporal average (crosses), practically the same increase of homophily is observed, which signals the dominance of network restructuring. Results can be understood with a simple model (squares). Vertical bars are standard deviations.
https://doi.org/10.1371/journal.pone.0183473.g004

Ruling out socialization/adaptation

1. The first argument why socialization/adaptation is not the relevant mechanism behind the observed homophily increase is that academic performance is known to be a relatively persistent feature of students.