By one of our estimates, the share of fake profiles in the “School” sample could be reduced from 66% to 8% if users who have no friends from this school are excluded.

Table 4. Percentages of identified VK users for different city sizes.

Settlement type                          Females   Males
Rural-type settlement                    68%       83%
Urban-type settlement                    86%       93%
Cities with less than 50 000 people      87%       90%
Cities with 50 000 – 100 000 people      81%       85%
Cities with 100 000 – 450 000 people     81%       88%
Cities with 450 000 – 680 000 people     83%       86%
Cities with more than 680 000 people     79%       84%
Saint Petersburg                         83%       87%
Moscow                                   78%       73%

Structure of social ties in the digital space

We find that the structure of social ties on VK reproduces the structure of educational organizations. This includes the division of a school into grades (Q = 0.47, Figure 1a) and buildings (Q = 0.35, Figure 1b), and the division of a university into campuses (Q = 0.32), years of study (Q = 0.58), and educational programs (Q = 0.68) (Figure 2). In all cases, the observed values of Q are statistically significant, with p-values < 10^-4 (permutation test).

Figure 1. Structure of school social ties. Nodes correspond to students and links to friendships on VK. Different colors correspond to grades from 5th to 11th (a) and school buildings (b).

Figure 2. Structure of university social ties. Different colors correspond to years of study. Visible clusters inside each year correspond to different educational programs.

Figure 3. Structure of social ties between schools of Saint Petersburg. Different colors correspond to different administrative districts.

We also study the structure of friendship ties on a city scale (Figure 3). We show that the probability of a friendship tie between two schools decreases with geographical distance, following a power law (Figure 4).

Figure 4. The relationship between the probability of a friendship tie between schools and the geographical distance between them.

Differentiation of social ties in the digital space

We show that the online social ties of students from an educational organization are differentiated by academic performance, namely that students with similar academic performance interact more frequently online.
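The differentiation by academic performance described above is quantified in Figure 5 by a Homophily Index, the correlation between students' GPA and the average GPA of their friends. A minimal Python sketch of such an index follows, assuming a Pearson correlation and a friendship dictionary as input; the function names and the toy data are hypothetical and are not taken from the original study.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def homophily_index(gpa, friends):
    """Correlate each student's GPA with the mean GPA of their friends.

    gpa:     dict mapping student id -> GPA
    friends: dict mapping student id -> iterable of friend ids
    Students with no friend of known GPA are skipped (an assumption).
    """
    own, peers = [], []
    for student, score in gpa.items():
        known = [gpa[f] for f in friends.get(student, ()) if f in gpa]
        if known:
            own.append(score)
            peers.append(mean(known))
    return pearson(own, peers)


# Hypothetical toy network: two tightly knit groups joined by one tie (c-d).
toy_gpa = {"a": 4.8, "b": 4.6, "c": 4.5, "d": 3.1, "e": 3.0, "f": 2.9}
toy_friends = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
toy_index = homophily_index(toy_gpa, toy_friends)  # positive: similar students are linked

# Hypothetical counter-example: high and low performers befriend each other.
mixed_gpa = {"x": 5.0, "y": 1.0, "z": 4.0, "w": 2.0}
mixed_friends = {"x": ["y", "w"], "y": ["x"], "z": ["w"], "w": ["z", "x"]}
mixed_index = homophily_index(mixed_gpa, mixed_friends)  # negative
```

Computing such an index on successive snapshots of the same network would give a time series of the kind plotted in Figure 5.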
We also find that the level of this differentiation increases with time (Figure 5). We show that this increase cannot be explained by changes in academic performance; rather, it is explained by the rewiring of social ties: less similar students break social ties with higher probability, and more similar students create new ties with higher probability. These results are presented in [44].

We also study the social ties of students on a city scale.
We show that the probability of a friendship tie on VK between students from different schools is higher for schools with similar academic performance. These results hold true regardless of the geographical distance between schools (Figure 6). Hence, schools are segregated in the digital space despite the absence of geographical segregation.

Figure 5. Increase in correlation (Homophily Index) between the GPA of students and the average GPA of their friends for a school (a) and a university (b).

Figure 6. Correlation between the average USE scores of schools and their closest neighbors in the digital (a) and physical (b) space.

Differentiation of interests in the digital space

We find that students’ interests are correlated with their gender (for instance, boys prefer public pages related to football and computer games), age (for instance, older students are interested in graduation examinations) and also with their academic performance [2].
Low-performing students subscribe to pages such as “Love Horoscope” and “Unorthodox Horoscope”, while high-performing students prefer pages such as “Interesting facts” and “The best poems of great poets” (Figure 7).

We also show that online interests could explain as much as 25% of the variation in the academic performance of students (Figure 8). This is comparable to the percentage of variation that can be explained by the socioeconomic status of students. The gap in educational outcomes between subscribers to different groups (for instance, “World Art and Culture” and “Love Horoscope”) could be equivalent to two years of formal schooling (Table 5).

Figure 7. Students’ interests map.

Figure 8. Pearson correlation coefficient between predicted and real PISA scores as a function of the number of components used in the linear regression.

Table 5. Names of public pages that contribute most to the academic component of users’ interests. Names are translated from Russian. Mean values of subscribers’ scores with standard errors (in parentheses) are provided for each of three PISA subjects.

Public page                        Mathematics   Reading     Science
Positive contribution
WAC (World Arts and Culture)       538 (4.6)     530 (4.5)   532 (4.3)
Science                            521 (4.2)     502 (4.1)   516 (3.8)
Best poems of great poets          509 (4.0)     507 (4.0)   508 (3.9)
Science and Technology             507 (4.1)     479 (4.3)   504 (4.0)
Five Best Movies                   505 (3.9)     492 (3.9)   503 (3.7)
Negative contribution
F*CK                               473 (3.3)     449 (3.4)   472 (3.2)
Killing humor                      471 (5.1)     447 (5.1)   471 (4.7)
Cool Gags                          467 (4.9)     444 (5.1)   465 (4.9)
Unorthodox Horoscope               462 (5.1)     450 (5.3)   460 (5.0)
Love Horoscope                     450 (5.3)     442 (5.8)   453 (5.2)

These results can be summarized as follows:
- the method we propose makes it possible to extract reliable information from VK and to combine it with educational data; the resulting data can be used to study students’ social ties and interests;
- the structure of online friendship reproduces the structure of educational organizations; social proximity in the digital space is closely related to geographical proximity; the probability of a friendship tie between students from different schools declines with geographical distance, following a power law;
- social ties of students are differentiated by academic performance in the digital space; students with similar performance create ties with higher probability, and students with dissimilar performance break ties with higher probability; students from similarly performing schools are more often connected on a social networking site regardless of the geographical distance between schools;
- students’ interests are differentiated by academic performance in the digital space; the gap in educational outcomes between subscribers to different public pages could be equivalent to two years of formal schooling.

Conclusion

Ethical considerations

For the purposes of this work, we use only publicly available information from the social networking site.
The VK team confirmed to us that this data can be used for research purposes. The matching of VK profiles with information about students was done automatically; after matching, the data was anonymized and subsequently analyzed only in this anonymized form. The procedure was approved by the IRB of the Higher School of Economics.

It is important to note that new sources of data not only open up new opportunities for researchers but also raise new ethical questions.
For instance, the notion of informed consent requires special attention. By accepting the terms of service, users of social networking sites agree that information about them can be accessed by third parties and used for a variety of unspecified purposes. However, it is not clear whether such consent can be considered informed, especially given that terms of service are rarely read and that, even when they are read, users may still not fully understand all the consequences of their consent.
For example, it was shown that digital traces make it possible to effectively predict information that users did not disclose and may prefer to keep private. Despite the appearance of the first ethical guidelines, this field is still largely a grey zone and requires additional attention from the research community.

Scientific novelty and significance of the results

We have conducted the first large-scale study that combines detailed information about the behavior of students on VK with educational data. We introduced methods that could increase the reliability of VK data and provided estimates for sample biases.
We introduced a novel approach to studying the evolution of students’ social ties that does not require expensive longitudinal studies. We showed how publicly available information from SNSs can be used to infer information about students’ interests and that this information can have large predictive power with respect to various student characteristics, including age, gender, and academic performance. These results are important for further educational research because our methods can be adopted by other researchers and have already been used in various works [46–48].

For the first time, the structure of students’ friendship ties has been studied on a city scale and the relationship between inter-school friendship and geographical distance has been revealed. We also studied the evolution of social ties of students within educational institutions and showed that the differentiation of these ties by academic performance increased with time.
We explained the mechanisms behind this phenomenon with a simple model. We showed that there was a differentiation of students’ online interests by academic performance and, for the first time, provided an estimate of the gap in educational outcomes between subscribers to various public pages.

We showed that social ties of students and their online interests had large predictive power with respect to academic performance. The variables that we constructed explained as much variation in educational outcomes (for individual students as well as for whole schools) as socioeconomic status measured by traditional indexes such as the index of economic, social and cultural status (ESCS) used by PISA. This allows one to use the constructed variables to operationalize the social and cultural capital of students (at least in its digital dimension). Traditional indexes include such variables as parents’ level of education and the number of books at home.
Such variables have low resolution (e.g. parents’ level of education) or disputable face validity in the modern world (e.g. the number of books at home). Another advantage of our approach is that it shifts the focus from family characteristics to the characteristics of students themselves.

It is important to note that our results do not answer the question of whether the observed differentiation leads to the amplification or reproduction of inequality. We also do not study any effects that families might have on the observed differentiation.
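
The statements about explained variation in this summary can be related to the Pearson correlation between predicted and real scores reported in Figure 8. For an ordinary least-squares fit with an intercept, evaluated on the data it was fitted to, the share of explained variation R^2 equals the squared correlation between predicted and observed values, so that 25% of explained variation corresponds to a correlation of 0.5. The Python sketch below checks this identity numerically. It uses a single hypothetical predictor and made-up numbers, not data from the study, and the original analysis may have evaluated predictions out of sample, in which case the identity holds only approximately.

```python
from math import sqrt
from statistics import mean


def fit_line(xs, ys):
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def r_squared(ys, predicted):
    """Share of the variation in ys explained by the predictions."""
    my = mean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs)
                      * sum((y - my) ** 2 for y in ys))


# Hypothetical data: one interest-based predictor and a test score per student.
interest_score = [0.1, 0.4, 0.35, 0.8, 0.6, 0.9, 0.2, 0.7]
test_score = [430, 520, 455, 510, 470, 560, 480, 490]

slope, intercept = fit_line(interest_score, test_score)
predicted = [slope * x + intercept for x in interest_score]

r = pearson(test_score, predicted)      # correlation of the kind shown in Figure 8
r2 = r_squared(test_score, predicted)   # share of explained variation
```

In this in-sample setting `r2` and `r ** 2` agree up to floating-point error.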