References
Baeza-Yates, R., Saint-Jean, F., & Castillo, C. (2002). Web structure, dynamics and page quality.
Bagdikian, B. (2004). The new media monopoly. Boston, MA: Beacon Press.
Bracha, O., & Pasquale, F. (2008). Federal search commission? Access, fairness, and accountability in the law of search. Cornell Law Review, 93.
Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30.
Díaz, A. (2008). Through the Google goggles: Sociopolitical bias in search engine design. In A. Spink & M. Zimmer (Eds.), Web search: Multidisciplinary perspectives. Berlin, Germany: Springer.
Edelman, B. (2011). Bias in search results? Diagnosis and response. Indian Journal of Law and Technology, 7.
Egan, M., MacLean, A., Sweeting, H., & Hunt, K. (2012). Comparing the effectiveness of using generic and specific search terms in electronic databases to identify health outcomes for a systematic review. BMJ Open, 2.
Feuz, M., Fuller, M., & Stalder, F. (2011). Personal Web searching in the age of semantic capitalism: Diagnosing the mechanisms of personalisation. First Monday, 16.
Goldman, E. (2006). Search engine bias and the demise of search engine utopianism. Yale Journal of Law and Technology, 8.
Goldsmith, J., & Wu, T. (2006). Who controls the Internet? Illusions of a borderless world. Oxford, England: Oxford University Press.
Grimmelmann, J. (2010). The next digital decade. Washington, DC: TechFreedom.
Halavais, A. (2008). Search engine society. Cambridge, England: Polity Press.
Hargittai, E. (2000). Open portals or closed gates? Channeling content on the World Wide Web. Poetics, 27.
Hindman, M., Tsioutsiouliklis, K., & Johnson, J. (2003). “Googlearchy”: How a few heavily-linked sites dominate politics on the Web.
Introna, L., & Nissenbaum, H. (2000). Shaping the Web: Why the politics of search engines matter. The Information Society, 16.
Jansen, B., Spink, A., & Koshman, S. (2007). Web searcher interaction with the Dogpile.com metasearch engine. Journal of the American Society for Information Science and Technology, 58.
Jiang, M. (2012). The business and politics of search engines: A comparative study of Baidu and Google's search results of Internet events in China. New Media & Society, 16.
Lawrence, S., & Giles, C. (1999). Accessibility of information on the Web. Nature, 400.
Lee, M. (2011). Google ads and the blindspot debate. Media, Culture & Society, 33.
Lianos, I., & Motchenkova, E. (2013). Market dominance and search quality in the search engine market. Journal of Competition Law and Economics, 9.
Liao, H. (2013). How does localization influence online visibility of user-generated encyclopedias? A study on Chinese-language search engine result pages (SERPs). In Proceedings of the 9th International Symposium on Open Collaboration.
Liu, Day, Sun, & Wang (2002). User behavior and the “globalness” of Internet: From a Taiwan users' perspective. Journal of Computer-Mediated Communication, 2.
Machill, M., Beiler, M., & Zenker, M. (2008). Search-engine research: A European-American overview and systematization of an interdisciplinary and international research field. Media, Culture & Society, 30.
McLuhan, M. (1964). Understanding media: The extensions of man. New York: McGraw-Hill.
Mowshowitz, A., & Kawaguchi, A. (2002). Assessing bias in search engines. Information Processing & Management, 38.
Pariser, E. (2011). The filter bubble. New York, NY: Penguin Press.
Rogers, R. (2013). Digital methods. Cambridge, MA: MIT Press.
Spink, A., & Zimmer, M. (Eds.). (2008). Web search: Multidisciplinary perspectives. Berlin, Germany: Springer.
Stalder, F., & Mayer, C. (2009). Deep search: The politics of search beyond Google. Vienna, Austria: Studien Verlag.
Vaughan, L., & Zhang, Y. (2007). Equal representation by search engines? A comparison of websites across countries and domains. Journal of Computer-Mediated Communication, 12.
Abstract
Do search engines drive Web traffic to well-established sites, leading to a high degree of search results concentration? Do search engines favor their own content while demoting others? How parochial or cosmopolitan are search engines in directing traffic to sites beyond users' national borders? This study explores these issues by empirically comparing search results from Baidu, Google, and Jike obtained in mainland China in August 2011 and August 2012. It finds that search engines in China, particularly Baidu, tend to drive traffic to well-established sites. Baidu's results also raise serious doubts about its impartiality. Rather than making users' search experiences more cosmopolitan, tuned to the larger world around them, search engines rarely direct Chinese users to content beyond national borders.

Over the past 15 years, search engines have become a major interface between users and the Web. Google alone processes over 3.5 billion queries a day, or 1.2 trillion searches a year (Internet Live Stats, 2014). As gateways to online information and knowledge, search engines play a critical role in influencing user attention, directing Web traffic, and steering advertising dollars. To exist, Introna and Nissenbaum (2000) remarked, “is to be indexed by a search engine” (p. 171). Search's impact on the identity and visibility of persons, organizations, and even nations has attracted considerable public debate (Halavais, 2008; Pariser, 2011). Search companies face serious challenges of quality, competition, fairness, and openness, as well as larger questions about search's impact on the economy, politics, and culture (Halavais, 2008; Pariser, 2011; Rogers, 2013). Despite search engines' complex and proprietary nature, research in fields as varied as computer science, communication, and law has probed search both theoretically and empirically (e.g., Spink & Zimmer, 2008).
In particular, interest has grown in search concentration, the tendency of search engines to drive Web traffic to established sites (Cho, Roy, & Adams, 2005; Hindman, Tsioutsiouliklis, & Johnson, 2003; Liao, 2013), and in search bias, the propensity of search engines to favor their own content over competitors' (Edelman, 2011; Goldman, 2006; Grimmelmann, 2010). But research rarely delves into cases beyond Western countries, leaving important search markets like China underinvestigated. In a country with 450 million searchers and quarterly search revenues of $1.5 billion (iResearch, 2013), little independent research on search exists. Further, few efforts have been made to examine search parochialism, the extent to which search engines direct users to sites within rather than outside national borders, potentially restricting the diffusion of news, information, and knowledge, and undermining the very idea of a globally connected Internet captured in the first e-mail sent from China: “Across the Great Wall, we can reach every corner in the world.” How do search concentration, bias, and parochialism affect Chinese users' experiences? This paper explores this question by comparing search results from Baidu, Google, and Jike (sponsored by the Chinese government) collected in mainland China in 2011 and 2012. In what follows, I briefly summarize previous research on search concentration, bias, and parochialism, and discuss the impact of search personalization and China's search market.

Search concentration

The tendency of search engines to route Web traffic to a handful of established sites has been a scholarly concern since the late 1990s. Introna and Nissenbaum (2000) argued that search could, by design or by accident, systematically give prominence to some sites at the expense of others, consolidating power in a few dominant individuals and institutions.
While the concentration of search results is intricately linked to the concentration of market power, this paper assesses “search concentration” in terms of search results rather than market share. Search concentration is usually attributed to the popularity metric most search engines adopt. Such a metric assigns more weight to well-known sites with many inbound links and ranks them above lesser-known ones (Brin & Page, 1998). It privileges popularity over quality, where quality denotes the intrinsic importance of a webpage rather than its popularity (Cho et al., 2005). When search engines repeatedly return existing popular pages at the top of results, these pages can become even more popular through user clicks, producing a “rich-get-richer” phenomenon (Cho et al., 2005). Over time, this practice tends to favor those with financial clout, whose dollars can translate into popular content and influence what gets found (Introna & Nissenbaum, 2000). Studying Chilean websites, Baeza-Yates, Saint-Jean, and Castillo (2002) demonstrated that Google PageRank assigns significantly lower rankings to new pages than to older ones and performs poorly in identifying new, high-quality pages. Cho and Roy (2004) found that a new high-quality page takes 66 times longer to become popular under Google PageRank than under pure “random surfing.” In the realm of politics, Hindman et al.
(2003) argued that rather than democratizing the dissemination of political information, search engines like Google perpetuate a “winner-take-all” logic: The most heavily linked political websites received the bulk of online traffic, whereas the majority of websites, including quality sites, were pushed into obscurity, a phenomenon they dub “Googlearchy.” Interviews with senior engineers at major search providers also show that search development is driven overwhelmingly by market and technological considerations, while fairness and representativeness, determinants of quality media content in traditional journalism, are not primary concerns (Van Couvering, 2009). To overcome search concentration, scholars have proposed technical solutions, such as altering ranking algorithms to identify quality pages earlier (Cho et al., 2005) and adopting controlled, randomized rank promotions to give unexplored quality pages a chance (Pandey, Roy, Olston, Cho, & Chakrabarti, 2005), as well as policy mandates, such as government review of search operations for irregularities (Pasquale, 2008) and publicly funded search engines that minimize undue market influence (Hargittai, 2000). The debate over search concentration, however, has subsided since Google began offering personalized search in April 2005 and fully implemented it in December 2009 (Feuz, Fuller, & Stalder, 2011). The impact of personalization on search concentration is complex, as users coproduce results (Rogers, 2013). While some (e.g., Goldman, 2006) argue that the problems of search concentration and bias will become moot once results are tailored to users' profiles, contexts, and search histories, there is no clear evidence that personalization actually alleviates concentration in individualized results, not to mention the pernicious privacy issues that have accompanied search giants' ever-expanding operations.
Instead, search concentration may have been exacerbated by the exponential growth of the Web and the emergence of “Internet hyper giants” such as Google, Microsoft, and Facebook (Labovitz, 2010). The move from “one-size-fits-all” ranking algorithms to personalization only partially addresses the underrepresentation of poorly linked, quality webpages, while search engines' ability to find, catalog, and rank billions of webpages lags further and further behind the expansion of the Web. In 1999, no major search engine indexed more than about 16% of the Web (Lawrence & Giles, 1999). In August 2013, Google indexed 60 trillion unique URLs (Goodwin, 2013). Google Chairman Eric Schmidt estimated that 5 exabytes (roughly 5,000,000 trillion bytes) of information were created every 2 days in 2010 (Finley, 2011), implying that the content indexed by Google may constitute only a tiny fraction of the whole. Moreover, consolidation of firms has led to the rise of “Internet hyper giants.” In 2009, 60% of all Internet content came from, or terminated within, just 100–150 companies (Labovitz, 2010). If search engines do little to curb the lopsided power that hyper giants wield in leveraging their technical and economic resources to reinforce their dominance and minimize competition, search concentration will get worse, not better.

Search bias

Research on search bias has produced conflicting definitions and empirical evidence. Introna and Nissenbaum (2000) frame search bias as search practices that direct users to popular sites and away from obscure ones, similar to the definition of “search concentration” above. Mowshowitz and Kawaguchi (2002) define bias as a lack of “representativeness” in retrieved content. Cho and colleagues (2005) conceptualize bias as the underrepresentation of new, good-quality webpages by popularity-based search algorithms.
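The popularity-based ranking behind these concerns (Brin & Page, 1998; Cho et al., 2005) can be illustrated with a minimal power-iteration version of PageRank. This is a textbook simplification, not Google's or any other engine's production ranking: the toy link graph and page names are hypothetical, and the damping factor of 0.85 and the fixed 50 iterations are assumed settings. Because `new_page` has no inbound links, it keeps only the baseline “random surfer” share regardless of its quality, while `hub`, the most heavily linked page, ends up with the highest score.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping each page to its outbound links.

    Pages without outbound links ("dangling" pages) spread their score evenly.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page first receives the "random surfer" teleport share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on equally to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: distribute its score uniformly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical toy Web: three pages link to "hub"; no page links to "new_page".
toy_links = {
    "a": ["hub"],
    "b": ["hub", "a"],
    "new_page": ["hub"],
    "hub": ["a"],
}
scores = pagerank(toy_links)
for page, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{page:10s} {score:.3f}")
```

The sketch computes a single snapshot. The rich-get-richer dynamic described by Cho et al. (2005) arises when such rankings drive clicks and new links over time, a feedback loop that is not modeled here.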
More broadly, Van Couvering (2009) reframes search bias as an outcome of the economic conflict inherent in search development, in which the less powerful are outweighed by the more resourceful. Grimmelmann (2010) also broadly defines search bias as the distortion of the information landscape by search engines. I define search bias specifically as search engine practices that favor their own content at the expense of competing services. This conceptualization is close to Grimmelmann's (2010) notion of “self-interest,” where search engines engage in noncompetitive practices. “Favor” here denotes intention, though intention is hard to prove given search engines' proprietary nature. Algorithmic secrecy prevents spammers from gaming search engines but also hinders accurate assessment of bias. The difficulty of proving search bias, however, should not deter investigations of abuse. By comparing search results across search engines, over time, and across searches, Edelman (2011) argues that Google prejudicially presented its own views (e.g., its view on network neutrality), placed its own services favorably (e.g., Google Product Search), and ranked its rivals' sites unfavorably. Baidu faces similar charges in China. It was the target of China's first antitrust case, brought by its competitor Hudong Baike (a Wikipedia-like site), which sued Baidu for U.S.$124 million for demoting the site's ranking in favor of Baidu's own encyclopedia service (Jiang, 2014). Chinese state TV station CCTV (2011) also produced evidence that Baidu has been mixing paid results with organic ones and bullying websites into paying for top rankings. So far, empirical research on search bias has approached the issue by comparing search results across search engines (e.g., detecting the targeted removal of a site favored by users), over time (e.g., documenting the sudden demotion or removal of sites), and across searches (e.g., search engines' placement of their own services in “undeservedly prominent locations”).
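The over-time comparison described above can be sketched as a function that contrasts two ranked snapshots of the results for the same query and flags sites that disappeared, fell, or rose. The function name and the domain names below are hypothetical placeholders, not data from any study, and a flag is only candidate evidence: a site can be demoted for legitimate reasons.

```python
def rank_changes(before, after):
    """Label each site in an earlier ranked list by how it fared in a later one.

    `before` and `after` are lists of sites for the same query, ordered from
    the top result down. Returns {site: "removed" | "demoted" | "promoted" | "unchanged"}.
    """
    after_pos = {site: pos for pos, site in enumerate(after)}
    changes = {}
    for pos, site in enumerate(before):
        if site not in after_pos:
            changes[site] = "removed"
        elif after_pos[site] > pos:
            changes[site] = "demoted"  # a larger index means a lower position
        elif after_pos[site] < pos:
            changes[site] = "promoted"
        else:
            changes[site] = "unchanged"
    return changes


# Hypothetical snapshots of one query's top results, a year apart.
snapshot_2011 = ["news.example", "rival-encyclopedia.example", "own-encyclopedia.example"]
snapshot_2012 = ["own-encyclopedia.example", "news.example"]
changes = rank_changes(snapshot_2011, snapshot_2012)
for site, label in changes.items():
    print(f"{site}: {label}")
```

The same function can compare two engines' lists for one query, in which case “removed” reads as “absent from the second engine.”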
Such inference is grounded in market competition, as search engines have an incentive to promote their own interests (e.g., Edelman, 2011). These approaches are not without limitations. A site may be demoted justifiably for not being useful. Some may also argue that Google's promotion of Google Maps over rival MapQuest is warranted because the former is superior. Despite such exceptions, a comparative approach can produce potential evidence of search bias when other inferences are not readily available. Based on these rationales, Edelman (2011) conducted a series of studies pointing to Google's biases. Wright (November 2011), on the other hand, argues that Google's practice is not inherently harmful and found that own-content bias (favored inclusion and ranking of one's own content) was more salient in Microsoft's Bing than in Google. Using a similar approach to compare Google's and Baidu's search results in China, Jiang (2014) finds that Baidu rarely links to its competitors Hudong Baike or Chinese Wikipedia, whereas their presence in Google's results is much more prominent, raising search bias concerns. Liao (2013) finds that Baidu Baike is given the most visibility across all types of search queries in Baidu's results. The impact of search personalization on search bias is hard to predict. However, frequent exposure of a user to a search engine's own services could reinforce search giants' predominance. Most problematically, users have no way of detecting or evaluating personalization, given how willfully opaque and highly dynamic search engines are (Feuz et al., 2011). Google alone served 25% of North America's Internet traffic in 2012 (McMillan, 2013). Baidu remains No. 1 in China (Alexa, 2013). Intentional abuses of power by search giants need to be exposed.

Search parochialism

Search parochialism is defined here as the tendency of search engines to direct users to sites within their national borders rather than outside them.
As supercharged metamedia of our time, search engines seem to embody the decentralized, borderless, and democratizing dreams the public has ascribed to such intermediaries. They are often mentioned in the same breath as the Internet, as part of a “global village” (McLuhan, 1964) built on information systems radiating throughout the world. Search engines are popularly seen as inherently international, unimpeded by national borders, removed from state jurisdiction, and propelled by a universal desire to discover. This paper argues that while search's global expansion has been facilitated by a worldwide communication infrastructure that is increasingly instantaneous and mobile, search remains more parochial than cosmopolitan. In other words, user behavior, search firms, search technologies, and search results continue to be defined and confined by national borders, despite search engines' potential to crawl, index, and rank Web content globally. First, little empirical research exists on users' internationally oriented search behaviors. Prior studies of Taiwanese (Liu, Day, Sun, & Wang, 2002) and Iranian (Rogers, 2013) Web use reveal that, despite global yearnings, people's use of the Web is more “local” or “national” than “global.” Search engines, perhaps designed to fulfill user needs, desires, and a sense of belonging grounded in the “local,” continue to deliver parochial search results. Second, while the operations of search giants like Google seem global (at least on the surface), language, culture, economy, and politics may have shaped users' interests and behaviors in more parochial directions. Google, for example, has offices in more than 60 countries and supports more than 130 languages. However, by acting largely in accordance with local legal boundaries through content regulation (Goldsmith & Wu, 2006; Machill, Beiler, & Zenker, 2008), search has become increasingly reterritorialized.
To operate in China, for instance, Google complied with Chinese censorship rules between 2006 and 2010. Research also finds that Google's and Yahoo's search results differ across Chinese-language regions (mainland China, Singapore, Taiwan, and Hong Kong): The same query in Taiwan and Hong Kong is much more likely to yield hyperlinks to U.S.-based Chinese content than in mainland China or Singapore (Liao, 2013). Besides national search engines that dominate their home countries, such as China's Baidu, Russia's Yandex, and South Korea's Naver, the government-funded Chinese search engine Jike openly pledges allegiance to the state (Jiang, 2012). Third, search technologies have increasingly anchored users in the local. Through geo-location, search engines automatically detect users' physical location and connect them to local information and ads. Goldsmith and Wu (2006) pointed out that such automatic tracing stems not from government mandates but from Internet users' demands to link their experiences to geography. As with other services, search becomes less an experience of “displacement” (detachment from locality) than one of “re-placement” (re-embedding into locality) (Rogers, 2013), more “embodied” (anchored in one's physical environment) than “disembodied” (Stalder & Mayer, 2009). In this sense, geo-location can steer users away from challenging sites or viewpoints, instead isolating them in their own information “filter bubble” (Pariser, 2011). Yet the impact of search parochialism has not received sufficient scholarly scrutiny or public attention. Search personalization's impact on search parochialism can be profound. Search engines tailor results to users by compiling user profiles in three dimensions: the knowledge person (what a person is interested in, based on searches and clicks), the social person (whom a person is connected to, via e-mail, social networks, etc.), and the embodied person (where the user is physically located) (Stalder & Mayer, 2009).
Personalized search, Feuz and colleagues (2011) argue, is increasingly an affair of “augmented reality,” where the machine “interprets the user's individual relationship to reality and then selects what's good for each.” The tendency to anchor searchers to locality, compounded by search engines' unequal indexing and coverage of the Web (Vaughan & Zhang, 2007), can lead to search parochialism, in which individuals are increasingly “encouraged” to occupy themselves only with local affairs, information, knowledge, and events, without venturing beyond their local or national borders. It overturns the borderless, globalizing visions popularly attached to search engines and to the Internet more broadly. Its implications may be particularly problematic for the diffusion of news, information, knowledge, and ideas.

Chinese search market

Search engines are also an indispensable part of Chinese netizens' digital lives. Among the 564 million Chinese Internet users, 80%, or 450 million, reported using search engines, making search the second most popular online activity among Chinese netizens (CNNIC, 2013). In the second quarter of 2013, China's search engine revenues reached $1.5 billion (iResearch, 2013), making China one of the most attractive search markets in the world. China's search market has experienced considerable fluctuations in the past few years, the most notable being Google's dramatic departure from mainland China. After Google entered China in 2006, Baidu and Google became the dominant players, capturing 60% and 35% of the market, respectively, at the height of Google's presence in China (Jiang, 2012). On January 12, 2010, Google announced it would stop censoring its search results in China, citing cyber attacks and security breaches, and subsequently redirected its mainland China search traffic to its Hong Kong site. Its market share dwindled to 2.9% by August 2013 (CNZZ, 2013).
The Baidu–Google duopoly has been replaced by Baidu's dominance (63%) and fierce competition from homegrown search firms such as 360 Search (18%), Sogou (10%), and Soso (3.6%) (CNZZ, 2013). Between 2010 and 2013, the Chinese search market was marked by an intriguing three-way interaction among Baidu, Google, and Jike, representing domestic, foreign, and state players (Jiang, 2012). Jike, a search engine sponsored by the Party newspaper People's Daily, was unveiled in Beijing in 2011. It was led by Deng Yaping, a former table tennis world champion turned Party press chief, who openly vowed that Jike would fulfill its “national duties” to advance Party ideology (Jiang, 2012). By 2014, amid financial woes, Jike had merged with another state-backed search engine, Panguso, in a bid to expand the Party's digital propaganda edifice. Research on Chinese search engines remains woefully inadequate despite the enormous number of users and the size of the market. Of the existing research in this area, Western scholars tend to focus on censorship and policy, while Chinese researchers are more likely to emphasize business strategy, technology, and user behavior (Jiang, 2014). Few studies have empirically examined actual search results or their implications beyond censorship. Recently, Jiang (2014) began to probe patterns of censorship, search result overlap, ranking, and bias among search engines in China by comparing Baidu's and Google's search results for Chinese Internet events. She suggests that search engines can be architecturally altered to serve political regimes, can be arbitrary in rendering social realities, and can be biased toward self-interest. Liao (2013) compares search results from Baidu, Google, and Yahoo using 3000 search queries in four Chinese-speaking regions (mainland China, Singapore, Hong Kong, and Taiwan), finding a strong “network gatekeeping” effect whereby search engines directed information flows based on geo-linguistic and cultural-political factors.
To address such gaps in research on Chinese search engines, this study poses the following questions: For mainland Chinese search engine users, how do search results differ among Baidu, Google (Chinese version), and Jike in terms of search concentration, bias, and parochialism? How do they change over time?

Methods

Search results comparison is a common method in information retrieval studies. It evaluates search quality and infers search engine properties by querying one or more search engines with keywords to detect distinctive result patterns (e.g., overlap and ranking). Focusing on search concentration, bias, and parochialism, this study explores aspects of “source distance” (Rogers, 2013), that is, the patterns of privilege that search engines confer on top-ranked results. In the following, I detail the operationalization of core concepts, data collection (query sample, data archiving), and analysis. Most users rarely go beyond the first results page (Jansen, Spink, & Koshman, 2007), which at the time of data collection displayed 10 results by default in Baidu, Google, and Jike. Hence, this study collects only the first 10 results for each query (text results only, excluding image and video results). First, search concentration is defined as the percentage of search results concentrated in the few websites that supply the largest numbers of results returned for queries made to a search engine. I ranked the top 10 such websites after collating frequency counts, for instance, of how many results for 20 queries in Google came from Wikipedia, Baidu, etc.
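The concentration measure just defined can be sketched as follows, assuming each returned result has already been coded with its source site. Reporting the share of all results supplied by the top-ranked sites is one plausible reading of the definition rather than a confirmed detail of the study's procedure, and the function name and site labels are hypothetical.

```python
from collections import Counter


def top_site_concentration(sources, top_n=10):
    """Rank source sites by result count and report the share held by the top `top_n`.

    `sources` holds one coded source site per returned result, pooled over
    all queries sent to one search engine.
    Returns (list of (site, count) pairs, percentage of all results they supply).
    """
    if not sources:
        return [], 0.0
    counts = Counter(sources)
    top_sites = counts.most_common(top_n)
    covered = sum(count for _, count in top_sites)
    return top_sites, 100.0 * covered / len(sources)


# Hypothetical coded sources for one engine (not data from this study).
sources = ["site-a.example"] * 5 + ["site-b.example"] * 3 + ["site-c.example", "site-d.example"]
top_sites, top_share = top_site_concentration(sources, top_n=2)
print(top_sites)
print(f"{top_share:.1f}% of results come from the top 2 sites")
```

With the study's settings, `sources` for one engine would hold 10 labels per query, and `top_n` would stay at its default of 10.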
Second, while recognizing that search bias may encompass biased presentation of facts and opinions on a wide range of topics, from politics to ads, the study defines search bias narrowly as own-content bias (favored inclusion of a search engine's own content, operationalized as the percentage of returned search results drawn from the search firm's own Web content) and other-content bias (exclusion of rivals' content, operationalized as the percentage of search results returned by a search engine from its rivals' Web content). Third, search parochialism is operationalized as the percentage of search results retrieved by a search engine from domestic sites. In this study, “domestic websites” refers to those displaying an “ICP operating license” issued by Chinese authorities at the bottom of their homepages, as websites publishing online content in China are required to register with the Ministry of Industry and Information Technology. Research in other legal regimes may instead define “domestic” sites by the location of business registration or by IP address. Previous studies have used anywhere from one query to tens of thousands as a sample (Jansen et al., 2007). This study uses the top 20 Chinese Internet events of 2010 (see Table 1) and 15 randomly selected generic terms as queries to mimic users' ambiguous search behavior. “Internet events” are events that attract public attention and spark public discussion. In China, online news consistently ranks as one of the most popular online activities. In 2013, it was the third most popular online activity, and 78% of Chinese Internet users, or 392 million, reported using the Internet to access news (CNNIC, 2013). Users frequently search for news and events. A national survey on Chinese search engines (CNNIC, 2011) found that news, video, and music are the top three types of content queried by users: 47.7%, 45.2%, and 41.6%, respectively.

Table 1. Top 20 Chinese Internet events in 2010 as search queries
Ranking | Event keyword
1 | Tencent vs. 360 Dispute
2 | Shanghai Expo
3 | Internet celebrity “Sister Feng”
4 | Li Gang's Son's Drunken Hit-and-Run Kills Student on Campus
5 | Foxconn Employees Suicides Jumping from Dorms
6 | Yuan Tengfei Remarks Spark Controversy
7 | Beijing Shuts Down Night Club Passion
8 | Guo Degang's Pupils Beat Reporters
9 | Tang Jun “Diploma Gate”
10 | Self-immolation in Huangyi Forced Relocation
11 | Fang Zhouzi Mugged
12 | Zhang Wuben Implicated in Fraudulent Promotion
13 | Campus Attacks on Children Across Country
14 | Authenticity of Anyang Cao Cao Tomb Questioned
15 | Shanxi “Vaccination Scandal”
16 | Shangqiu Zhao Zuohai's Wrongful Conviction Case
17 | Wangjialing Coalmine Accident Rescue Efforts
18 | Google Exit from China
19 | Tang Fuzhen Self-immolation Incident
20 | Worker Strikes in Some Areas
Source: 2010 Chinese online public opinion analysis report (People's Net, 2010).

While the 20 events used in this study are sanctioned by an official report as the “top 20” (People's Net, 2010), the list is not as heavily censored as might be expected and in fact contains more controversial cases than the “top 10” lists published by major commercial portals such as Sina or Tencent.
With the exception of the Shanghai Expo, the events on the official list are scandal-ridden, including such prominent controversies as the Li Gang scandal, the Foxconn employee suicides, Google's exit from China, and the self-immolation incidents. In addition, following previous research that used both generic and specific search terms to obtain more systematic results (Egan, MacLean, Sweeting, & Hunt, 2012), 15 short generic terms were randomly selected as queries from the directory of the popular Chinese portal Sina, to balance the more specific and controversial Internet events used as keywords. These 15 terms are: transportation, military, medicine, blog, entertainment, school, government, news, tourism, fashion, plane ticket, car, law, economy, and music. The brevity of these terms, which often introduces ambiguity into search calculations and complexity into the results, mimics typical user behavior. Moreover, the study archives the search results of all 35 queries at two points a year apart, in 2011 and 2012. Although this is not strictly a longitudinal dataset, it is nonetheless an effective sampling procedure for capturing search variation over time in China. After the query sample was selected, the author, a Chinese native, prepared an Excel file for data entry. In August 2011, the author and another Chinese researcher gathered data simultaneously from the same location (i.e., the same router) in southern China. Each of the 35 keywords was queried in each of the three search engines in turn: Baidu (www.baidu.com), Google (www.google.com.hk), and Jike (www.jike.com). We each collected 1050 textual hyperlinks and copied and pasted them into separate Excel spreadsheets. All data were handled in simplified Chinese. After checking our data for completeness, we also tested whether each link was accessible and recorded the reason for any inaccessibility. This archiving work was completed in one day. We recognize that search opacity and personalization raise a number of issues for data collection.
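The sample sizes that recur in the tables follow directly from this query design. A minimal arithmetic sketch, assuming ten first-page results per query (the figure implied by the reported totals of 1050, 200, and 150 rather than stated outright):

```python
# Sample-size arithmetic for the query design described above.
# Assumption: each query contributes at most 10 first-page results,
# which is what the reported totals (1050, 200, 150) imply.
RESULTS_PER_QUERY = 10
ENGINES = ["Baidu", "Google", "Jike"]
EVENT_QUERIES = 20    # Top 20 Events (Table 1)
GENERIC_QUERIES = 15  # 15 General Terms from Sina's directory


def max_results(n_queries: int, per_query: int = RESULTS_PER_QUERY) -> int:
    """Upper bound on first-page results for one engine in one year."""
    return n_queries * per_query


per_engine_events = max_results(EVENT_QUERIES)     # one column of Table 2
per_engine_generic = max_results(GENERIC_QUERIES)  # one column of Table 3
per_researcher = len(ENGINES) * (per_engine_events + per_engine_generic)

print(per_engine_events, per_engine_generic, per_researcher)  # 200 150 1050
```

The per-column maxima (200 and 150) are the denominators used for the percentages in Tables 2 to 4.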
Results may vary by location, user, time, and other factors, making research difficult to replicate (Feuz et al., 2011). Although datasets may vary because of search dynamism, systematic retrieval of search results can help probe underlying search patterns that are otherwise invisible. To minimize external influences, we disabled cookies on our laptops and did not log into Baidu or Google accounts during data collection. Internet Explorer was set as the default browser. We found, however, that with cookies disabled, Google searches returned no results, whereas we did not experience this problem with Baidu or Jike. Having two researchers collect data in 2011 allowed us to gauge the extent of personalization in Baidu and Google at the time. We assessed result overlap, operationalized narrowly as a URL appearing in the search results gathered by both users from the same search engine; differences in ranking were excluded because our semiautomatic methods lacked the resources to compute them. Result overlap is 90% for Baidu, 88% for Google, and 98% for Jike. Although a detailed discussion of this comparison is beyond the scope of this paper, these basic statistics suggest that Google's personalization was the highest of the three at the time and Jike's the lowest. Given the relatively high degree of overlap, the researcher inside China repeated the data collection procedure in August 2012, producing another 1050 textual hyperlinks. For this study, only that researcher's data from 2011 and 2012 are analyzed. Although it is impossible to control for all factors, the data sample captures snapshots of the three search engines in time, yielding insight into their operations. The author and a Chinese-speaking student in the United States coded the data separately, adding a column to the Excel spreadsheet to document the source of each search result (e.g., Baidu.com).
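Both the result-overlap figure and the intercoder agreement reported next are simple proportions. The sketch below is our illustration, not the study's procedure. It assumes overlap is the share of one researcher's distinct URLs that also appear in the other's results for the same engine, ignoring rank. It also assumes that agreement is computed item by item over the same coded results. All URLs and codes are hypothetical placeholders.

```python
from typing import Sequence


def result_overlap(urls_a: Sequence[str], urls_b: Sequence[str]) -> float:
    """Share of researcher A's distinct URLs that also appear among researcher B's
    results for the same engine. Ranking differences are ignored, as in the study."""
    set_a, set_b = set(urls_a), set(urls_b)
    return len(set_a & set_b) / len(set_a) if set_a else 0.0


def percent_agreement(codes_a: Sequence[str], codes_b: Sequence[str]) -> float:
    """Simple percentage agreement: share of items given the same code by both coders."""
    if len(codes_a) != len(codes_b):
        raise ValueError("both coders must code the same items")
    if not codes_a:
        return 0.0
    return sum(a == b for a, b in zip(codes_a, codes_b)) / len(codes_a)


# Hypothetical placeholder data, not taken from the study.
researcher_1 = ["http://a.example/1", "http://b.example/2", "http://c.example/3", "http://d.example/4"]
researcher_2 = ["http://b.example/2", "http://a.example/1", "http://c.example/3", "http://e.example/5"]
coder_1 = ["Baidu", "Sina", "QQ", "Sohu", "Xinhua"]
coder_2 = ["Baidu", "Sina", "QQ", "NetEase", "Xinhua"]

print(f"overlap: {result_overlap(researcher_1, researcher_2):.0%}")    # overlap: 75%
print(f"agreement: {percent_agreement(coder_1, coder_2):.0%}")         # agreement: 80%
```

The overlap measure as written is asymmetric. A symmetric alternative such as the Jaccard index would divide by the size of the union instead, but the text does not say which variant was used.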
Intercoder reliability, operationalized as simple percentage agreement between the two coders, is 98%. After resolving disagreements, we finalized the datasets. The data were separated into four subdatasets to allow meaningful comparisons: Top 20 Events in 2011, 15 General Terms in 2011, Top 20 Events in 2012, and 15 General Terms in 2012. We then identified the top 10 websites in each subdataset and recorded their frequencies (see Tables 2 and 3). Inaccessible links, recorded during data archiving, are singled out for qualitative analysis. To determine whether a website is “located” inside mainland China, we collected each website's registration information, because a website under Chinese jurisdiction must display its operating license (Jiang, 2012). Although previous work (e.g., Liao, 2013) has used geo-IP to define the location of web content, it became clear during coding that a site's servers may be located overseas while its legal jurisdiction, rather than its geo-IP, determines its “location.” For instance, ifeng.com, the online branch of Hong Kong-based Phoenix New Media, is registered with authorities in Beijing, whereas Chinese Wikipedia operates from overseas.
Table 2 Distribution of top 20 events search results among popular Chinese websites
Rank | Baidu 2011 | Google 2011 | Jike 2011 | Baidu 2012 | Google 2012 | Jike 2012
1 | Baidu (22) | Sina (13) | Xinhua (13) | Baidu (32) | Sina (14) | QQ (21)
2 | Sina (21) | Sohu (13) | QQ (12) | Sina (24) | ifeng (11) | Sina (20)
3 | Sohu (18) | QQ (10) | Sohu (11) | Sohu (18) | QQ (10) | Sohu (13)
4 | QQ (15) | NetEase (10) | NetEase (10) | NetEase (16) | NetEase (10) | Xinhua (9)
5 | NetEase (15) | ifeng (9) | Sina (9) | ifeng (11) | Baidu (8) | NetEase (8)
6 | ifeng (9) | People's Net (6) | People's Net (4) | QQ (9) | Google (8) | Youku (8)
7 | Xinhua (6) | Baidu (6) | 360doc (4) | Ku6 (5) | Wikipedia (8) | ifeng (5)
8 | Ku6 (5) | Huanqiu (5) | Huanqiu (3) | Huanqiu (4) | Sohu (7) | Baidu (5)
9 | Huanqiu (4) | Ku6 (4) | ifeng (3) | People's Net (3) | People's Net (7) | Huanqiu (4)
10 | Youku (2) | Youku (4) | Gov.cn (2) | Xinhua (3) | Xinhua (5) | Ku6 (4)
Percentage of search results concentrated in top five websites | 45.5% | 30.6% | 21.0% | 50.5% | 29.4% | 35.5%
Percentage of search results concentrated in top 10 websites | 58.5% | 44.4% | 35.5% | 62.5% | 44.0% | 47.5%
Number of search results returned for Google, Baidu BK, Wikipedia, and Hudong in Baidu, Google, and Jike:
Google | 0 | 3 | 1 | 0 | 8 | 0
Baidu BK | 5 | 1 | 0 | 7 | 5 | 2
Wikipedia | 0 | 2 | 0 | 0 | 5 | 1
Hudong | 0 | 1 | 1 | 0 | 4 | 3
Note: The maximum number of search results for each column is 200 (exception: Google in 2011 and 2012 had only 180 because of Chinese Web filtering). In each column, websites are ranked by the frequency of their appearance in the returned results. “Baidu BK” denotes “Baidu Baike.” To assess variations among Baidu, Google, and Jike, analysis of variance (ANOVA) tests were conducted. Statistics for the top five ranked websites in 2011 are F(2, 12) = 15.521, p = .0005, which is significant; in 2012, F(2, 12) = 3.307, p = .072. Statistics for the top 10 ranked websites in 2011 are F(2, 27) = 2.098, p = .142; in 2012, F(2, 27) = .776, p = .470. To assess variations over time for each search engine, t tests were conducted. Results for the top five ranked websites for Baidu, Google, and Jike, respectively, are: t(3) = 1.552, p = .218; t(3) = 0.397, p = .718; t(3) = 0.155, p = .189. Results for the top 10 ranked websites for Baidu, Google, and Jike, respectively, are: t(8) = 0.594, p = .569; t(8) = 1.633, p = .137; t(8) = 2.259, p = .046 (α = .05), which is significant.
Table 3 Distribution of search results of 15 general terms among popular Chinese websites
Rank | Baidu 2011 | Google 2011 | Jike 2011 | Baidu 2012 | Google 2012 | Jike 2012
1 | Baidu (43) | NetEase (9) | Gov.cn (9) | Baidu (40) | Google (8) | Baidu (18)
2 | Sina (6) | Baidu (8) | QQ (8) | Gov.cn (11) | Sina (7) | Sina (8)
3 | NetEase (5) | Sina (7) | Sohu (8) | Sina (6) | QQ (7) | QQ (7)
4 | Hao123 (5) | Sohu (7) | ifeng (8) | Huanqiu (5) | NetEase (7) | Hudong (7)
5 | Sohu (5) | People's Net (6) | Trends (6) | Hao123 (4) | Huanqiu (6) | Sohu (6)
6 | QQ (4) | Huanqiu (5) | NetEase (5) | Huanqiu (4) | Sohu (6) | NetEase (5)
7 | ifeng (4) | Hexun (5) | Sina (5) | ifeng (3) | Baidu (5) | Hexun (4)
8 | Hexun (2) | ifeng (4) | Baidu (4) | QQ (3) | ifeng (4) | ifeng (3)
9 | Huanqiu (2) | Google (4) | Hao123 (4) | People's Net (3) | Wikipedia (4) | Gov.cn (3)
10 | CNTV (2) | Xinhua (2) | ifeng (3) | Xinhua (3) | Hexun (3) | Huanqiu (2)
Percentage of search results concentrated in top five websites | 42.7% | 24.7% | 29.3% | 44.0% | 23.3% | 30.7%
Percentage of search results concentrated in top 10 websites | 52.0% | 37.3% | 43.3% | 54.7% | 38.0% | 42.0%
Number of search results returned for Google, Baidu BK, Wikipedia, and Hudong in Baidu, Google, and Jike:
Google | 0 | 4 | 1 | 0 | 8 | 1
Baidu BK | 14 | 4 | 0 | 16 | 2 | 13
Wikipedia | 0 | 5 | 0 | 0 | 4 | 0
Hudong | 0 | 2 | 1 | 0 | 4 | 7
Note: The maximum number of search results for each column (e.g., Baidu 2011) is 150. In each column, websites are ranked by the frequency of their appearance in the returned results. “Baidu BK” denotes “Baidu Baike.” To assess variations among Baidu, Google, and Jike, ANOVA tests were conducted. Statistics for the top five ranked websites in 2011 are F(2, 12) = 0.472, p = .635; in 2012, F(2, 12) = 0.577, p = .577. Statistics for the top 10 ranked websites in 2011 are F(2, 27) = 0.236, p = .791; in 2012, F(2, 27) = 0.331, p = .721. To assess variations over time for each search engine, t tests were conducted. Results for the top five ranked websites for Baidu, Google, and Jike, respectively, are: t(3) = 0.454, p = .681; t(3) = 1.732, p = .182; t(3) = 0.721, p = .523. Results for the top 10 ranked websites for Baidu, Google, and Jike, respectively, are: t(8) = 0.459, p = .659; t(8) = 0.555, p = .594; t(8) = 0.411, p = .692.
Results
Search concentration
Analysis shows that a high percentage of search results come from a handful of Chinese commercial websites and, to a lesser degree, from dominant state websites (see Tables 2 and 3). This pattern is consistent with Alexa's ranking of top Chinese sites (Alexa, 2013). The distribution of search results for Top 20 Events and 15 General Terms reveals that the top five websites populating the first-page search results in Baidu, Google, and Jike are Chinese Internet giants. For instance, in 2012, the top five websites (Baidu, Sina, Sohu, NetEase, and ifeng) contributed as much as 50.5% of Baidu's first-page search results (out of a maximum of 200 results) for Top 20 Events (see Table 2), and the top 10 websites contributed 62.5%. Although such a high degree of concentration is skewed by Baidu's consistent display of its own content, the dominance of the top five and top 10 websites in Google's and Jike's results is also evident, with shares ranging from 21–36% and 36–48%, respectively. To further gauge the significance of variations in search concentration among Baidu, Google, and Jike, ANOVA tests were performed. In the results for Top 20 Events (see Table 2 notes for detail), ANOVA tests showed that search concentration among the top five ranked sites was significantly more pronounced in Baidu than in Google and Jike in 2011, F(2, 12) = 15.521, p < .001, but variations among the three search engines were not significant in 2012, nor among the top 10 sites in 2011 or 2012.
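The concentration shares and the 2011 ANOVA comparison can be reproduced from the Table 2 counts. The sketch below is ours, not the author's procedure. It assumes the ANOVA treats each engine's top-five site frequencies as one group; that reading yields the reported degrees of freedom (2, 12) and matches the reported F value. It also avoids a statistics library by using the closed-form upper tail of the F distribution. That closed form is valid only for 2 numerator degrees of freedom, which holds for every three-engine comparison in this study.

```python
from statistics import fmean
from typing import Dict, List, Sequence, Tuple

# Frequencies of the top five websites per engine, Top 20 Events, 2011 (Table 2).
TOP5_2011: Dict[str, List[int]] = {
    "Baidu": [22, 21, 18, 15, 15],
    "Google": [13, 13, 10, 10, 9],
    "Jike": [13, 12, 11, 10, 9],
}
# Frequencies of the top 10 websites in the Baidu 2012 column of Table 2.
BAIDU_2012_TOP10 = [32, 24, 18, 16, 11, 9, 5, 4, 3, 3]


def concentration_share(freqs: Sequence[int], k: int, max_results: int) -> float:
    """Share of all first-page results contributed by the k most frequent sites."""
    return sum(sorted(freqs, reverse=True)[:k]) / max_results


def one_way_anova(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA on raw values."""
    values = [x for g in groups for x in g]
    grand_mean = fmean(values)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = len(groups) - 1, len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


def f_pvalue_df1_2(f_stat: float, df_within: int) -> float:
    """Upper-tail p-value of an F(2, df_within) statistic.
    This closed form is valid ONLY when df_between == 2 (three groups)."""
    return (1 + 2 * f_stat / df_within) ** (-df_within / 2)


top5 = concentration_share(BAIDU_2012_TOP10, 5, 200)
top10 = concentration_share(BAIDU_2012_TOP10, 10, 200)
print(f"Baidu 2012 concentration: top five {top5:.1%}, top 10 {top10:.1%}")

f_stat, df1, df2 = one_way_anova(list(TOP5_2011.values()))
print(f"Top five, 2011: F({df1}, {df2}) = {f_stat:.3f}, p = {f_pvalue_df1_2(f_stat, df2):.4f}")
```

Run as is, this recovers the 50.5% and 62.5% shares quoted above and F(2, 12) = 15.521 with p ≈ .0005, as in the note to Table 2. For designs with more than three groups, a general F-distribution routine (for example, from SciPy) would be needed in place of the closed form.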
In the results for 15 General Terms (see Table 3 notes for detail), ANOVA tests detected no significant differences among the three search engines in search concentration for either the top five or the top 10 sites. The diminished variation from 2011 to 2012 for Top 20 Events is caused mainly by an increased degree of search concentration in Jike's results (see Table 2). Although Baidu features prominently in its own results, yielding a high degree of concentration and variation for the top spot, variations among Baidu, Google, and Jike for the remaining top 10 spots are much smaller, which reduces the overall variation among the three search engines. It is worth noting that although variations in search concentration among these search engines are not always significant, the degree of concentration for each search engine remained relatively high. Over time, the patterns of search concentration for the three search engines differ. Between 2011 and 2012, the degree of concentration increased slightly for Baidu, remained stable for Google, and shifted for Jike depending on the type of query. With the exception of the top 10 websites retrieved for Top 20 Events in Jike, t(8) = 2.259, p < .05, t tests found no significant variation over time in search concentration for the three search engines in either Top 20 Events or 15 General Terms (see Tables 2 and 3). Jike's search concentration for Top 20 Events, as reflected in the top 10 websites, rose from 35.5% to 47.5% between 2011 and 2012, while that for 15 General Terms remained relatively stable. Results from state websites such as Xinhua News Agency (the state news agency), Huanqiu (Global Times), and People's Net (People's Daily online) appear more often in Jike than in the other two search engines, and more so in 2011 than in 2012 (see Tables 2 and 3). Overall, state sites are not nearly as popular as commercial ones.
Moreover, search concentration is slightly more pronounced in the results for queries based on news stories than in those based on general terms, perhaps because most top sites (Sina, Sohu, NetEase, and ifeng) are also news portals (Alexa, 2013). Baidu has also developed products such as its news aggregation service Baidu News and user-generated content sites: Baidu Baike (similar to Wikipedia), Baidu Zhidao (Baidu Knows, a Q&A service), and Baidu Tieba (a community posting service). Overall, search concentration is a visible issue in the Chinese search market; however, it varies in degree and over time across search engines.
Search bias
Among the three search engines, Baidu seems most susceptible to charges of bias. It consistently includes its own content to a disproportionate degree compared with the other search engines. In 2011, 22 of Baidu's search results for Top 20 Events (11%) are from Baidu's own services, compared with six links to Baidu content in Google's results and none in Jike's (Table 2). In 2012, links to Baidu's own content grow to 32 (16%), compared with eight in Google and zero in Jike. Notably, in the dataset for 15 General Terms (Table 3), Baidu's content (including Hao123.com, a directory owned by Baidu) appears 48 times (32%) in 2011, compared with only eight times in Google's results and five in Jike's. In 2012, 44 of Baidu's search results (29.3%) are from websites owned by Baidu, compared with eight in Google and 18 in Jike. Although Baidu is the No. 1 website in China, the frequent appearance of its own content in its results can reinforce its dominance over time via personalization, raising serious doubts about Baidu's impartiality. Further, Baidu Baike, a Wikipedia-like service, is featured frequently in the search results for six different event queries. For instance, in Baidu's results for “Shanghai Expo,” Baidu Baike's description of the Shanghai Expo is ranked in the No. 1 spot above the official Shanghai Expo website, whereas both Google and Jike place the official Expo site in the top spot. Another Baidu service, Baidu Zhidao (a Q&A service), appears in the search results for 4 of the 20 events. In contrast, content from Baidu's rivals such as Google, Wikipedia, and Hudong Baike does not appear even once in any of Baidu's search results (see Table 2). Moreover, Baidu's inclusion of its own content is much more striking when search queries are based on general terms rather than specific news stories. Baidu products are roughly two to three times as likely to appear in results for fuzzy, general search queries (32% in 2011 and 29.3% in 2012) as in results for specific news keywords (11% in 2011 and 16% in 2012). In the 2011 dataset for 15 General Terms, Baidu Baike appears on the first page of search results for every query except “plane ticket,” whose results are filled with ads from travel agencies. In an extreme case, the 2011 query “school,” Baidu's own content appears in six of the top 10 search results: Baidu Baike's entry on “school” in the top spot, Baidu Image's collage of “school” pictures in the second spot, snippets from Baidu Tieba's discussion forum in the third spot, Baidu Map depicting local schools in the fifth spot, Baidu News about “schools” in the ninth spot, and finally another Baidu Baike entry on a local university. Baidu's content seems to dominate the first-page results and user attention. Google, on the other hand, does not seem to include its own content disproportionately. In the 2011 dataset for Top 20 Events, Google's own content appears only three times. In the 2012 dataset for Top 20 Events, Google's presence is more visible, with eight appearances, on par with the number of links to Baidu content. This trend re-emerges in the dataset for 15 General Terms, where Google's presence in its own search results grows from four links in 2011 to eight in 2012, the latter topping the chart as the most-linked website.
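The own-content percentages and the comparison between general and news queries are simple ratios of counts from Tables 2 and 3. A minimal sketch of that arithmetic, using the Baidu counts stated in the text (with Hao123 counted as Baidu-owned for the 15 General Terms, as the text does) and the per-column maxima of 200 and 150 results:

```python
from typing import Tuple

# Baidu-owned results in Baidu's own first-page results, as reported in the text.
OWN_CONTENT = {2011: (22, 48), 2012: (32, 44)}  # year: (Top 20 Events, 15 General Terms)
MAX_EVENTS, MAX_GENERAL = 200, 150  # maximum first-page results per column


def own_content_shares(events: int, general: int) -> Tuple[float, float, float]:
    """Return (events share, general-terms share, general/events ratio)."""
    events_share = events / MAX_EVENTS
    general_share = general / MAX_GENERAL
    return events_share, general_share, general_share / events_share


for year, (events, general) in OWN_CONTENT.items():
    ev, gen, ratio = own_content_shares(events, general)
    print(f"{year}: events {ev:.1%}, general terms {gen:.1%}, ratio {ratio:.2f}")
```

Run as is, it gives ratios of about 2.9 for 2011 and 1.8 for 2012. That range is what the text summarizes when comparing general and news queries.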
But Google's “own-content bias” is nowhere close to Baidu's: for the comparable dataset, Baidu includes 40 links to its own content (see Table 3). The increasing presence of search engines' own content may be partly attributable to their grouping of their own content in news, image, and video formats. Moreover, among online encyclopedia services, whereas Baidu seems to default to Baidu Baike and to exclude Hudong Baike and Chinese Wikipedia entirely, Google seems to give all three an almost equal chance. In the Top 20 Events dataset for 2011, Google references Chinese Wikipedia twice and Baidu Baike and Hudong Baike once each. In 2012, Baidu Baike and Wikipedia each receive five hits while Hudong receives four (see Table 2). In the 15 General Terms dataset for 2011 and 2012, Wikipedia is only slightly more prominent than either Baidu Baike or Hudong Baike (see Table 3). Overall, Google's preference for Chinese Wikipedia does not seem obvious in the space of Chinese online encyclopedia services. Jike's bias patterns are intriguing. In 2011, the state news site Xinhua News Agency tops the chart with 13 appearances for Top 20 Events, and national Gov.cn sites appear most frequently in the 15 General Terms dataset (see Tables 2 and 3). Jike's “self-content bias” is evident when its results are contrasted with Baidu's or Google's. Yet in 2012, the presence of state websites is greatly reduced: Xinhua News Agency ranks No. 4 in the Top 20 Events dataset and Gov.cn ranks No. 9 in the 15 General Terms dataset. Although Baidu is consistently listed as a top site in both Baidu's and Google's results, its presence in Jike's results in 2011 is either nonexistent or feeble. But Baidu's presence in Jike improved considerably over time (see Tables 2 and 3).
In 2012, Jike seems to have adjusted its ranking algorithm to reduce blatant “other-content bias.” Search parochialism Overall, Google is more likely to deliver hyperlinks to overseas sites than Baidu or Jike. In the 2011 dataset for Top 20 Events, Google has 14 hyperlinks, or 7%, to foreign sites whereas Baidu has only two and Jike four (see Table 4). Those 14 links are of 10 sources, including two links to Google's own aggregated search results, two to Wikipedia, two to Epoch Times (a U.S.-based Chinese site banned in mainland China), and links to sites like Google Hong Kong, Blogspot, YouTube (both banned in mainland China), Zaobao (mainstream Singaporean news outlet) as well as Chinese sites in Taiwan, Germany, and Canada. ANOVA tests confirm that for Top 20 Events in 2011, Google is significantly more likely than Baidu or Jike to send users to foreign content, F(2, 27) = 10.731, p < .001. This diverse range of foreign websites in Google, however, changed in 2012 when Wikipedia alone appeared eight times. The differences among Baidu, Google, and Jike in 2012 are not found to be statically significant. The same pattern occurs in the dataset for 15 General Terms, where ANOVA tests show variations among the three search engines are in 2011 significant, F(2, 3) = 36.5, p < .01, but not in 2012. Variations over time for Google were significant, t(8) = 6, p < .001, but not for Baidu or Jike. Table 4 Distribution of overseas search results from Baidu, Google, and Jike* Top 20 Events . Baidu 2011 . Google 2011 . Jike 2011 . Baidu 2012 . Google 2012 . Jike 2012 . WSJ CN (2) Google (2) Zaobao (2) WSJ CN (1) Wikipedia (8) Zaobao (1) Wikipedia (2) WSJ CN (1) Google (1) Wikipedia (1) Epoch Times (2) Google (1) Canadian (1) FT CN (1) Taiwan (2) Google HK (1) Blogspot (1) Youtube (1) Zaobao (1) German (1) Canadian (1) 1% 7% 2% 0.5% 5% 1.5% 15 general terms 0 Google (4) Google (1) 0 Google (8) Google (1) Wikipedia (5) Wikipedia (4) 0% 6% 0.7% 0% 8% 0.7% Top 20 Events . 
Baidu 2011 . Google 2011 . Jike 2011 . Baidu 2012 . Google 2012 . Jike 2012 . WSJ CN (2) Google (2) Zaobao (2) WSJ CN (1) Wikipedia (8) Zaobao (1) Wikipedia (2) WSJ CN (1) Google (1) Wikipedia (1) Epoch Times (2) Google (1) Canadian (1) FT CN (1) Taiwan (2) Google HK (1) Blogspot (1) Youtube (1) Zaobao (1) German (1) Canadian (1) 1% 7% 2% 0.5% 5% 1.5% 15 general terms 0 Google (4) Google (1) 0 Google (8) Google (1) Wikipedia (5) Wikipedia (4) 0% 6% 0.7% 0% 8% 0.7% Note: The maximum number of search results for each column in Top 20 Events section is 200 (exception: Google in 2011 and 2012 had only 180 due to Chinese Web filtering) and maximum number of search results for the 15 General Search Terms is 150. To assess variations among Baidu, Google, and Jike, ANOVA tests are conducted. Statistics for “Top 20 Events” in 2011 are F(2, 27) = 0.731, p = .0004, which is significant. The statistics for 2012 are F(2, 6) = 1.34, p = .330. Statistics for “15 General Terms” in 2011 are F(2, 3) = 36.5, p = .008 (<0.01), which is significant; in 2012, F(2, 3) = 7.824, p = .065. To assess variations over time for search engines, t tests are conducted. For “20 Top Events,” results for Google and Jike are respectively: t(8) = 6, p = .0003, which is significant; and t(1) = 1, p = .500. Open in new tab Table 4 Distribution of overseas search results from Baidu, Google, and Jike* Top 20 Events . Baidu 2011 . Google 2011 . Jike 2011 . Baidu 2012 . Google 2012 . Jike 2012 . WSJ CN (2) Google (2) Zaobao (2) WSJ CN (1) Wikipedia (8) Zaobao (1) Wikipedia (2) WSJ CN (1) Google (1) Wikipedia (1) Epoch Times (2) Google (1) Canadian (1) FT CN (1) Taiwan (2) Google HK (1) Blogspot (1) Youtube (1) Zaobao (1) German (1) Canadian (1) 1% 7% 2% 0.5% 5% 1.5% 15 general terms 0 Google (4) Google (1) 0 Google (8) Google (1) Wikipedia (5) Wikipedia (4) 0% 6% 0.7% 0% 8% 0.7% Top 20 Events . Baidu 2011 . Google 2011 . Jike 2011 . Baidu 2012 . Google 2012 . Jike 2012 . 
Interestingly, unlike Google, both Baidu and Jike occasionally, perhaps accidentally, included links to the Chinese editions of major foreign newspapers such as the Wall Street Journal and the Financial Times in their first-page results for Top 20 Events. These sites, hosted outside China, are beyond Chinese jurisdiction. However, China's Great Firewall can also easily block them should Beijing decide these newspapers pose a threat to its legitimacy and rule. In addition, these foreign newspapers are typically viewable only through subscription. Some subscriptions are not fee-based and require only registration, but registration by itself is cumbersome enough to turn away potential Chinese readers.
In 2011, for instance, both Baidu and Jike linked to a Wall Street Journal story about Google's exit from China. Ironically, because registration is required, the story is not easily accessible to most Chinese-speaking users.
In addition, the analysis shows that links to foreign sites rarely appear in results for fuzzy, general search terms. Baidu's first-page results for the 15 General Terms are often occupied by pages from major domestic commercial sites or even by its own content. Google's foreign links tend to lead to Google's own content and to Chinese Wikipedia, both hosted outside China (see Table 4).
Lastly, web filtering and blocking practices also influence search parochialism. In the study's datasets, Google's results for certain queries were blocked entirely. In both 2011 and 2012, the Google query “Li Gang's Son Drunken Hit Run Kills Student on Campus” returned no results. Instead, the search attempt was interrupted for roughly 5 minutes before searching could resume. The other blocked query was “Wangjialing Coalmine Accident Rescue Efforts” in the 2012 dataset for Top 20 Events.
Discussion
Given the centrality of web search in people's everyday lives, a growing body of research has focused on the political, economic, and cultural impact of search (e.g., Halavais, 2008; Introna & Nissenbaum, 2000; Pariser, 2011; Spink & Zimmer, 2008). There is also growing concern that search engines, like legacy media, may exert undue influence through mainstreaming, hyper-commercialism, and consolidation (Diaz, 2008). So far, little research exists on Chinese search engines. This study probes the tendencies of Baidu, Google, and Jike toward search concentration, bias, and parochialism in China. In doing so, it draws the attention of researchers, the public, and policymakers to patterns and characteristics of privilege associated with search engines (Rogers, 2013).
Search concentration
This paper offers some empirical evidence of search concentration in China.
First-page search results (10 by default) are dominated by a few Chinese commercial Internet giants and, to a lesser degree, by major state websites. In extreme cases, the top five Chinese websites contribute as much as 50.5% of first-page search results and the top 10 websites as much as 62.5%. This tendency is especially visible in Baidu. The finding is corroborated by Liao's (2013) assessment that nearly 80% of search results for Baidu, Google, and Yahoo across mainland China, Taiwan, Hong Kong, and Singapore terminate within 100 websites. Moreover, search concentration seems slightly more pronounced for queries based on news stories than for those based on general terms. The consolidation of the search business could exacerbate search concentration. Without effective regulation or self-regulation, dominant players could favor their own content and lower search quality (Lianos & Motchenkova, 2013).
This study's search concentration analysis also suggests that search giants in China may be mirroring and reinforcing the monopolistic presence of new digital empires, whether commercially owned or operated by the Chinese state. The issue here is not so much one of “presence” (having Internet access to post content and air opinions) as one of “visibility” (relevant content having a fair chance to be discovered). That said, China's one-party dominance makes the “presence” of certain political and cultural content difficult in the first place. In practice, search concentration can make commercial and state filtering of Web content much easier. This is especially true when the government's strategy to “seize the big, release the small” can effectively discipline major Chinese commercial websites and, along with them, user attention and access to information. Moreover, contrary to the assumption that the Chinese state exerts powerful influence over the Chinese Internet, the state search engine Jike has far less reach than the commercial giant Baidu in China or Google overseas.
While authorities could block a global player like Google, meager user adoption and financial woes have prevented Jike from becoming a popular digital platform at home. Jike's service to the authorities also declined from 2011 to 2012, as the presence of state media sources in its search results waned to avoid overt “own-content” bias.
As a new form of media mainstreaming, search engines like Baidu and Google represent the transfer of “media monopoly” from traditional media empires to new ones (Diaz, 2008). Previously, scholars censured the conglomeration of media ownership in a dwindling number of corporations (Bagdikian, 2004). In the digital age, monopoly persists with the likes of Google, Facebook, and Twitter, as well as China's Baidu, Renren, Sina Weibo, and Taobao. Search engines like Google and Baidu have become increasingly vertically integrated. They function simultaneously as a search engine, an advertising agency, and a ratings system (Lee, 2011). Regulators of the search market should remain concerned about the consolidation of the search business. Likewise, the concentration of search results in a few Internet giants should become an issue of public and regulatory dialogue.
Driven by rich-get-richer logic, search concentration that privileges popularity over quality may not disappear with search personalization, contrary to some predictions (e.g., Goldman, 2006). Instead, search concentration could be personalized as well. Baidu and Google were already practicing “search personalization” in China at the time of data collection. The study therefore suggests that search personalization has not reversed search concentration, at least in China.
Search bias
Compounding the adverse effects of search concentration is search bias. Lately, both Baidu and Google have become targets of unfair competition charges (Edelman, 2011). Meanwhile, little empirical work has examined the extent to which search engines might favor their own content and discriminate against others'.
This study offers some evidence that Baidu disproportionately presents its own content and discriminates against its rivals'. This anticompetitive behavior is especially visible in online encyclopedia services, where rivals Hudong Baike and Chinese Wikipedia are absent from Baidu's first-page results. These results echo recent research showing that Baidu Baike is the most visible site in Baidu's results across various types of queries (Liao, 2013). They also affirm previous findings that Baidu rarely links to its rivals Hudong Baike or Chinese Wikipedia (Jiang, 2014). Baidu Baike appears to have benefited from Baidu's monopolistic status, boosting its own popularity and pushing its competitors into oblivion.
Some may object to such charges of bias. They argue that search ranking by definition has to discriminate (Grimmelmann, 2010) and that search bias is necessary and coproduced by users (Goldman, 2006). It is true that search engines cannot passively redistribute content from spammers and fraudsters (Goldman, 2006). Bias in the form of editorial judgment is needed to prevent anarchy and preserve credibility. But search engines can also abuse their power. Baidu, for instance, was exposed as having demoted competitors, distributed false information, and bullied advertisers (CCTV, 2011). Defending search engines' right to exercise editorial control should not be conflated with exposing abuses of that power that harm advertisers or users. Search engines can be biased by choice or by accident, but intentional bias is hard to prove given the myriad factors involved in ranking and relevance computation. Moreover, U.S. courts currently treat search results as “opinions” and free speech “entitled to full constitutional protection” under the First Amendment (Bracha & Pasquale, 2008, p. 1151). They do so even though search companies often unequivocally claim algorithmic objectivity. It is important to recognize that search engines are not infallible.
They should not be given an easy pass simply because their editorial work seems good most of the time. Rather, precisely because they are so critical in defining users' sense of reality, we should try to hold them accountable for potential abuses of power. The consequences of failing to do so are all the graver.
Search parochialism
Search engines are popularly seen as global media, able to fetch the best information at the user's beck and call. Contrary to this view, a growing body of research suggests that users' search experience has become increasingly parochial, or “less placeless” (Rogers, 2013). This study shows that search engines rarely provide links to overseas content for queries made inside mainland China. Among the three, Google is the least parochial. Even at its best, however, only 8% of Google's results link to foreign content (see Table 4). Deplorably, these links lead mostly to Google's own sites and Chinese Wikipedia. The diversity of foreign content provided by Google also decreased, for instance from 11 foreign sources in 2011 to only three in 2012 for Top 20 Events. On the other hand, both Baidu and Jike occasionally included links to the Chinese editions of well-known foreign newspapers, such as the Wall Street Journal and the Financial Times, in their first-page results. This was perhaps because of their value to the Chinese business community. But links to these sites appear so rarely that the impact of this overseas Chinese-language content may be negligible.
Various historical, structural, and political factors might have made overseas Chinese-language content invisible to queries from mainland China. First, overseas Chinese sites may historically not have been well known or well linked. Second, search engines' crawling and indexing practices may have neglected overseas Chinese-language sites. Third, domestic content dwarfs overseas Chinese sites, which rank poorly because of limited traffic. Fourth, and importantly, Beijing bans many overseas Chinese sites (e.g., MIT BBS) for fear of overseas political subversion.
These findings are supported by recent research on queries to Google and Yahoo. Queries from Taiwan and Hong Kong are much more likely to yield hyperlinks to U.S.-based Chinese-language content than those from Singapore or mainland China (Liao, 2013). This implies that mainland China and Singapore have more restrictive “virtual borders” and search operations, while those of Taiwan and Hong Kong are much more porous and liberal. In the Chinese case, access to overseas Chinese-language sites is entirely subject to the political conditions of the time. These websites operate outside Chinese jurisdiction, so they lie beyond direct regulation, but Chinese authorities can easily block them. This study shows that some of mainland Chinese users' queries to Google have been blocked. Furthermore, state-sponsored search engines like Jike are fundamentally nationalistic in nature and are even less likely to promote cosmopolitan outlooks.
The Internet and web search have become more ubiquitous, but people's experiences of them have not necessarily become more cosmopolitan. On the contrary, the technological and commercial push for localization may draw users' attention closer to home rather than directing it outward (Pariser, 2011). Through personalization, searchers are more “embodied” than “disembodied” (Rogers, 2013), experiencing “re-placement” rather than “displacement” (Stalder & Mayer, 2009). Ironically, in an age of globalization, the local is reclaiming its place in our everyday lives. This is not to say that localization is bad. However, if search engines do nothing more than deliver familiar, localized results to their users, their cosmopolitan potential is seriously undermined. Searchers may be bounded by “the local,” including its political arrangements, cultural content, and ideological outlook. When that happens, the global diffusion of information, knowledge, and ideas through borderless search drifts further away.
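The three tendencies discussed above can each be expressed as a share of first-page results:

- Concentration is the share contributed by the k most frequent websites.
- Own-content bias is the share pointing to the engine's own properties.
- Parochialism is measured inversely, by the share pointing to overseas sites.

The Python sketch below computes all three from a list of result URLs. It is a minimal illustration under stated assumptions, not the study's coding procedure. The site list (KNOWN_SITES), the overseas classification (OVERSEAS_SITES), and the sample URLs are all hypothetical, and a site is identified simply by matching the host name against a known domain suffix.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical site list (domain suffixes); not the study's coding scheme.
KNOWN_SITES = ("baidu.com", "sina.com.cn", "qq.com", "163.com",
               "people.com.cn", "wikipedia.org", "zaobao.com.sg")
# Hypothetical classification of which of these sites are hosted overseas.
OVERSEAS_SITES = {"wikipedia.org", "zaobao.com.sg"}


def site_of(url: str) -> str:
    """Map a result URL to a known site by domain suffix, else to its bare host."""
    host = urlsplit(url).hostname or ""  # .hostname is already lower-cased
    if host.startswith("www."):
        host = host[4:]
    for site in KNOWN_SITES:
        if host == site or host.endswith("." + site):
            return site
    return host


def top_k_share(urls: list[str], k: int) -> float:
    """Concentration: percentage of results contributed by the k most frequent sites."""
    counts = Counter(site_of(u) for u in urls)
    return 100.0 * sum(n for _, n in counts.most_common(k)) / len(urls)


def share_of(urls: list[str], sites: set[str]) -> float:
    """Percentage of results whose site is in `sites` (own content, overseas, ...)."""
    return 100.0 * sum(site_of(u) in sites for u in urls) / len(urls)


# Hypothetical first-page results (10 by default) for a single query.
SAMPLE_RESULTS = [
    "https://baike.baidu.com/view/1.htm",
    "https://zhidao.baidu.com/question/2.html",
    "https://news.sina.com.cn/c/3.shtml",
    "https://news.sina.com.cn/c/4.shtml",
    "http://www.people.com.cn/5.html",
    "https://zh.wikipedia.org/wiki/6",
    "https://tieba.baidu.com/p/7",
    "https://news.qq.com/a/8.htm",
    "https://news.163.com/9.html",
    "https://www.zaobao.com.sg/10.html",
]

if __name__ == "__main__":
    print("Top-2 share:", top_k_share(SAMPLE_RESULTS, 2))   # 50.0
    print("Top-5 share:", top_k_share(SAMPLE_RESULTS, 5))   # 80.0
    print("Own-content share (Baidu):", share_of(SAMPLE_RESULTS, {"baidu.com"}))  # 30.0
    print("Overseas share:", share_of(SAMPLE_RESULTS, OVERSEAS_SITES))            # 20.0
```

Counting by site rather than by URL matters for the concentration measure. In the sample, three Baidu subdomains collapse into a single site, which is how a handful of portals can dominate the top-k share.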
Methodological reflections
The dynamic nature of search poses major methodological challenges. It requires researchers to employ new digital methods, “thinking along” with, recombining, and exploiting digital objects such as algorithms to gain social, political, and cultural insight (Rogers, 2013). This study does so by investigating search results retrieved from multiple search engines in China. The author cautions, however, that the findings, while serving as a useful baseline, may not be generalizable. First, the sample is small and emphasizes prominent news stories and fuzzy, general search terms, which may have amplified search concentration, bias, and parochialism. Second, the two-person data collection procedure used in 2011 was not repeated in 2012, which diminishes the study's reliability. Third, the enormous size of the Web and search engines' dynamic crawling, indexing, and ranking operations further complicate sampling. Personalization poses particular challenges to generalizing research findings. What the study captures are two snapshots in time of the three search engines, as seen by a particular searcher. However, the study did include some remedial measures (e.g., comparing search results retrieved by two researchers sharing the same IP address). Future studies may employ new computing equipment or triangulate data from multiple users and locations at the same time.
Concluding remarks
Previous research has examined aspects of search concentration, bias, and parochialism, but few studies have delved into important search markets such as China. This study fills this research gap. It finds that search engines in China, particularly Baidu, tend to drive traffic to well-established sites. Baidu's results also raise serious questions about its impartiality. Rather than making users' search experiences more cosmopolitan, search engines rarely direct Chinese users to content beyond national borders. Consequently, user experiences have become more parochial.
While this study has made a start in probing issues of search concentration, bias, and parochialism in China, large research gaps remain. Future studies could be conducted on a larger scale, with bigger and more diverse query samples. Automated archiving mechanisms and more sophisticated measures could be adopted to account for aspects of search personalization. Further, more could be done to learn about users' awareness of, attitudes toward, and means of coping with search concentration, bias, and parochialism. The actual influence of search concentration, bias, and parochialism on user knowledge and perception is also likely to gain in importance. In particular, Baidu's performance with respect to search concentration, bias, and parochialism deserves tracking, as does its impact on users. Such tracking should be more extensive and long-term, to capture their evolving nature. Almost without doubt, the ongoing antitrust case brought against Baidu by Hudong Baike will push forward investigations into search monopoly and search bias in unprecedented ways. In addition, the state-sponsored search engine Jike will surely receive more public and scholarly attention as the state's digital propaganda apparatus extends from websites to network information and technologies. Beyond China, inquiry into search concentration, bias, and parochialism could also be extended to other countries and regions to increase accountability and devise alternatives.
Henry David Thoreau warned: “Our inventions are wont to be pretty toys, which distract our attention from serious things. They are but improved means to an unimproved end.” These prescient words are a reminder of the need to probe the increasingly complex and opaque digital mechanisms on which our mediated lives depend. Their impact on Chinese users and society deserves much more scholarly scrutiny in the future.
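Two approaches mentioned in this article require a way to score how similar two result lists are for the same query. One is the personalization check described under Methodological reflections (two researchers sharing one IP address). The other is the multi-user triangulation proposed for future studies. The Python sketch below uses two common measures, chosen here for illustration rather than taken from the study. The first is the Jaccard overlap of the two URL sets, which ignores rank. The second is the share of rank positions holding the same URL, which is sensitive to reordering. The function names and example lists are hypothetical.

```python
def jaccard_overlap(a: list[str], b: list[str]) -> float:
    """|A ∩ B| / |A ∪ B| for the sets of URLs in two result lists (rank ignored)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty lists are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def positional_agreement(a: list[str], b: list[str]) -> float:
    """Fraction of rank positions at which both lists hold the same URL."""
    n = max(len(a), len(b))
    if n == 0:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / n


if __name__ == "__main__":
    # Hypothetical first-page results for one query, collected by two searchers.
    researcher_a = ["url1", "url2", "url3", "url4"]
    researcher_b = ["url1", "url3", "url2", "url5"]
    print(jaccard_overlap(researcher_a, researcher_b))       # 0.6 (3 shared of 5 distinct URLs)
    print(positional_agreement(researcher_a, researcher_b))  # 0.25 (only rank 1 matches)
```

Reporting both measures separates two kinds of personalization effect. Different content lowers the Jaccard score, while the same content in a different order lowers only the positional score.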
Acknowledgment
The author wishes to thank her Chinese research colleague; her colleagues Linda Shanock, Qingfang Wang, and Xingjian Liu at UNC Charlotte; Journal of Communication's Editor-in-Chief Malcolm Parks and Associate Editor Jack Qiu; and three anonymous reviewers for their assistance and comments.
References
Alexa. (2013). Top sites in China. Retrieved from http://is.gd/zUKbqg
Baeza-Yates, R., Saint-Jean, F., & Castillo, C. (2002). Web structure, age and page quality. Paper presented at the 2nd International Workshop on Web Dynamics (WebDyn 2002). Retrieved from http://is.gd/QH9l0M
Bagdikian, B. (2004). The new media monopoly. Boston, MA: The Beacon Press.
Bracha, O., & Pasquale, F. (2008). Federal search commission? Access, fairness and accountability in the law of search. Cornell Law Review, 93, 1149–1209.
Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual web search engine. Retrieved from http://infolab.stanford.edu/~backrub/google.html
CCTV. (2011). Baidu, what shut your eyes? Retrieved from http://bit.ly/mSxdTm
China Internet Network Information Center (CNNIC). (2011). 2011 Chinese search engine market research report. Retrieved from http://is.gd/y4CfMN
China Internet Network Information Center (CNNIC). (2013). 32nd statistical survey report on internet development in China. Retrieved from http://is.gd/YtE6EP
Cho, J., & Roy, S. (2004). Impact of search engines on page popularity. Proceedings of the WWW2004 Conference, May 17–24, 2004, New York. Retrieved from http://is.gd/8kGf2D
Cho, J., Roy, S., & Adams, R. (2005). Page quality: In search of an unbiased web ranking. Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data (pp. 551–562). Baltimore, Maryland, USA, June 14–16, 2005.
CNZZ. (2013). Search engine market share analysis report (beta) for August 1, 2013–August 31, 2013. Retrieved from http://is.gd/ms50fF
Diaz, A. (2008). Through the Google goggles: Sociopolitical bias in search engine design. In A. Spink & M. Zimmer (Eds.), Web search: Multidisciplinary perspectives (pp. 11–34). Berlin: Springer.
Edelman, B. (2011). Bias in search results? Diagnosis and response. The Indiana Journal of Law and Technology, 7, 16–32.
Egan, M., MacLean, A., Sweeting, H., & Hunt, K. (2012). Comparing the effectiveness of using generic and specific search terms in electronic databases to identify health outcomes for a systematic review. BMJ Open, 12(2). doi:10.1136/bmjopen-2012-001043
Feuz, M., Fuller, M., & Stalder, F. (2011). Personal web searching in the age of semantic capitalism: Diagnosing the mechanisms of personalization. First Monday, 16(2). Retrieved from http://firstmonday.org/article/view/3344/2766
Finley, K. (2011, February 7). Was Eric Schmidt wrong about the historical scale of the Internet? ReadWrite. Retrieved from http://is.gd/WtiY8X
Goldman, E. (2006). Search engine bias and the demise of search engine utopianism. Yale Journal of Law and Technology, 8, 188–200. doi:10.1007/978-3-540-75829-7_8
Goldsmith, J., & Wu, T. (2006). Who controls the Internet? Illusions of a borderless world. Oxford, England: Oxford University Press.
Goodwin, D. (2013, August 12). Google's Patrick Thomas talks controversial content ahead of SES San Francisco keynote. Search Engine Watch. Retrieved from http://is.gd/QAAKjj
Grimmelmann, J. (2010). Some skepticism about search neutrality. In B. Szoka & A. Marcus (Eds.), The next digital decade (pp. 435–459). Washington, DC: TechFreedom.
Halavais, A. (2008). Search engine society. Cambridge, England: Polity Press.
Hargittai, E. (2000). Open portals or closed gates? Channeling content on the World Wide Web. Poetics, 27(4), 233–254.
Hindman, M., Tsioutsiouliklis, K., & Johnson, J. (2003). “Googlearchy”: How a few heavily-linked sites dominate politics on the Web. Retrieved from http://is.gd/m4d24W
Internet Live Stats. (2014). Google search statistics. Retrieved from http://is.gd/2A8AzB
Introna, L., & Nissenbaum, H. (2000). Shaping the web: Why the politics of search engines matter. The Information Society, 16, 1–17.
iResearch. (2013). China search engine revenues surge to 9.28 Bn yuan in Q2 2013. Retrieved from http://www.iresearchchina.com/views/5052.html
Jansen, B. J., Spink, A., & Koshman, S. (2007). Web searcher interaction with the Dogpile.com metasearch engine. Journal of the American Society for Information Science and Technology, 58(5), 744–755. doi:10.1002/asi.20555
Jiang, M. (2012). Internet companies in China: Dancing between the party line and the bottom line. Asie Visions, 47. Retrieved from http://is.gd/jhi8E9
Jiang, M. (2014). The business and politics of search engines: A comparative study of Baidu and Google's search results of Internet events in China. New Media & Society, 16(2), 212–233.
Labovitz, C. (2010). The battle of the hyper giants. Forbes. Retrieved from http://is.gd/QDqTsY
Lawrence, S., & Giles, C. (1999). Accessibility of information on the Web. Nature, 400, 107–109.
Lee, M. (2011). Google ads and the blindspot debate. Media, Culture & Society, 33, 433–447. doi:10.1177/0163443710394902
Lianos, I., & Motchenkova, E. (2013). Market dominance and search quality in the search engine market. Journal of Competition Law & Economics, 9(2), 419–455.
Liao, H. (2013). How does Chinese localization influence online visibility? A study on Chinese-language results pages (SERPs). Paper presented at the 10th Chinese Internet Research Conference, Oxford, UK.
Liu, C., Day, W., Sun, S., & Wang, G. (2002). User behavior and the “globalness” of Internet: From a Taiwan users' perspective. Journal of Computer-Mediated Communication, 2. doi:10.1111/j.1083-6101.2002.tb00145.x
Machill, M., Beiler, M., & Zenker, M. (2008). Search-engine research: A European-American overview and systematization of an interdisciplinary and international research field. Media, Culture & Society, 30(5), 591–608.
McLuhan, M. (1964). Understanding media: The extensions of man. New York: McGraw-Hill.
McMillan, R. (2013). Google serves 25 percent of North American Internet traffic. Wired. Retrieved from http://is.gd/kwXHMQ
Mowshowitz, A., & Kawaguchi, A. (2002). Assessing bias in search engines. Information Processing & Management, 38, 141–156.
Pandey, S., Roy, S., Olston, C., Cho, J., & Chakrabarti, S. (2005). Shuffling a stacked deck: The case for partially randomized ranking of search engine results. Proceedings of the 31st International Conference on Very Large Data Bases, 781–792.
Pariser, E. (2011). The filter bubble. New York, NY: Penguin Press.
Pasquale, F. (2008). Internet nondiscrimination principles: Commercial ethics for carriers and search engines. The University of Chicago Legal Forum, 263–298.
People's Net. (2010). People's Net Online Public Opinion Monitor Lab publishes 2010 Chinese online public opinion analysis report. Retrieved from http://is.gd/hoBgAV
Rogers, R. (2013). Digital methods. Cambridge, MA: MIT Press.
Spink, A., & Zimmer, M. (Eds.). (2008). Web search: Multidisciplinary perspectives. Berlin, Germany: Springer.
Stalder, F., & Mayer, C. (2009). The second index: Search engines, personalization and surveillance. In K. Becker & F. Stalder (Eds.), Deep search: The politics of search beyond Google (pp. 98–115). Vienna, Austria: Studien Verlag.
Van Couvering, E. (2009). The structuration of traffic on the World-Wide Web (Doctoral dissertation). Retrieved from http://is.gd/OLNkvn
Vaughan, L., & Zhang, Y. (2007). Equal representation by search engines? A comparison of websites across countries and domains. Journal of Computer-Mediated Communication, 12(3), 888–909. doi:10.1111/j.1083-6101.2007.00355.x
Wright, J. (2011, November). Defining and measuring search bias: Some preliminary evidence. International Center for Law and Economics Paper Series, George Mason University.
© 2014 International Communication Association
Journal of Communication – Oxford University Press
Published: Dec 1, 2014