12-04-2024

The Turing test of online reviews: Can we tell the difference between human-written and GPT-4-written online reviews?

Author: Balázs Kovács

Published in: Marketing Letters

Abstract

Online reviews serve as a guide for consumer choice. With advancements in large language models (LLMs) and generative AI, the fast and inexpensive creation of human-like text may threaten the feedback function of online reviews if neither readers nor platforms can differentiate between human-written and AI-generated content. In two experiments, we found that humans cannot recognize AI-written reviews. Even with monetary incentives for accuracy, both Type I and Type II errors were common: human reviews were often mistaken for AI-generated reviews, and even more frequently, AI-generated reviews were mistaken for human reviews. This held true across various ratings, emotional tones, review lengths, and participants’ genders, education levels, and AI expertise. Younger participants were somewhat better at distinguishing between human and AI reviews. An additional study revealed that current AI detectors were also fooled by AI-generated reviews. We discuss the implications of our findings on trust erosion, manipulation, regulation, consumer behavior, AI detection, market structure, innovation, and review platforms.
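The two kinds of misclassification described in the abstract can be computed separately from a set of labelled judgments. The following is a minimal sketch, not the author's analysis code: the function name `misclassification_rates`, the `(true_source, guessed_source)` tuple format, and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def misclassification_rates(judgments):
    """Return the two error rates for (true_source, guessed_source) pairs.

    Each element of `judgments` is a tuple whose entries are 'human' or 'ai'.
    This data format is an assumption made for this sketch.
    """
    counts = Counter(judgments)
    n_human = counts[("human", "human")] + counts[("human", "ai")]
    n_ai = counts[("ai", "ai")] + counts[("ai", "human")]
    return {
        # Share of human-written reviews that were judged to be AI-generated.
        "human_mistaken_for_ai": (
            counts[("human", "ai")] / n_human if n_human else float("nan")
        ),
        # Share of AI-generated reviews that were judged to be human-written.
        "ai_mistaken_for_human": (
            counts[("ai", "human")] / n_ai if n_ai else float("nan")
        ),
    }


# Hypothetical toy data, NOT results from the paper.
toy_judgments = (
    [("human", "human")] * 3
    + [("human", "ai")] * 1
    + [("ai", "human")] * 3
    + [("ai", "ai")] * 1
)
rates = misclassification_rates(toy_judgments)
print(rates)
```

Reporting the two rates separately, rather than a single accuracy figure, keeps visible the asymmetry the abstract describes, where one kind of mistake is more frequent than the other.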

Footnotes
1
Zhang et al. (2016) define fake reviews as “deceptive reviews provided with an intention to mislead consumers in their purchase decision making, often by reviewers with little or no actual experience with the products or services being reviewed. Fake reviews can be either unwarranted positive reviews aiming to promote a product, or unjustified false negative comments on competing products in order to damage their reputations.”
 
2
Wu et al. (2020) highlight an interesting exception: some newly established review platforms intentionally add fake reviews and copy reviews from other platforms to give the impression that their platform is widely used, thereby circumventing the catch-22 of platforms: users do not arrive until reviews are posted, and reviews are not posted until users arrive.
 
3
This refined prompt is based on a simpler one from our pilot study, where we found that GPT-4 produces longer texts without shortening instructions. Participants often identified human-generated reviews by typos, misspellings, or unusual spellings like ALL CAPS, leading us to incorporate these in the GPT prompt.
 
4
Given the full randomization, participants may or may not have seen both a human- and an AI-written review of the same restaurant.
 
5
We targeted 150 participants, but after one participant timed out and was replaced by Prolific, the original participant returned and completed the survey, resulting in 151 participants.
 
6
We used ChatGPT to code the reviews for valence, emotionality, presence of typos, profanity, and informal expressions. Specifically, we gave GPT-4 the following prompt: “Here is a restaurant review. [XXX] Code this review for each of the following dimensions: sentiment (from 0 to 100, where 100 is highly positive), emotionality (from 0 to 100, where 100 is highly sentimental), the number of typos or misspellings, the number of profane words or expressions, and the number of informal expressions. Put in a table.” We cross-checked a few of these answers and agreed with GPT-4’s coding, so we used these values in these regressions.
 
7
The sample of restaurants in Study 2 is different from the sample of restaurants and reviews in Study 1. In Study 2, we only included restaurants that received at least 10 English-language reviews in 2019.
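The coding prompt quoted in footnote 6 can be filled in programmatically for each review. The sketch below only builds the prompt string and does not call any model. It assumes that the `[XXX]` placeholder marks where the review text is inserted; the names `CODING_PROMPT_TEMPLATE` and `build_coding_prompt` and the sample review are hypothetical.

```python
# Prompt wording taken from footnote 6; the {review} slot stands in for the
# "[XXX]" placeholder (an assumption about how the placeholder was used).
CODING_PROMPT_TEMPLATE = (
    "Here is a restaurant review. {review} "
    "Code this review for each of the following dimensions: "
    "sentiment (from 0 to 100, where 100 is highly positive), "
    "emotionality (from 0 to 100, where 100 is highly sentimental), "
    "the number of typos or misspellings, "
    "the number of profane words or expressions, "
    "and the number of informal expressions. Put in a table."
)


def build_coding_prompt(review_text: str) -> str:
    """Insert one review into the coding prompt (hypothetical helper)."""
    return CODING_PROMPT_TEMPLATE.format(review=review_text.strip())


# Hypothetical sample review, not taken from the study's data.
sample_review = "  GREAT tacos, teh service was slow tho.  "
prompt = build_coding_prompt(sample_review)
print(prompt)
```

Keeping the wording in a single template means every review is coded with an identical instruction, so differences in the returned scores reflect the reviews rather than the prompt.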
 
Literature
Agnihotri, A., & Bhattacharya, S. (2016). Online review helpfulness: Role of qualitative factors. Psychology & Marketing, 33(11), 1006–1017.
Ahmad, W., & Sun, J. (2018). Modeling consumer distrust of online hotel reviews. International Journal of Hospitality Management, 71, 77–90.
Ananthakrishnan, U. M., Li, B., & Smith, M. D. (2020). A tangled web: Should online review portals display fraudulent reviews? Information Systems Research, 31(3), 950–971.
Archak, N., Ghose, A., & Ipeirotis, P. G. (2011). Deriving the pricing power of product features by mining consumer reviews. Management Science, 57(8), 1485–1509.
Cheung, C. M., & Lee, M. K. (2012). What drives consumers to spread electronic word of mouth in online consumer-opinion platforms. Decision Support Systems, 53(1), 218–225.
Chevalier, J. A., & Mayzlin, D. (2006). The effect of word of mouth on sales: Online book reviews. Journal of Marketing Research, 43(3), 345–354.
Dellarocas, C. (2003). The digitization of word of mouth: Promise and challenges of online feedback mechanisms. Management Science, 49(10), 1407–1424.
Dellarocas, C., Zhang, X. M., & Awad, N. F. (2007). Exploring the value of online product reviews in forecasting sales: The case of motion pictures. Journal of Interactive Marketing, 21(4), 23–45.
Han, J., Pei, J., & Tong, H. (2022). Data mining: Concepts and techniques. Morgan Kaufmann.
He, S., Hollenbeck, B., & Proserpio, D. (2022). The market for fake reviews. Marketing Science, 41(5), 896–921.
Ippolito, D., Duckworth, D., Callison-Burch, C., & Eck, D. (2019). Automatic detection of generated text is easiest when humans are fooled. arXiv preprint arXiv:1911.00650.
Jago, A. S. (2019). Algorithms and authenticity. Academy of Management Discoveries, 5(1), 38–56.
Jakesch, M., Hancock, J. T., & Naaman, M. (2023). Human heuristics for AI-generated language are flawed. Proceedings of the National Academy of Sciences, 120(11), e2208839120.
Köbis, N., & Mossink, L. D. (2021). Artificial intelligence versus Maya Angelou: Experimental evidence that people cannot differentiate AI-generated from human-written poetry. Computers in Human Behavior, 114, 106553.
Kovács, B. (2024). Studying travel networks using establishment Covisit networks in online review data. Socius, 10, 23780231241228916.
Kovács, B., & Carroll, G. R. (2023). Distinguishing between cosmopolitans and omnivores in organizational audiences. Academy of Management Discoveries, 9(4), 549–577.
Kovács, B., Carroll, G. R., & Lehman, D. W. (2014). Authenticity and consumer value ratings: Empirical tests from the restaurant domain. Organization Science, 25(2), 458–478.
Kozinets, R. V. (2002). The field behind the screen: Using netnography for marketing research in online communities. Journal of Marketing Research, 39(1), 61–72.
Laudon, K. C., & Laudon, J. P. (2004). Management information systems: Managing the digital firm. Pearson Education.
Le Mens, G., Kovács, B., Hannan, M. T., & Pros, G. (2023). Uncovering the semantics of concepts using GPT-4. Proceedings of the National Academy of Sciences, 120(49), e2309350120.
Li, X., & Hitt, L. M. (2008). Self-selection and information role of online product reviews. Information Systems Research, 19(4), 456–474.
Luca, M., & Zervas, G. (2016). Fake it till you make it: Reputation, competition, and Yelp review fraud. Management Science, 62(12), 3412–3427.
Mayzlin, D., Dover, Y., & Chevalier, J. (2014). Promotional reviews: An empirical investigation of online review manipulation. American Economic Review, 104(8), 2421–2455.
Miller, E. J., Steward, B. A., Witkower, Z., Sutherland, C. A., Krumhuber, E. G., & Dawel, A. (2023). AI hyperrealism: Why AI faces are perceived as more real than human ones. Psychological Science, 34(12), 1390–1403.
Mudambi, S. M., & Schuff, D. (2010). What makes a helpful review? A study of customer reviews on Amazon.com. MIS Quarterly, 34(1), 185–200.
Netzer, O., Feldman, R., Goldenberg, J., & Fresko, M. (2012). Mine your own business: Market-structure surveillance through text mining. Marketing Science, 31(3), 521–543.
Orenstrakh, M. S., Karnalim, O., Suarez, C. A., & Liut, M. (2023). Detecting LLM-generated text in computing education: A comparative study for ChatGPT cases. arXiv preprint arXiv:2307.07411.
Pavlou, P. A., & Dimoka, A. (2006). The nature and role of feedback text comments in online marketplaces: Implications for trust building, price premiums, and seller differentiation. Information Systems Research, 17(4), 392–414.
Pavlou, P. A., & Gefen, D. (2004). Building effective online marketplaces with institution-based trust. Information Systems Research, 15(1), 37–59.
Pentina, I., Bailey, A. A., & Zhang, L. (2018). Exploring effects of source similarity, message valence, and receiver regulatory focus on Yelp review persuasiveness and purchase intentions. Journal of Marketing Communications, 24(2), 125–145.
Sharkey, A., Kovács, B., & Hsu, G. (2023). Expert critics, rankings, and review aggregators: The changing nature of intermediation and the rise of markets with multiple intermediaries. Academy of Management Annals, 17(1), 1–36.
Tadelis, S. (2016). Reputation and feedback systems in online platform markets. Annual Review of Economics, 8, 321–340.
Turing, A. M. (1950). Computing machinery and intelligence. Mind, LIX(236), 433–460.
Uchendu, A., Ma, Z., Le, T., Zhang, R., & Lee, D. (2021). Turingbench: A benchmark environment for Turing test in the age of neural text generation. arXiv preprint arXiv:2109.13296.
Wu, Y., Ngai, E. W., Wu, P., & Wu, C. (2020). Fake online reviews: Literature review, synthesis, and directions for future research. Decision Support Systems, 132, 113280.
Zhang, D., Zhou, L., Kehoe, J. L., & Kilic, I. Y. (2016). What online reviewer behaviors really matter? Effects of verbal and nonverbal behaviors on detection of fake online reviews. Journal of Management Information Systems, 33(2), 456–481.
Zhang, T., Li, G., Cheng, T., & Lai, K. K. (2017). Welfare economics of review information: Implications for the online selling platform owner. International Journal of Production Economics, 184, 69–79.
Zhao, Y., Yang, S., Narayan, V., & Zhao, Y. (2013). Modeling consumer learning from online product reviews. Marketing Science, 32(1), 153–169.
Metadata
Title: The Turing test of online reviews: Can we tell the difference between human-written and GPT-4-written online reviews?
Author: Balázs Kovács
Publication date: 12-04-2024
Publisher: Springer US
Published in: Marketing Letters
Print ISSN: 0923-0645
Electronic ISSN: 1573-059X
DOI: https://doi.org/10.1007/s11002-024-09729-3