Tips for Using AI in Academic Writing

This post contains an example of how to declare the use of AI in academic work (see the ‘Methodology’ section below), as well as an explanation of triangulation as a way of verifying AI-generated information.

Methodology

OpenAI’s ChatGPT (September 25 Version) was used to define triangulation and present examples of it. The prompt given to ChatGPT was as follows:

“define triangulation in one sentence. then, give three examples of triangulation in the context of verifying AI-generated information.”

The response given by ChatGPT was evaluated using the author’s expert judgment to assess whether the information it gave was correct. The response was modified by removing redundant examples that did not fit the context of this work and by adding contextually suitable examples of credible data sources for thesis work. The content generated by AI (and modified by the author of this document) can be found in the section ‘Triangulation’. The original chat can be found online (https://chat.openai.com/share/2c36a655-6656-4ba6-aff4-7cfe4f521f02).

So, you can see some principles of declaration:

  1. The specific tool and its version are made public: “ChatGPT (September 25 Version)”
  2. The prompt is made public: “The prompt given was…”
  3. The original chat is made available so that people can compare AI-generated text with the final text: “can be found online ([link])”
  4. The process of verifying the AI-generated content is explained: “was evaluated by…”
  5. The process of modifying the AI-generated content is explained: “was modified by…”
  6. The location of AI-generated content in the document is specified: “can be found in section…”

Triangulation

Triangulation refers to the use of multiple methods, data sources, or perspectives to validate or cross-check information.

In thesis work, students can apply source cross-referencing as a form of triangulation. This involves comparing AI-generated information with multiple trusted data sources or databases (e.g., government statistics and, most importantly, peer-reviewed academic research articles) to ensure consistency and accuracy.

The scope of using AI

AI can be used in many ways to support the academic writing process. In some parts of the manuscript, it is more appropriate to use AI than in others:

  • abstract — OK! There’s no harm in letting the AI help you summarize your work.
  • definitions — OK! There’s no harm in letting the AI define concepts.
  • literature review — LIMITED OK! AI can summarize previous research BUT it can make mistakes and focus on irrelevant aspects. DO NOT use AI-generated literature reviews ‘as is’ — I’ve tried this multiple times, and even though adjusting the prompting helps, the outputs always require heavy manual editing.
  • methodology — LIMITED OK! I’ve used AI to come up with research designs, research questions, and hypotheses. It gives good results! Meaning, the ideas tend to make sense. Of course, you as the subject matter expert need to manually assess the quality of these ideas. For a seasoned researcher this is easier, as they already know the field well; for a novice researcher, it’s much harder. So, you need to establish your own baseline understanding of a field to be able to judge whether research questions, hypotheses, or methods make sense in your domain.
  • results — LIMITED OK! Here, the same logic applies as previously. You can use AI to help you with your statistical analysis, for example, but then you need to know the basics of statistical analysis to ensure that the AI did the job correctly. If you don’t know the basics, you should first learn them before using AI, because if the AI makes a mistake, you won’t be able to detect it. The risk of invalid results therefore increases if you don’t know the basics. Of course, AI can help you learn the basics! You can ask it questions about different methods in an iterative manner and then use Google to cross-check against reputable sources whether the answers are consistent. So, you can learn many things by doing things with the AI – this is the beauty of AI-human collaboration.
  • discussion — LIMITED OK! The implications should be based on your own thinking. I’ve seen many discussion sections that list AI-generated outputs as “original” ideas — those are often neither based on the study’s findings nor really that original. The only way I’d use AI here is for coarse inspiration — absolutely NOT as a replacement for your own thinking: what do your results imply for theory and practice? It is impossible to do good research without actually taking a breath and thinking about this question.
  • conclusions — LIMITED OK! As in the previous case, the conclusion of your work should be based on your own thinking. The AI can help you address the blank page problem but it should not replace your thinking.

NOTE: In all of the above cases, the principles of verification and declaration still apply. So, you need to (a) explain how you used the AI and (b) manually make sure that what it wrote is correct. There’s no shortcut here.

How to Handle AI in Education?

In this post, I’m discussing the use of AI — specifically LLMs, or large language models, also known as generative AI, of which OpenAI’s GPT models (including ChatGPT) are prominent examples. I will first present a case for why detecting and preventing the use of such models in education is not feasible or even possible. Then, I’ll propose ways of instructing students to properly use AI in learning. Finally, I’ll discuss the implications for different student types.

[UPDATE: September 29, 2023 — adding three tips for students.]

[UPDATE: June 22, 2023 — adding some pointers at the end of this article about the use of ChatGPT and other LLMs in thesis work.]

1. Three Tips for Students

Three tips for students to cope with the use of AI for classwork:

  1. Use common sense, as in, ask yourself: “Is the way I’m using this tool ethical?” We all have a moral compass and an intuitive sense of wrong and right.
  2. Declare — meaning, write up exactly and precisely how you used the AI tool(s). This activity in itself will help in the process of determining “is this okay?”. If, in the course of writing, you realize things like “okay, this could lead to misinformation” or “I didn’t do anything to validate the information” or “I didn’t manually edit the text afterwards but just copy-pasted”, those are all red flags. So, making the use explicit and transparent helps you as much as others to determine whether the use was appropriate. See an example of AI use declaration.
  3. Ask! Whenever in doubt, immediately ask your teacher, “I used/am planning to use ChatGPT like this, is it okay?”. Many of the rules and practices are currently forming, so we can create these in collaboration together with students and teachers.

2. Why Detection and Prevention Fails

There might be some indicative cues of whether a piece of text was written by a student or by GPT, such as the lack of grammar mistakes (GPT writes near-perfect language, students often don’t :), the lack of aphorisms and metaphors (people often use them, but GPT rarely does), etc.

However, the detection and prevention method seems fundamentally flawed for at least the following reasons:

  • there’ll be a non-trivial number of false positives and negatives, which means additional verification will be needed, and that verification is likely to involve guesswork (i.e., ultimately we don’t know whether a text was written by an AI, a human, or a combination of the two; a quick arithmetic sketch follows this list),
  • as soon as such cues become public information, students will start circumventing them, resulting in a game of cat and mouse, and
  • the hybrid use of GPT is particularly difficult to detect as relatively minor edits to a text can significantly change its appearance and thus fool both algorithmic and manual detection.
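
To make the first point about false positives concrete, here is a minimal back-of-the-envelope sketch. All the numbers (detector accuracy and the share of students using AI) are hypothetical, chosen only to illustrate the base-rate effect:

```python
# Hypothetical numbers, for illustration only: how trustworthy is a "flag"
# from an AI-text detector once base rates are taken into account?

sensitivity = 0.90   # assumed: the detector flags 90% of AI-written texts
specificity = 0.90   # assumed: the detector clears 90% of human-written texts
ai_share    = 0.20   # assumed: 20% of submitted texts actually involve AI

true_positives  = ai_share * sensitivity              # AI texts correctly flagged
false_positives = (1 - ai_share) * (1 - specificity)  # human texts wrongly flagged

precision = true_positives / (true_positives + false_positives)
print(f"Share of flagged texts that actually involved AI: {precision:.0%}")
# -> about 69%, i.e., roughly one in three flags would hit an innocent student,
#    which is why a flag alone cannot settle the question.
```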

The result is that AI-generated text is often indistinguishable from human-written text and undetectable by algorithms. There are some “cues”, but we cannot rely on them to argue with 100% certainty whether a student used generative AI or not. We can ask, and they can deny, and there is little more we can do. The conclusion is that prohibiting the use of AI is difficult or impossible for this reason alone. We also don’t want to prohibit its use entirely, because its use is quickly becoming a professional skill that we should teach.

3. How to Instruct Students to Use AI

Based on the above reasoning, my take is that students should be allowed to use GPT (as we cannot prevent them from using it, and they’re likely to use it in their jobs anyway), but we must teach its ethical use — that is,

  • declare its use instead of pretending the text was 100% created by you,
  • explain how you used it in detail (the prompts, the editing process, etc.), and
  • verify all facts presented by GPT, as it has a tendency to hallucinate (verification should be done using credible sources such as academic research articles, government/institute/industry reports, and statistical authorities). Facts here refer especially to numerical information — in my own experience, GPT’s concept definitions tend to be accurate. See how to triangulate AI-generated information.

Concerning the GPT models, we must bear in mind the instrumentality maxim: technology is just a tool. A bad student will use it in a bad way; a good student will use it in a good way. While we cannot remove the badness from this system, we can engage in some measures to tilt it in favor of the good, such as encouraging ethical behaviors and penalizing unethical ones. The bottom line is that “Zero GPT” just isn’t a realistic policy option, like “Zero Wikipedia” (or “Zero Google”) wasn’t either.

4. Using ChatGPT and Other LLMs in Thesis Work

The academic year 2023-2024 is rapidly approaching, and I find there is still no solution to the concern of students writing their bachelor’s or master’s theses using ChatGPT.

  • Turnitin has a detector — it is not reliable enough to make a determination
  • there are some other detectors developed by researchers and commercial parties — these are also not reliable enough to make a determination
  • there are also ways to try to manually detect this new form of plagiarism: e.g., by looking for cues like “too perfect” text (ChatGPT writes grammatically better text than students!) and certain ways of expression that ChatGPT favors (e.g., it uses a lot of lists and very few metaphors). However, these cues are fallible, and the student can simply deny using ChatGPT; without proper evidence, we cannot argue otherwise.

So, how to tackle this issue? Here are some approaches I’m currently thinking about:

First, ask students to sign a pledge at the beginning of the course:

“This pledge contains two promises: (a) I promise to declare my use of ChatGPT or similar tools — also in the case that I didn’t use them. If I did use them as a part of the writing process, I’ll explain in detail how I used them, how I verified the information given by them (via triangulation based on academic sources: see instructions here: [link]), and how I edited the computer-generated text to make it mine. Also, (b) I promise that I won’t use the computer-generated text as is, but I will verify its content and edit it according to the guidelines provided by Joni: [link].”

So, the point is not to ban the use of ChatGPT (because we cannot enforce such a ban) but instead to allow its use, given that the student uses it ethically and declares its use. In this process, we educators must provide guidelines, that is, specific instructions on how to use ChatGPT in a responsible way and how to declare its use.

Second, I’m thinking of asking students to “write less, tabulate more”. That is, when they review the articles, I want to see their Excel files that contain structured data based on information from the articles. While ChatGPT can help in this process, it is not (at least yet) able to carry out all the steps in the process, including (a) identification of relevant literature, (b) defining a coding scheme that’s relevant for the research questions, and (c) extracting information from the articles according to the scheme. Again, it can help in these steps, but the student needs to put in considerable creative effort in managing the whole process. So, they need to engage in proper research activities, which is the point of doing a thesis.
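
To illustrate what such “structured data based on information from the articles” could look like, here is a minimal sketch of a coding scheme. The columns and the example entry are hypothetical and should be adapted to the thesis topic and research questions:

```python
# A hypothetical literature coding scheme: one row per reviewed article,
# one column per attribute that the research questions require.
coding_scheme = [
    "citation",           # who wrote it, where, and when
    "research_question",  # what the article asks
    "method",             # e.g., survey, experiment, case study
    "sample",             # who/what was studied, and how many
    "key_findings",       # main results relevant to the thesis
    "limitations",        # stated weaknesses
    "relevance_note",     # the student's own judgment: why this article matters
]

# One coded entry (entirely fictional), filled in manually by the student,
# whether or not AI helped locate or summarize the article.
example_row = {
    "citation": "Author (2022), Journal of Example Studies",
    "research_question": "How does X affect Y in SMEs?",
    "method": "survey",
    "sample": "214 Finnish SMEs",
    "key_findings": "X is positively associated with Y",
    "limitations": "cross-sectional data, self-reported measures",
    "relevance_note": "supports hypothesis H2 of my thesis",
}
```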

In a way, writing a thesis during the ChatGPT era might become more demanding, because it’s no longer adequate to just produce “text” — instead, students need to demonstrate their thinking process in other ways as well. So, ChatGPT can actually raise the bar for theses and make them better. This is exactly how a good tool should affect an activity – it should make completing the activity easier while maintaining high quality.

And, from a fairness perspective, this new requirement can be met with or without using ChatGPT. Just like other activities in professional knowledge work.

5. Implications of AI for Different Student Types

Let us take this apart a bit more by considering four student types:

  • A good student = one that wants to learn and do a good job passing their courses/degree.
  • A talented student = one that has good intentions and is good at learning.
  • A poor student = one that has good intentions (is a good student) but struggles to learn for some reason or another.
  • A bad student = one that doesn’t want to learn but just wants to pass a course/degree with minimum effort.

The implications of AI for the different types are the interesting part. I am not too interested in the bad students. They might be viewed as being out of scope – in some cases, the attitude is what it is and cannot be changed. We cannot force learning. In other cases, it might be possible to convert a bad student into a good student, but I don’t see GPT as relevant for this (cf. the instrumentality maxim).

For the good students, we need to tell them how to use GPT in a good way, so that they know how to do so and can use it to learn more efficiently. My hypothesis is that GPT supports the learning of talented students, as they can use GPT to amplify their already good learning strategies. However, I am not sure about the implications for poor students; whatever they may be, those students also need guidance.

In terms of skills that educators would need to pass on to their students, at least two readily come to mind:

  • The ability to ask questions: for GPT to be useful, one needs to ask it the right questions. A “right question” is one that supports learning. Learning a topic requires coming up with *many* questions that become progressively more advanced. So, the student needs to be able to craft progressively more difficult questions in order to increase his or her knowledge (in between, the student obviously needs to read and reflect on the answers). This skill relies equally on formulating the substance of the question (i.e., what is actually being asked?) and on phrasing the question (i.e., how is the question being asked?). Both factors affect the response quality – for example, a student who wants to know about the history of AI could learn that there is a concept called “AI winter” and then ask the AI to explain this concept. But there were in fact two AI winters, so the sequence and formulation of the questions can leave gaps in the student’s learning. Thus, “prompting strategies” and “prompt engineering” are relevant skills here.
  • The ability to evaluate the quality of answers: once the student receives an answer, they need to be able to assess its quality. What does quality mean? At least two criteria: veracity, so that the answer is true or correct, and comprehensiveness, so that the answer contains the necessary information to satisfy the information request. A third criterion could be connection – i.e., the answer introduces related concepts that the student can use to increase their learning by formulating new questions to the AI.

In terms of overall learning (i.e., ensuring that the student masters what he or she is expected to master to get a degree), the optimal mix would be going back to controlled exams for some courses (their use has been diminishing over time, and this might reverse some of that change), while teaching the correct use of GPT in others.

6. Conclusion

AI is coming into education (or, rather, it’s already here). Educators cannot prevent the use of AI models in learning. Instead, they should make sure such models are used ethically and in a way that supports students’ learning.

Acknowledgments

Thanks to Mikko Piippo for the LinkedIn discussion that inspired this post 🙂 The tips were invented during a lesson with Bachelor’s thesis students at the University of Vaasa.

Problem of averages applies to social media community guidelines

(This post is based on a discussion with a couple of other online hate researchers.)

  1. given a general policy (community guidelines), it is possible to create specific explanations that cover most cases of violating the policy (example of an explanation: “your message was flagged as containing Islamophobic content”). This is based on the idea that the policy itself is ultimately finite, so even though cases of Islamophobia may be many, the policy always either covers or does not cover this form of hate speech. If the general policy itself is lacking, then it needs to be fixed first.
  2. the problem of explaining hate speech moderation could be seen as a classification problem, where each explanation is a class. Here, we observed the ground truth problem, which we referred to as the inherent subjectivity of hate speech. In other words, it is not possible to create an uncontestable or “immutable” definition of hate speech.
  3. the solution to this inherent subjectivity can take place at two levels: (a) at the user level, by fine-tuning/adjusting the hate speech detection algorithm based on user preferences rather than community guidelines, i.e., learning to flag what the user finds offensive rather than defining it a priori (a minimal sketch of this idea follows this list). This would make community guidelines obsolete, or at least much less influential (possibly only covering some hateful words that could not be used in a non-offensive way, if those exist).
    …or, (b) at the community level, where each community (e.g., page, person) within the platform defines its own rules as to what speech is allowed. By joining that community, a user acknowledges those rules. This essentially shifts the community guideline creation from the platform to subcommunities within the platform.
  4. both (a) and (b) above rely on the notion that a platform’s community guidelines essentially suffer from the problem of averages: the average is good in general, but perfect for nobody. The only way I can see around that is to incorporate user or community features (i.e., preferences) to essentially allow or disallow certain speech in certain communities. Then, users who do not wish to see certain speech simply unfollow the community. This affords the flexibility of creating all kinds of spaces. At the same time, I would give more moderation tools to the communities themselves, which, again, I think is a better approach than a global “one size fits all” policy.
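
A minimal sketch of ideas (a) and (b), assuming some base classifier already produces per-category scores for a message; the scoring function, categories, and thresholds below are entirely hypothetical:

```python
# Hypothetical sketch: moderation driven by user or community preferences
# instead of a single platform-wide guideline.

def base_scores(message: str) -> dict:
    # Stand-in for a real classifier; returns per-category scores in [0, 1].
    # Here every message simply gets the same made-up scores.
    return {"slur": 0.1, "insult": 0.6, "sarcasm": 0.8}

# (a) User level: each user tunes what *they* find offensive.
user_thresholds = {"slur": 0.2, "insult": 0.9, "sarcasm": 1.1}   # 1.1 = never hide

# (b) Community level: each community sets its own rules; joining means accepting them.
community_thresholds = {"slur": 0.2, "insult": 0.5, "sarcasm": 1.1}

def is_hidden(message: str, thresholds: dict) -> bool:
    scores = base_scores(message)
    return any(scores[cat] >= thr for cat, thr in thresholds.items())

msg = "some borderline message"
print(is_hidden(msg, user_thresholds))       # False: this user tolerates insults
print(is_hidden(msg, community_thresholds))  # True: this community does not
```

The same message is hidden in one setting and shown in another, which is exactly the flexibility a single platform-wide guideline cannot offer.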

Sales attribution in cross-channel retargeting

Digital marketing thought of the day (with Tommi Salenius):

There are two channel types for online ads: the “first channel”, meaning the channel that gets a customer’s attention first, and the “second channel”, where we run retargeting ads to activate the non-converted audience from the first channel.

There are also two types of consumers: “buy now” consumers, who buy immediately without the need for activation, and “buy later” consumers, who require more time and/or activation before buying.

So, depending on how well the first channel is able to locate the different consumer types, you might see very different performance results. If the first channel is good at locating the “buy now” consumers, it will show good performance and the second channel will show bad performance. The opposite happens if the first channel is good at locating “buy later” consumers; in this case, the first channel will appear weak and the second channel will appear good.
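
A small simulation makes the effect visible. All numbers are invented purely to illustrate the mechanism, and simple last-click attribution is assumed (the sale is credited to the channel where the conversion happened):

```python
# Hypothetical illustration: the same two-channel setup, measured with
# last-click attribution, under two different audience mixes.

def attribute_sales(audience_size, buy_now_share, retarget_conversion_rate):
    """The first channel reaches the audience; "buy now" consumers convert there.
    The rest are retargeted on the second channel, where a fraction converts."""
    buy_now = audience_size * buy_now_share
    buy_later = audience_size - buy_now
    first_channel_sales = buy_now                                  # credited to channel 1
    second_channel_sales = buy_later * retarget_conversion_rate    # credited to channel 2
    return first_channel_sales, second_channel_sales

# Scenario A: the first channel is good at finding "buy now" consumers.
print(attribute_sales(1000, buy_now_share=0.30, retarget_conversion_rate=0.05))
# -> roughly 300 vs 35 sales: channel 1 looks strong, channel 2 looks weak.

# Scenario B: the first channel mostly finds "buy later" consumers.
print(attribute_sales(1000, buy_now_share=0.02, retarget_conversion_rate=0.05))
# -> roughly 20 vs 49 sales: channel 1 looks weak and channel 2 claims the credit,
#    even though channel 1 located the buyers in the first place.
```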

This typology highlights the complexities of attributing campaign performance in cross-channel environments: a channel might do well in locating potential buyers but another channel might claim the credit for the sale.

“It eliminates all the fun.” Automation taking over marketing?

Jon Loomer, a well-respected digital marketer, was interviewed by Andy Gray on Andy’s podcast.

They discussed automation and the impacts it has on the future of digital marketing.

Andy asked what happens when everything becomes standardized by the platform: that is, when the platform sets the click price, chooses the targeting, and even writes the ads.

The logic of the question was that a marketer will no longer have any competitive tools against others within the platform – and it appears we cannot do anything to get ahead of the competition anymore.

Jon’s comment to all this — “it eliminates all the fun” — got my attention.

Eliminating all the fun is an important aspect from a marketer’s perspective. One can easily lose the meaning of one’s work in an environment where all creativity, experimentation, and decision making are taken away, and one is left with the role of supporting the algorithm with occasional one-liners that the machine chooses from.

However, from a macro perspective, here are a couple of thoughts about the future.

First, we might enter some form of “perfect market” where supply and demand are matched in perfect alignment with the platform’s vision. Then, if the rules and procedures are the same for all, this can be considered fair (as in: procedural fairness), and the “biggest checkbook” doesn’t always win.

One example is the quality score — it can equalize advertisers by setting the click price based on quality, not on the willingness to bid the highest.
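
As a simplified illustration of that equalizing effect, here is the classic textbook formulation where ad rank equals bid times quality score and the winner pays just enough to beat the runner-up; real platform mechanics are more involved, so treat this as a sketch:

```python
# Simplified, illustrative auction: higher quality can beat a bigger checkbook.
big_brand  = {"bid": 2.00, "quality": 4}    # deep pockets, mediocre ad and landing page
small_shop = {"bid": 1.00, "quality": 10}   # small budget, highly relevant ad

def ad_rank(advertiser):
    return advertiser["bid"] * advertiser["quality"]

print(ad_rank(big_brand), ad_rank(small_shop))   # 8.0 vs 10.0 -> the small shop wins

# Textbook pricing: the winner pays just enough to keep its rank above the
# runner-up (plus a minimal increment).
price_paid = ad_rank(big_brand) / small_shop["quality"] + 0.01
print(round(price_paid, 2))   # 0.81 per click, well below the 2.00 bid it beat
```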

In the quality score’s case, though, the score itself becomes obsolete as the platform takes over the quality part of ad creation. But the point remains — natural differentiation factors may arise. For example, a major brand couldn’t buy “barber shop in [small town x]” keywords, because the system would (supposedly) know that the major brand doesn’t have an outlet there, and so the small barber shops that do would have a structural advantage in this local example of bidding.

The question still remains: how would the winner be determined among the rivaling small barber shops?

In my opinion, there would need to be some secondary information that serves as a signal for matching user intent with the best possible alternative for that intent: product reviews, website usability, pricing…

This is the kind of information that affects how likely a user is to buy from Barbershop A vs. Barbershop B: think Google reviews, Core Web Vitals, XML product feed information (or simplified versions of it), opening hours, etc.

Take the example of a user searching for a barber shop at midnight: is there one open? If so, it wins the competition due to natural factors, not due to “optimization”.
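
A sketch of how such natural signals could determine the winner; the signals, weights, and data are invented for illustration only:

```python
# Hypothetical ranking by "natural" signals that exist outside the ad platform.
barbershops = [
    {"name": "Barbershop A", "rating": 4.8, "open_now": False, "fast_site": True},
    {"name": "Barbershop B", "rating": 4.2, "open_now": True,  "fast_site": False},
]

def natural_score(shop):
    # Invented weighting: being open at the moment of intent dominates;
    # reviews and site speed act as tie-breakers.
    return (2.0 if shop["open_now"] else 0.0) \
        + shop["rating"] / 5 \
        + (0.2 if shop["fast_site"] else 0.0)

winner = max(barbershops, key=natural_score)
print(winner["name"])   # Barbershop B: the only one open at midnight wins,
                        # despite the lower rating and the slower website
```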

The point is: there will always be natural signals (as in: characteristics appearing outside the ad platform and independent of it) that the ad platform can incorporate in its decision as to which ad takes precedence. These signals can take away the “gaming” of the platform that we call optimization (and, with it, some of the fun of doing digital marketing), but it’s not certain that this would leave either the companies using advertising or the users worse off.

Non-linear growth of value in platform business

The value of aggregation in the ad business (and probably in most other verticals, too) is that 1 impression has zero value, i.e., no advertiser wants to pay for it. 10 impressions also have zero value, and so do 100 impressions. But when we get to hundreds of thousands or millions, the value suddenly spikes from zero to a non-trivial amount. So, aggregators can buy a lot of small players for nickels and put them together to reach a scale at which the value jumps from zero to non-trivial. (Another way to frame this is that the value does not grow linearly with the number of impressions.) #platforms #business #aggregation #economics
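
One way to make that non-linearity concrete; the threshold and the prices below are invented purely for illustration:

```python
# Invented illustration: the value of ad inventory as a function of monthly
# impressions, with a minimum viable reach below which advertisers won't buy.
MIN_VIABLE_REACH = 500_000   # hypothetical threshold
CPM = 2.0                    # hypothetical price per 1,000 impressions, once sellable

def inventory_value(impressions):
    if impressions < MIN_VIABLE_REACH:
        return 0.0                       # too small to sell at all
    return impressions / 1000 * CPM      # sellable inventory

small_players = [40_000] * 20            # twenty tiny sites, 40k impressions each
print(sum(inventory_value(i) for i in small_players))   # 0.0: each is worthless alone
print(inventory_value(sum(small_players)))              # 1600.0: aggregated, the same
                                                        # impressions clear the threshold
```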

About academic competition and network effects

It’s extremely hard for a small research team to compete against huge labs. It’s a question of network effects: the huge labs have the critical mass of funding, collaborators, momentum, and publication pull to attract the best talent, whether the brightest PhD students or the best post-docs. The rest are left with the “scraps”. It seems cold to say so, but academia is just like any other environment: many have PhDs, but only a select few are able to publish in the top venues, year after year. And those people are more and more concentrated in the top institutions. For a research institution, the choice is pretty clear: either focus on building a critical mass of talent (i.e., big enough teams with big enough funding) or face institutional decline. #research #strategy #competition #talent

Research on Persona Analytics

This year (2021), we have managed to publish two research papers that I consider important milestones in persona analytics research. They are listed below.

Jung, S.-G., Salminen, J., & Jansen, B. J. (2021). Persona Analytics: Implementing Mouse-tracking for an Interactive Persona System. Extended Abstracts of ACM Human Factors in Computing Systems – CHI EA ’21. https://doi.org/10.1145/3411763.3451773

Jung, S.-G., Salminen, J., & Jansen, B. J. (2021). Implementing Eye-Tracking for Persona Analytics. ETRA ’21 Adjunct: ACM Symposium on Eye Tracking Research and Applications, 1–4. https://doi.org/10.1145/3450341.3458765