Wednesday, November 27, 2024

Overcoming Challenges: How NPR Digitized Their Music Collection with AI

Practical Application of AI: Evaluating Music to Build a Music Library

Presented by Jane Gilvin, NPR's Research Archives and Data Team



Introduction

Jane Gilvin presented how her team at NPR used artificial intelligence (AI) to automatically identify instrumental and vocal music, so that a digital music library could be built more efficiently. The session covered the practical application of AI in music cataloging, the challenges the team faced, and the solutions they implemented.

About Jane Gilvin and the RAD Team

  • Jane Gilvin:
    • Member of NPR's Research Archives and Data (RAD) Team for nearly 13 years.
    • Educational background in music and library science.
    • Alumna of San Jose State University's Information Science program.
    • Experience in radio since she was a teenager.
  • The RAD Team:
    • Formerly known as the NPR Library, established in the 1970s.
    • Responsible for collecting NPR programming archives.
    • Provides resources for production, including a comprehensive music collection.

NPR's Music Collection Evolution

The NPR music collection has evolved alongside technological advancements:

  • Vinyl Records: The initial collection comprised vinyl records across various genres.
  • Transition to CDs: Shifted to compact discs (CDs) as CD players became standard in production.
  • Digital Music Files: Moved towards digital files to meet the expectations of quick and remote access to music.

Challenges in Digitizing the Collection

The transition to digital presented several challenges:

  • Converting thousands of physical CDs into digital files for immediate access.
  • Ensuring metadata accuracy and consistency, especially for instrumental and vocal classification.
  • Lack of resources for continuous large-scale ingestion and cataloging of new music.

Solution: Automation with AI

The Robot and ORRIS

  • The Robot: A batch processing system capable of ripping CDs, identifying metadata from online databases, and delivering MP3 and WAV files with embedded ID3 tags.
  • ORRIS (Open Resource and Research Information System): A new database developed to allow users to search, stream, and download songs for production.

Implementing Essentia

  • Essentia: An open-source library and set of tools for audio and music analysis, description, and synthesis.
  • Capabilities: Predicts genre, beats per minute, mood, and most importantly, classifies tracks as instrumental or vocal.
  • Training the Algorithm: Used NPR's extensive archive of over 300,000 tracks with existing instrumental and vocal tags to train the algorithm.

Accuracy and Testing

  • Human Cataloging Accuracy: Ranged from 90% to 98%, averaging around 90% due to human error and limitations.
  • Algorithm Accuracy Goal: Set at 80%, a threshold chosen to keep the tags useful while keeping the process efficient.
  • Results: The algorithm achieved an accuracy of 86%, meeting the team's criteria.
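
The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not NPR's actual evaluation code: the `Track` class, the sample tracks, and the field names are hypothetical, and only the 80% target comes from the talk. It scores predicted labels against existing human tags and checks the result against the target.

```python
from dataclasses import dataclass

TARGET_ACCURACY = 0.80  # the goal stated in the talk


@dataclass
class Track:
    title: str
    human_label: str      # "instrumental" or "vocal", from existing catalog tags
    predicted_label: str  # output of the classifier (hypothetical values here)


def accuracy(tracks):
    """Fraction of tracks where the prediction matches the human tag."""
    if not tracks:
        raise ValueError("need at least one track to compute accuracy")
    correct = sum(1 for t in tracks if t.human_label == t.predicted_label)
    return correct / len(tracks)


# Made-up sample data, for illustration only.
sample = [
    Track("Track A", "instrumental", "instrumental"),
    Track("Track B", "vocal", "vocal"),
    Track("Track C", "vocal", "instrumental"),  # a misclassification
    Track("Track D", "instrumental", "instrumental"),
    Track("Track E", "vocal", "vocal"),
]

score = accuracy(sample)
meets_target = score >= TARGET_ACCURACY
print(f"accuracy={score:.0%}, meets target: {meets_target}")
```

Because the human tags themselves are not perfectly accurate, a score computed this way measures agreement with the catalog rather than absolute correctness.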

Integration and Quality Control

Building into the Ingest Process

  • Automated the instrumental/vocal tagging during the ingest process of new tracks.
  • Applied the algorithm to existing tracks that lacked instrumental/vocal classification.

User Feedback Mechanism

  • Added a feature allowing users to report incorrectly tagged songs directly from the ORRIS interface.
  • Provided a quick way for the RAD team to receive notifications and correct metadata errors.

Quality Control Measures

  • Automated spreadsheets generated during the algorithm's run allowed for immediate review of results.
  • Periodic checks to ensure the algorithm continues to perform within the acceptable accuracy range.
  • Addressed any shifts in algorithm performance due to changes in the type of music being ingested or other factors.
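
A review sheet of the kind described above might be generated as follows. This is a sketch under stated assumptions: the talk does not say what columns the spreadsheets contain, so the column names, the confidence score, and the `REVIEW_THRESHOLD` cutoff are all hypothetical.

```python
import csv
import io

REVIEW_THRESHOLD = 0.60  # hypothetical cutoff; not a figure from the talk


def build_review_sheet(predictions, threshold=REVIEW_THRESHOLD):
    """Return CSV text with one row per track, flagging low-confidence results."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["track_id", "predicted_label", "confidence", "needs_review"])
    for track_id, label, confidence in predictions:
        flag = "yes" if confidence < threshold else "no"
        writer.writerow([track_id, label, f"{confidence:.2f}", flag])
    return buffer.getvalue()


# Made-up classifier output: (track id, predicted label, confidence).
predictions = [
    ("t001", "instrumental", 0.97),
    ("t002", "vocal", 0.55),
    ("t003", "vocal", 0.88),
]

sheet = build_review_sheet(predictions)
print(sheet)
```

Sorting or filtering on the `needs_review` column would let a cataloger check the least certain results first.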

Demonstration

Jane provided a live demonstration of how the process works:

  1. Showed the ORRIS search interface and how users can search for and listen to tracks (e.g., Thelonious Monk, David Bowie).
  2. Demonstrated the ingestion of new albums and how the algorithm processes them to classify tracks as instrumental or vocal.
  3. Illustrated the use of the user feedback feature to report incorrect classifications.

Benefits and Outcomes

  • Significantly reduced the time and resources required for music cataloging.
  • Enabled continuous addition of new music to the library despite limited staff time.
  • Improved user satisfaction by providing a reliable point of data for instrumental and vocal tracks.

Challenges and Considerations

  • Training Data Limitations: Ensuring the training data was representative and free from bias or errors.
  • Algorithm Bias: Addressing the overrepresentation of certain genres (e.g., jazz and classical) in the training data to avoid skewed results.
  • Metadata Accuracy: Dealing with inconsistent or incorrect metadata from external sources.
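
One way to check for the genre skew mentioned above is to break accuracy down by genre instead of reporting a single overall figure. The sketch below assumes each evaluated track has a genre, a human tag, and a predicted tag. The records and the function name are hypothetical, and the talk does not describe how the team actually measured bias.

```python
from collections import defaultdict


def accuracy_by_genre(records):
    """Map each genre to the share of its tracks that were classified correctly."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for genre, human_label, predicted_label in records:
        totals[genre] += 1
        if human_label == predicted_label:
            correct[genre] += 1
    return {genre: correct[genre] / totals[genre] for genre in totals}


# Made-up evaluation records: (genre, human tag, predicted tag).
records = [
    ("jazz", "instrumental", "instrumental"),
    ("jazz", "instrumental", "instrumental"),
    ("jazz", "vocal", "vocal"),
    ("classical", "instrumental", "instrumental"),
    ("hip-hop", "vocal", "instrumental"),
    ("hip-hop", "vocal", "vocal"),
]

per_genre = accuracy_by_genre(records)
for genre, acc in sorted(per_genre.items()):
    print(f"{genre}: {acc:.0%}")
```

A genre that scores well below the others, especially one with few examples, suggests that the training data under-represents it.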

Future Plans

Jane discussed potential future projects:

  • Revisiting other algorithms from Essentia, such as those predicting timbre and mood.
  • Implementing user testing and UX projects to improve data research and user experience.
  • Continuing to refine the algorithm and processes to maintain or improve accuracy.

Questions and Answers

During the Q&A session, several topics were addressed:

Copyright and Licensing Considerations

  • NPR has licenses with major performing rights organizations for the use of music in production.
  • Other libraries considering building a music collection should review legal permissions and terms of use.

Data Labeling and Ongoing QA/QC

  • The team performs periodic quality checks but does not engage extensively in data labeling projects.
  • Emphasis on monitoring algorithm performance and making adjustments as needed.

User Testing and UX Improvements

  • Future plans include conducting user testing to evaluate the effectiveness of additional algorithms (e.g., mood taxonomy).
  • Goal is to enhance the search and discovery experience for users.

Conclusion

Jane concluded by emphasizing how the application of AI allowed the RAD team to develop a less time-consuming ingest and cataloging process. This enabled the continuous growth of the music library, providing valuable resources to production staff while efficiently managing limited staff time.

Contact Information

For further information or inquiries, you can reach out to Jane Gilvin through NPR's Research Archives and Data Team.

Protecting Your Privacy: The Risks of Sharing Sensitive Data with AI Tools

Deliberately Safeguarding Privacy and Confidentiality in the Era of Generative AI

Presented by Reed N. Hedges, Digital Initiatives Librarian at the College of Southern Idaho



Introduction

Reed N. Hedges delivered a presentation focusing on the critical importance of safeguarding privacy and confidentiality when using generative artificial intelligence (AI) tools. The session highlighted the potential risks associated with sharing sensitive data with AI models and provided actionable recommendations for users and professionals in the library and information science fields.

Personal Anecdotes and the Need for Caution

Hedges began by sharing several personal anecdotes illustrating how individuals unknowingly compromise their privacy by inputting sensitive information into AI tools:

  • A user who spends long hours chatting with GPT-4, sharing more personal information with the AI than with their own spouse.
  • An individual who input all their grandchildren's data into an AI to generate gift ideas.
  • A person who provided detailed demographic data of a local social group, including identifiable information, to plan activities and programs.
  • A user who entered their entire family budget into an AI tool for financial management.

These examples underscore the pressing need for users to be more conscientious about the data they share with AI systems.

Main Point: Do Not Input Sensitive Data into AI Tools

The core message of the presentation is clear: Users should not input any sensitive or personal data into prompts for generative AI tools. This includes business information, personal identifiers, or any data that could compromise individual or organizational privacy.

Privacy Policies and Data Handling by AI Tools

Hedges highlighted specific concerns regarding popular AI tools:

  • Google Bard: Explicitly notes that human supervisors may read user data, emphasizing the importance of anonymization.
  • OpenAI's ChatGPT: Terms of use discuss the need for proprietary data protection. Users can have a more privacy-conscious session by using OpenAI's Playground or adjusting settings at privacy.openai.com/policies.
  • Perplexity AI: Evades questions about data handling and extrapolation.

The Challenge of Legal Recourse and Privacy Harms

The presentation delved into the limitations of current privacy laws:

  • Harm Requirement: Courts often require proof of harm, which is challenging when privacy violations involve intangible injuries like anxiety or frustration.
  • Impediments to Enforcement: The need to establish harm impedes the effective enforcement of privacy violations, allowing wrongdoers to escape accountability.
  • Lack of Adequate Legal Framework: The existing legal system lacks effective mechanisms to address privacy harms resulting from AI data handling.

Extrapolation and Inference by AI Tools

Generative AI models can infer additional information beyond what users explicitly provide:

  • Data Extrapolation: AI tools can infer behaviors, engagement patterns, and personal attributes from minimal data inputs.
  • Privacy Risks: Such extrapolation can inadvertently reveal sensitive information, including learning disabilities or mental health issues.
  • Example: Even generic prompts can lead to AI inferring personal details that compromise privacy.

Recommendations for Safeguarding Privacy

1. Transparency in Data Collection

  • Inform users about the data being collected and its intended use.
  • According to Hedges, only OpenAI's ChatGPT and Anthropic's Claude explicitly deny storing and extrapolating user data.

2. Informed Consent

  • Obtain explicit consent before collecting or using personal information.
  • Ensure users are aware of the implications of data sharing with AI tools.

3. Data Minimization

  • Limit data collection to what is absolutely essential for the task.
  • Avoid including unnecessary personal or demographic details in AI prompts.

4. Anonymization and Avoiding Sensitive Information

  • Do not include individual attributes or identifiers in AI prompts.
  • Use synthetic or generalized data where possible.
  • Be cautious even with public data, as ethical considerations remain.
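
As a rough illustration of stripping identifiers before a prompt leaves your machine, the sketch below masks email addresses and US-style phone numbers with regular expressions. The patterns and the `redact` function are assumptions for this example, not a tool from the presentation. Pattern matching of this kind misses names, addresses, and indirect identifiers, so it supplements human review rather than replacing it.

```python
import re

# Simple illustrative patterns; real identifiers take many more forms.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact(prompt):
    """Replace email addresses and phone numbers with placeholders."""
    prompt = EMAIL.sub("[EMAIL]", prompt)
    prompt = PHONE.sub("[PHONE]", prompt)
    return prompt


raw = "Plan a party for Pat (pat@example.org, 208-555-0142)."
clean = redact(raw)
print(clean)
```

The name "Pat" survives redaction here, which shows why the recommendation is to avoid entering individual attributes in the first place.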

5. Implement Strict Access and Use Controls

  • Enforce a "least privilege" access model, using tools that require minimal data access.
  • Ensure staff and users are clear on what data can be input into AI tools.

6. Use Human Content Moderation

  • Have prompts reviewed by multiple individuals to screen for privacy issues.
  • This process can also enhance quality control.

7. Be Skeptical of "Secure" AI Tools

  • Avoid promising or assuming that any AI tool is completely secure.
  • Recognize that even custom AI models can be vulnerable to exploitation.

Understanding AI Terms of Service

Users should familiarize themselves with the terms of service of AI tools:

  • Ownership of Content: OpenAI states that users own the input and, to the extent permitted by law, the output generated.
  • Responsibility for Data: Users are responsible for ensuring that their content does not violate any laws or terms.
  • Data Use: AI providers may use input data for training and improving models unless users opt out.

Final Thoughts on Privacy Practices

Hedges emphasized that traditional privacy protection principles remain relevant but must be applied more diligently in the context of AI:

  • Extra Vigilance: Users must be proactive in safeguarding their data when interacting with AI tools.
  • Data Breaches are Inevitable: Even with safeguards, data breaches can occur; therefore, minimizing shared data is crucial.
  • Reassessing the Need for AI: Consider whether using AI is necessary for a given task, especially when handling sensitive information.

Conclusion

In the era of generative AI, safeguarding privacy and confidentiality requires deliberate and informed actions by users and professionals. By understanding the risks, adhering to best practices, and educating others, individuals can mitigate potential harms associated with AI data handling.

References and Further Reading

  • Danielle Keats Citron and Daniel J. Solove: "Privacy Harms" - A comprehensive paper discussing the challenges in addressing privacy violations legally.
  • Shantanu Sharma: "Artificial Intelligence and Privacy" - An exploration of AI's impact on privacy, available on SSRN.
  • Nathan Hunter: "The Art of ChatGPT Prompting: A Guide to Crafting Clear and Effective Prompts" - A book providing insights into effective AI interactions.

Links to these resources were provided during the presentation for attendees interested in deepening their understanding of AI privacy concerns.

Bridging the Gap: The Role of Librarians in Facilitating AI Integration in Library Instruction

Faculty Attitudes Toward Librarians Introducing AI in Library Instruction Sessions

Presented by Beth Evans, Associate Professor at Brooklyn College, City University of New York



Introduction

Beth Evans delivered a presentation discussing the role of librarians in introducing artificial intelligence (AI) tools in library instruction sessions. With over 30 years of experience at Brooklyn College's library, she explored faculty perspectives on the use of AI in academic settings and the potential implications for library instruction.

Background

Evans noted that AI technologies like ChatGPT have the potential to augment, support, or even replace certain library functions, such as reference services, instruction, and technical services. Recognizing the transformative impact of AI, she sought to understand faculty attitudes toward AI and whether they would welcome librarians incorporating AI tools into their instruction sessions.

Research Methodology

In the fall of 2023, Evans conducted a survey targeting faculty members at Brooklyn College. Key aspects of the survey included:

  • Distributed to 199 faculty members.
  • Received 74 responses, representing a response rate of approximately 37%.
  • Respondents came from various departments, with the largest representation from English, History, and Sociology.
  • Questions focused on faculty's introduction of AI in their courses, their attitudes toward AI, and their openness to librarians discussing AI in instruction sessions.

Survey Findings

Faculty Introduction of AI in Courses

Evans explored how faculty members addressed AI in their teaching:

  • Proactive Introduction: Some faculty included AI tools in their syllabi, assignments, or class discussions.
  • Student-Initiated Discussions: In a few cases, students brought up AI topics during classes.
  • No Introduction: A portion of faculty did not introduce AI topics at all.

Methods of Introducing AI

Among faculty who addressed AI:

  • Rule Setting in Syllabi: Establishing guidelines on AI usage in course policies.
  • Class Discussions: Engaging students in conversations about AI's role and impact.
  • Assignments Involving AI: Incorporating AI tools as part of coursework to critically evaluate their utility.

Faculty Attitudes Toward AI

Faculty responses reflected a spectrum of attitudes:

1. Prohibitive

Some faculty strictly prohibited the use of AI tools, expressing concerns about academic integrity and potential threats to human creativity and critical thinking.

2. Cautionary

Others cautioned students about relying on AI, highlighting limitations and encouraging transparency if AI tools were used.

3. Preventative

Certain faculty designed assignments that were difficult or impossible to complete using AI tools, thereby discouraging their use.

4. Proactive Utilization

A group of faculty embraced AI, integrating it into their teaching to enhance learning outcomes:

  • Using AI for media literacy discussions.
  • Employing AI to improve cover letters in business courses.
  • Assigning comparative analyses between AI-generated content and traditional research tools like PubMed.

Faculty Concerns About Librarians Introducing AI

When asked whether they were concerned about librarians introducing AI in library instruction sessions:

  • Majority Not Concerned: Most faculty members were open to librarians discussing AI tools.
  • Supportive of Librarian Expertise: Many acknowledged librarians as information experts capable of providing balanced and ethical guidance on AI.
  • Strong Opposition: A minority expressed strong opposition, fearing AI as a threat to human flourishing and academic integrity.

Additional Faculty Comments

Faculty provided further insights:

  • Ambivalence and Hesitation:
    • Some were uncertain about AI's role and expressed a need for more understanding before fully integrating it.
    • Concerns about keeping pace with rapidly evolving technology and its implications for cheating and academic dishonesty.
  • Recognizing the Inevitable Presence of AI:
    • Acknowledgment that AI is prevalent and students need to be educated about its use.
    • Emphasis on not burying heads in the sand and preparing students for real-world applications where AI is utilized.
  • Desire for Collaboration with Librarians:
    • Faculty expressed interest in workshops and collaborations led by librarians to explore AI tools constructively.
    • Appreciation for librarians' efforts to assist both students and faculty in understanding AI's prevalence and uses.

Conclusion

Beth Evans concluded that while faculty attitudes toward AI vary widely, there is significant openness and even enthusiasm for librarians to take an active role in introducing and educating about AI tools in library instruction sessions. Librarians are viewed as information experts well-equipped to navigate the ethical, practical, and pedagogical aspects of AI in academic settings.

Implications for Librarians

Based on the survey findings:

  • Librarians have an opportunity to lead in AI literacy education, providing balanced perspectives on AI tools.
  • Collaboration with faculty is essential to ensure that AI integration aligns with course objectives and academic integrity policies.
  • There is a need to address concerns and misconceptions about AI, tailoring approaches to different disciplines and faculty attitudes.

Contact Information

For further information or collaboration opportunities, you can contact Beth Evans at Brooklyn College, City University of New York.

Note: The final slide of the presentation included an AI-generated image using the tool "Tome" with the theme "Ocean."