Latest Blogposts
Stories and updates you can see
October 26, 2023
Deep Dive into Yahoo's Semantic Search Suggestions: From Challenges to Effective Implementation
The Pervasive Problem of Semantic Search
In the expansive digital age where information is not only vast but grows at an exponential rate, the quest for accurate and relevant search results has never been more critical. Within this context, Yahoo Mail, serving millions of users, understood the transformative potential of semantic search. By leveraging the prowess of OpenAI embeddings, we embarked on a journey to provide search results that would understand and match user intent, going beyond the conventional keyword-based approach. And while the results were commendable, they weren't devoid of hurdles:
1. Performance Bottlenecks: The integration of OpenAI embeddings, though powerful, significantly slowed down our search process.
2. User Experience: The new system required users to type more than they were used to, which risked user dissatisfaction.
3. Habit Change: Introducing a paradigm shift in search behaviors meant we were not just altering algorithms but challenging years of user habits.
Our objective was crystal clear yet daunting: We wanted to augment the semantic search with suggestions that were rapid, economically viable, and seamlessly integrated into the user's natural search behavior.
Approach: Exploration Phase
Enticed by the idea of real-time suggestions via large language models (LLMs), we soon realized the impracticality of such an approach, primarily due to the speed constraints. The challenge demanded a solution that operated offline but mirrored the capabilities of real-time systems.
Our exploration led us to task the LLM to frame and answer all conceivable questions for every email a user received. While theoretically sound, the financial implications were prohibitive. Moreover, the risk of the LLM generating "hallucinations" or inaccurate results couldn't be ignored.
It was amidst this exploration that a revelatory idea emerged. We were already equipped with a sophisticated extraction pipeline capable of gleaning crucial information from emails, built on a blend of human-curated regex parsing and meticulously fine-tuned AI models. This pipeline became the key to powering our search suggestions.
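To make the idea of an extraction card concrete, here is a minimal sketch of how a hand-written regex rule might turn a flight confirmation into a structured card. The `ExtractionCard` fields, the pattern, and the sample email are illustrative assumptions, not the actual Yahoo Mail schema or rules.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ExtractionCard:
    """Hypothetical card schema; the real fields are not described in this post."""
    card_type: str
    title: str
    event_date: date  # date of the event itself, not the email's received date
    email_id: str


# Hypothetical human-curated pattern for one confirmation format.
FLIGHT_PATTERN = re.compile(
    r"Flight (?P<flight>[A-Z]{2}\d{1,4}) to (?P<city>[A-Za-z ]+?) "
    r"departs on (?P<day>\d{4}-\d{2}-\d{2})"
)


def extract_flight_card(email_id: str, body: str) -> Optional[ExtractionCard]:
    """Return a flight card if the body matches the pattern, else None."""
    match = FLIGHT_PATTERN.search(body)
    if match is None:
        return None
    return ExtractionCard(
        card_type="flight",
        title=f"Flight {match['flight']} to {match['city']}",
        event_date=date.fromisoformat(match["day"]),
        email_id=email_id,
    )


card = extract_flight_card(
    "msg-1",
    "Your trip is booked. Flight UA123 to San Francisco departs on 2023-11-02.",
)
```

Emails that match no rule simply produce no card, so suggestions are only ever built from information that was actually extracted.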
Implementation Challenges: Transitioning from Conceptualization to Real-World Application
1. The Intricacies of Indexing: One of the more pronounced challenges we encountered revolved around the intricacies of over-indexing. Let's delve into a hypothetical yet common scenario to elucidate this. Imagine a user intending to search for the term "staples." As they begin their search with the initial letters "sta", an all-encompassing approach to indexing, which takes into account every conceivable keyword, might mistakenly steer the user towards unrelated terms like "statement." Such deviations, although seemingly minor, can significantly hamper the user experience. Recognizing the paramount importance of ensuring that our search suggestions remained razor-sharp in their precision and highly relevant, we embarked on a methodical approach. Our resolution was to meticulously handpick and index only a curated set of keywords, ensuring that every suggestion offered was in perfect alignment with the user's intent.
2. The Quest for Relevance in Suggestions: Another challenge that frequently emerged was ensuring the highest degree of relevance in our search suggestions. This challenge becomes particularly pronounced when one considers a situation where a user's inbox is populated with multiple items that bear a resemblance to each other, say multiple flight confirmations. The conundrum we faced was discerning which of these similar items was of immediate interest to the user. Our breakthrough came in the form of an innovative approach centered on the extraction card date. Rather than basing our suggestions on the date the email was received, we shifted our focus to the date of the event described within the email, like a flight's departure date. This nuanced change enabled us to consistently zero in on and prioritize the most timely and pertinent result for the user.
3. Embracing Dynamism and Adaptability: When we first conceptualized our approach, our methodology was anchored in generating questions and answers during the email delivery phase, which were then indexed. However, as we delved deeper, it became evident that this approach, while robust, was somewhat inflexible and lacked the dynamism that modern search paradigms demand. Determined to infuse our system with greater adaptability, we pioneered the Just-in-Time question generation mechanism. With this refined approach, while the foundational search indexes are crafted at the point of delivery, the actual questions are dynamically constructed in real-time, tailored to the user's specific queries and the prevailing temporal context. This rejuvenated approach not only elevated the flexibility quotient of our system but also enhanced operational efficiency, ensuring that users always received the most pertinent suggestions.
Implementation
At delivery time
- Here we extract important information, create cards from the emails, and save them in our BE store.
Semantic Search Indexing
- Fetch or update the extracted cards from the BE DB, then index them by extracting the keywords and storing them in the Semantic Search Index DB.
Retrieval
- When the user makes a search, we make a server call, which in turn finds the best matching extraction card for the query.
- This card is then used to generate the suggestions for the semantic search.
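The retrieval flow above, together with the three design decisions described earlier (curated keywords, ranking by event date, and Just-in-Time question generation), can be sketched as follows. This is an illustrative sketch only: the `Card` type, the function names, the ranking rule, and the question wording are assumptions, not the production implementation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Card:
    """Hypothetical extraction card with its curated keywords."""
    card_type: str
    title: str
    event_date: date
    keywords: List[str]


def build_index(cards: List[Card]) -> Dict[str, List[Card]]:
    """Index only the curated keywords on each card, not every word in the email."""
    index: Dict[str, List[Card]] = {}
    for card in cards:
        for keyword in card.keywords:
            index.setdefault(keyword.lower(), []).append(card)
    return index


def best_card(index: Dict[str, List[Card]], prefix: str, today: date) -> Optional[Card]:
    """Prefix-match the typed text, then rank candidates by event date."""
    prefix = prefix.strip().lower()
    if not prefix:
        return None
    candidates = [
        card
        for keyword, cards in index.items()
        if keyword.startswith(prefix)
        for card in cards
    ]
    if not candidates:
        return None
    # Assumed ranking: upcoming events first, then the event closest to today.
    return min(
        candidates,
        key=lambda c: (c.event_date < today, abs((c.event_date - today).days)),
    )


def make_question(card: Card, today: date) -> str:
    """Build the suggestion text at query time, using the current date."""
    if card.card_type == "flight":
        if card.event_date >= today:
            return f"When does my {card.title} depart?"
        return f"When was my {card.title}?"
    return f"Show my {card.title}"


today = date(2023, 10, 26)
index = build_index([
    Card("flight", "flight to Boston", date(2023, 11, 5), ["flight"]),
    Card("flight", "flight to Denver", date(2023, 12, 20), ["flight"]),
    Card("receipt", "Staples order", date(2023, 10, 20), ["staples"]),
])
top = best_card(index, "fli", today)
suggestion = make_question(top, today)
```

Because only curated keywords are indexed, typing "sta" can surface the Staples card but never an unindexed word such as "statement". Since the question text is built at query time, the same flight card yields a different suggestion once its departure date has passed.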
Conclusion
Our innovative foray into enhancing search suggestions bore fruit in a remarkably short span of 30 days, even as we navigated the intricacies of a completely new tech stack. The benefits were manifold: an enriched user experience, and 10% of semantic search traffic now handled by search suggestions.
In the rapidly evolving realm of AI, challenges are omnipresent. However, our journey at Yahoo underscores the potential of lateral thinking and a commitment to User Experience. Through our experiences, we hope to galvanize the broader tech community, encouraging them to ideate and implement solutions that are not just effective, but also economically prudent.
Contributors
Kevin Patel (patelkev@yahooinc.com) + Renganathan Dhanogopal (renga@yahooinc.com) - Architecture + Tech Implementation
Josh Jacobson + Sam Bouguerra (sbouguerra@yahooinc.com) - Product
Author
Kevin Patel (patelkev@yahooinc.com) - Director of Engineering, Yahoo
March 28, 2023
Latest updates - March 2023
Happy Spring! The Screwdriver team is pleased to announce our newest release which brings in new features and bug fixes across various components.
New Features
UI
- UI codebase has been upgraded to use Ember.js 4.4
- Build detail page to display the Template in use
- Links in the event label are now clickable
- PR title shows on PR build page
- Job list to display a build’s start & end times on hover
Bug Fixes
UI
- Job list view to handle job display name as expected
- Artifacts with & in name are now loaded properly
API
- Fixed data loss when adding Templates from multiple browser tabs
- Add API endpoints to add or remove one or more pipelines in a collection
Internals
- Fix for Launcher putting invalid characters on log lines
Compatibility List
In order to have these improvements, you will need these minimum versions:
- API - v6.0.9
- UI - v1.0.790
- Store - v5.0.2
- Queue-Service - v3.0.2
- Launcher - v6.0.180
- Build Cluster Worker - v3.0.3
Contributors
Thanks to the following contributors for making these features possible:
- Alan
- Anusha
- Haruka
- Ibuki
- Keisuke
- Pritam
- Sagar
- Yuki
- Yuta
Questions and Suggestions
We’d love to hear from you. If you have any questions, please feel free to reach out here. You can also visit us on GitHub and Slack.
Author
Jithin Emmanuel, Director Of Engineering, Yahoo
December 30, 2022
Latest updates - December 2022
Happy Holidays! The Screwdriver team is pleased to announce our newest release which brings in new features and bug fixes across various components.
New Features
UI
- Enable deleting disconnected Child Pipelines from UI. This will give users more awareness and control over SCM URLs that are removed from child pipelines list.
API
- Cluster admins can configure different bookends for individual build clusters.
- Add more audit logs for Cluster admins to track API usage.
Bug Fixes
UI
- Collections sorting enhancements.
- Create Pipeline flow now displays all Templates properly.
API
- Pipeline badges have been refactored to reduce resource usage.
- Prevent artifact upload errors due to incorrect retry logic.
Queue Service
- Prevent archived jobs from running periodic jobs if cleanup fails at any point.
Internals
- Update golang version to 1.19 across all golang projects.
- Node.js has been upgraded to v18 for Store, Queue Service & Build Cluster Worker.
- Feature flag added to Queue Service to control Redis Table usage to track periodic builds.
Compatibility List
In order to have these improvements, you will need these minimum versions:
- API - v5.0.12
- UI - v1.0.759
- Store - v5.0.2
- Queue-Service - v3.0.0
- Launcher - v6.0.178
- Build Cluster Worker - v3.0.2
Contributors
Thanks to the following contributors for making this feature possible:
- Alan
- Anusha
- Kevin
- Haruka
- Ibuki
- Masataka
- Pritam
- Sagar
- Tiffany
- Yoshiyuki
- Yuki
- Yuta
Questions and Suggestions
We’d love to hear from you. If you have any questions, please feel free to reach out here. You can also visit us on GitHub and Slack.
Author
Jithin Emmanuel, Director Of Engineering, Yahoo
October 31, 2022
New bug fixes and features - October 2022
Latest Updates - October 2022
Happy Halloween! The Screwdriver team is pleased to announce our newest release which brings in new features and bug fixes across various components.
New Features
- Add sorting on branch and status for Collections
- Able to select timestamp format in user preferences
  - Click on User profile in upper right corner, select User Settings
  - Select dropdown for Timestamp Format, pick preferred format
  - Click Save
- Soft delete for child pipelines - still need to ask a Screwdriver admin to remove completely
- Notify Screwdriver pipeline developers if pipeline is missing admin
- Add audit log of operations performed on the Pipeline Options page - Screwdriver admins should see more information in API logs
- API to reset user settings
- Support Redis cluster connection
- Add default event meta in launcher - set event.creator properly
- New gitversion binary with multiple branch support - added homebrew formula and added parameter --merged (to consider only versions on the current branch)
Bug Fixes
UI
- Show error message when unauthorized users change job state
- Job state should be updated properly for delayed API response
- Gray out the Restart button for jobs that are disabled
- Modify toggle text to work in both directions
- Display full pipeline name in Collections
- Allow reset of Pipeline alias
- Remove default pipeline alias name
- Add tooltip for build history in Collections
API
- Admins can sync on any pipeline
- Refactor unzipArtifactsEnabled configuration
- Check permissions before running startAll on child pipelines
- ID schema for pipeline get latestBuild
Internals
Models
- Refactor syncStages to fail early
- Pull Request sync only returns PRs relevant to the pipeline
- Add more logs to stage creation
Data-schema
- Display JobNameLength in user settings
- Remove old unique constraint for stages table
SCM GitHub
- Get open pull requests - override the default limit (30) to return up to 100
- Change wget to curl for downloading sd-repo
- Builds cannot be started if a pipeline has more than 5 invalid admins
Coverage-sonar
- Use correct job name for PR with job scope
Queue-Service
- Remove laabr
Launcher
- Update Github link for grep
- Update build status if SIGTERM is received - the build status is updated to Failure on a soft eviction, so that buildCluster-queue-worker can then send a delete request to clean up the build pod
Compatibility List
In order to have these improvements, you will need these minimum versions:
- API - v4.1.297
- UI - v1.0.732
- Store - v4.2.5
- Queue-Service - v2.0.42
- Launcher - v6.0.171
- Build Cluster Worker - v2.24.3
Contributors
Thanks to the following contributors for making this feature possible:
- Alan
- Anusha
- Kevin
- Haruka
- Ibuki
- Masataka
- Pritam
- Sagar
- Sheridan
- Shota
- Tiffany
- Yoshiyuki
- Yuki
- Yuta
Questions and Suggestions
We’d love to hear from you. If you have any questions, please feel free to reach out here. You can also visit us on GitHub and Slack.
Author
Tiffany Kyi, Sr Software Dev Engineer, Yahoo