Bulk upload with duplicate check: Bulk upload with a duplicate check on secondary keys




by Nancy Anderson - 5 years ago (2015-01-27)



I am uploading new items into a product listing in a custom component on a Joomla website. Lots of bulk loaders are out there, but I need to make sure that we don't import duplicates based on two secondary fields, which are not part of the primary key.

The table has quite a lot of fields, and the imports vary in size from 20 rows to 6000 rows, usually in CSV format.

While this will be used over and over, it will only ever import into this one table.

  • 3 Clarification requests
  • 8. by Dave Smith - 5 years ago (2015-04-19) Reply

    Are we over-thinking this? If I understand correctly, there is a primary key, obviously unique, but they want to prevent duplicates based on 2 secondary columns. Why aren't we just creating a unique key on these columns and letting the insert fail quietly if they duplicate?
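A minimal sketch of that idea, written in Python against an in-memory SQLite database only so that it runs on its own. The table `products` and the columns `sku` and `vendor` are hypothetical stand-ins for the real table and its two secondary fields. SQLite spells the quiet failure `INSERT OR IGNORE`; on MySQL the comparable statement is `INSERT IGNORE`.

```python
import sqlite3


def make_db():
    """Create a throwaway table with a unique key on two non-primary columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products ("
        "id INTEGER PRIMARY KEY, sku TEXT, vendor TEXT, title TEXT)"
    )
    # Hypothetical secondary fields: the pair (sku, vendor) must be unique.
    conn.execute("CREATE UNIQUE INDEX uq_sku_vendor ON products (sku, vendor)")
    return conn


def import_rows(conn, rows):
    """Insert rows, silently skipping any that collide on (sku, vendor).

    Returns the number of rows actually inserted.
    """
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO products (sku, vendor, title) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn.total_changes - before
```

The database does the duplicate check through the index, so the import code never has to compare rows itself, and duplicates inside the same CSV file are skipped as well.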

    • 4. by Michael Cummings - 5 years ago (2015-01-29) Reply

      So just to be clear they are being added to a database table or something else?

      If they are going into a DB table, then you simply need to use an upsert (update or insert), and that is something many tools can do for you, including something like phpMyAdmin.

      If you are trying to build huge tables in HTML, then you probably need to have someone rewrite things to use a database, as writing data directly into an HTML file is not going to be easy and will be hard to maintain.
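For readers unfamiliar with the upsert mentioned above, here is a rough sketch in Python with SQLite. It assumes SQLite 3.24 or newer for the `ON CONFLICT ... DO UPDATE` clause; MySQL's counterpart is `INSERT ... ON DUPLICATE KEY UPDATE`. The table and column names are hypothetical.

```python
import sqlite3

# Hypothetical schema: (sku, vendor) are the two secondary fields.
SCHEMA = """
CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    sku TEXT,
    vendor TEXT,
    title TEXT,
    UNIQUE (sku, vendor)
)
"""

# Insert a row, or update the title if (sku, vendor) already exists.
UPSERT_SQL = """
INSERT INTO products (sku, vendor, title)
VALUES (?, ?, ?)
ON CONFLICT (sku, vendor) DO UPDATE SET title = excluded.title
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def upsert_rows(conn, rows):
    """Apply the upsert to every row inside one transaction."""
    with conn:
        conn.executemany(UPSERT_SQL, rows)
```

Unlike ignoring the duplicate, an upsert overwrites the existing row with the imported values, which may or may not be what the import should do.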

      • 5. by Nancy Anderson - 5 years ago (2015-02-05) in reply to comment 4 by Michael Cummings Comment

        Hi, they are going into the database. System end users will be doing this, so phpMyAdmin isn't really an option.

        However, you mention other things that can do imports. What are some of the other tools you had in mind?

        thanks

    • 1. by Manuel Lemos - 5 years ago (2015-01-28) Reply

      I have not seen a specific class that can do that exactly but I guess you only need to query your database before adding a new item to see if there are already any other items with the same values in those other fields.

      For many items it may become slow to query the table on fields that do not have their own indexes, but at least that approach would solve your problem.
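The query-before-insert approach could look roughly like this. It is sketched in Python with SQLite so it is runnable as-is; `products`, `sku` and `vendor` are hypothetical names. The plain index on the two fields is there to address the speed concern.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products ("
        "id INTEGER PRIMARY KEY, sku TEXT, vendor TEXT, title TEXT)"
    )
    # A non-unique index keeps the lookup below from scanning the whole table.
    conn.execute("CREATE INDEX idx_sku_vendor ON products (sku, vendor)")
    return conn


def insert_if_new(conn, sku, vendor, title):
    """Insert the row only if no existing row has the same (sku, vendor)."""
    exists = conn.execute(
        "SELECT 1 FROM products WHERE sku = ? AND vendor = ? LIMIT 1",
        (sku, vendor),
    ).fetchone()
    if exists:
        return False
    conn.execute(
        "INSERT INTO products (sku, vendor, title) VALUES (?, ?, ?)",
        (sku, vendor, title),
    )
    return True
```

This costs one extra query per imported row, which should be tolerable for a few thousand rows when the index exists.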

      • 2. by Nancy Anderson - 5 years ago (2015-01-28) in reply to comment 1 by Manuel Lemos Comment

        Sort of a line-by-line comparison? That could take a long time. This is replacing a custom desktop product, where a query taking a long time wasn't very likely to time out. Thanks for your help.

      • 3. by Manuel Lemos - 5 years ago (2015-01-28) in reply to comment 2 by Nancy Anderson Comment

        I think that if the list of products is not large, you can keep it in memory, in arrays, as long as it does not exceed your environment's PHP memory limit.

        If the number of items is so large that the list would not fit in the available memory, it is better to use a database, which could be MySQL or SQLite, even if it is just for temporary use.

        Anyway, I did not see any solution for that specific purpose. Maybe that could be a good idea for a developer to contribute with an innovative class.
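The in-memory variant described above might be sketched as follows, here in Python with a set standing in for the arrays. The CSV column names `sku` and `vendor` are hypothetical, and `existing_keys` is assumed to have been loaded from the table beforehand.

```python
import csv
import io


def filter_new_rows(csv_text, existing_keys, key_fields=("sku", "vendor")):
    """Return the CSV rows whose key is not already known.

    existing_keys is an iterable of tuples, one per row already in the table.
    Rows that repeat a key seen earlier in the same file are dropped too.
    """
    seen = set(existing_keys)
    fresh = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = tuple(row[field] for field in key_fields)
        if key in seen:
            continue  # duplicate on the secondary fields, skip it
        seen.add(key)
        fresh.append(row)
    return fresh
```

For an import of at most 6000 rows, holding only the two key fields per row in memory is cheap compared with holding the full rows.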

      • 6. by Till Wehowski - 5 years ago (2015-02-14) in reply to comment 3 by Manuel Lemos Comment

        What about storing the files' hashes and only doing a deeper comparison if two hashes match?

      • 7. by Ernani Joppert Pontes Martins - 5 years ago (2015-02-23) in reply to comment 1 by Manuel Lemos Comment

        A sha1_file($filename) call would generate a 40-character hexadecimal hash of the file, and it has proven to work extremely well in some real use cases of mine.

        False positives can be ruled out by comparing extensions and chunks of bytes of the files, but that is only needed when the hashes match...

        Cheers,

        Ernani
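The hash-first idea from the last two comments can be sketched like this. It uses Python's `hashlib` in place of PHP's `sha1_file()`, and the helper names and the `known` dictionary are hypothetical. The byte-by-byte comparison only runs when two hashes match.

```python
import filecmp
import hashlib


def sha1_of_file(path, chunk_size=65536):
    """Return the 40-character hex SHA-1 digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate(path, known):
    """Return the path of an identical, previously seen file, or None.

    known maps hex digest -> path of the first file seen with that digest.
    A new file is recorded in known so later calls can match against it.
    """
    digest = sha1_of_file(path)
    candidate = known.get(digest)
    if candidate is not None and filecmp.cmp(path, candidate, shallow=False):
        return candidate  # hashes match and the contents are identical
    known.setdefault(digest, path)
    return None
```

The same pattern would apply to rows instead of files by hashing the two secondary fields of each row, although for short field values a direct comparison of the values is just as practical.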
