There is a **list L**. It contains elements of **arbitrary type each**. How to delete all duplicate elements in such list efficiently? **ORDER must be preserved**

Just an algorithm is required, so no import any external library is allowed.

Assuming order matters:

- Create an empty set S and an empty list M.
- Scan the list L one element at a time.
- If the element is in the set S, skip it.
- Otherwise, add it to M and to S.
- Repeat for all elements in L.
- Return M.

In Python:

```
>>> L = [2, 1, 4, 3, 5, 1, 2, 1, 1, 6, 5]
>>> S = set()
>>> M = []
>>> for e in L:
... if e in S:
... continue
... S.add(e)
... M.append(e)
...
>>> M
[2, 1, 4, 3, 5, 6]
```

If order does not matter:

```
M = list(set(L))
```

Firstly, we need to determine something about the assumptions, namely the existence of an equals and has function relationship. What do I mean by this? I mean that for the set of source objects S, given any two objects x1 and x2 that are elements of S there exists a (hash) function F such that:

```
if (x1.equals(x2)) then F(x1) == F(x2)
```

Java has such a relationship. That allows you to check to duplicates as a near O(1) operation and thus reduces the algorithm to a simple O(n) problem. If order is unimportant, it's a simple one liner:

```
List result = new ArrayList(new HashSet(inputList));
```

If order is important:

```
List outputList = new ArrayList();
Set set = new HashSet();
for (Object item : inputList) {
if (!set.contains(item)) {
outputList.add(item);
set.add(item);
}
}
```

You will note that I said "near O(1)". That's because such data structures (as a Java HashMap or HashSet) rely on a method where a portion of the hash code is used to find an element (often called a bucket) in the backing storage. The number of buckets is a power-of-2. That way the index into that list is easy to calculate. hashCode() returns an int. If you have 16 buckets you can find which one to use by ANDing the hashCode with 15, giving you a number from 0 to 15.

When you try and put something in that bucket it may already be occupied. If so then a **linear** comparison of all entries in that bucket will occur. If the collision rate gets too high or you try to put too many elements in the structure will be grown, typically doubled (but always by a power-of-2) and all the items are placed in their new buckets (based on the new mask). Thus resizing such structures is relatively expensive.

Lookup may also be expensive. Consider this class:

```
public class A {
private final int a;
A(int a) { this.a == a; }
public boolean equals(Object ob) {
if (ob.getClass() != getClass()) return false;
A other = (A)ob;
return other.a == a;
}
public int hashCode() { return 7; }
}
```

This code is perfectly legal and it fulfills the equals-hashCode contract.

Assuming your set contains nothing but A instances, your insertion/search now turns into an O(n) operation, turning the entire insertion into O(n^{2}).

Obviously this is an extreme example but it's useful to point out that such mechanisms also rely on a relatively good distribution of hashes within the value space the map or set uses.

Finally, it must be said that **this is a special case**. If you're using a language without this kind of "hashing shortcut" then it's a different story.

If no ordering function exists for the list then you're stuck with an O(n^{2}) brute-force comparison of every object to every other object. So in Java:

```
List result = new ArrayList();
for (Object item : inputList) {
boolean duplicate = false;
for (Object ob : result) {
if (ob.equals(item)) {
duplicate = true;
break;
}
}
if (!duplicate) {
result.add(item);
}
}
```

If an ordering function exists (as it does with, say, a list of integers or strings) then you sort the list (which is O(n log n)) and then compare each element in the list to the next (O(n)) so the total algorithm is O(n log n). In Java:

```
Collections.sort(inputList);
List result = new ArrayList();
Object prev = null;
for (Object item : inputList) {
if (!item.equals(prev)) {
result.add(item);
}
prev = item;
}
```

**Note:** the above examples assume no nulls are in the list.

If the order does not matter, you might want to try this algorithm written in Python:

```
>>> array = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6]
>>> unique = set(array)
>>> list(unique)
[1, 2, 3, 4, 5, 6]
```

In java, it's a one liner.

```
Set set = new LinkedHashSet(list);
```

will give you a collection with duplicate items removed.

For Java could go with this:

```
private static <T> void removeDuplicates(final List<T> list)
{
final LinkedHashSet<T> set;
set = new LinkedHashSet<T>(list);
list.clear();
list.addAll(set);
}
```

in haskell this would be covered by the `nub`

and `nubBy`

functions

```
nub :: Eq a => [a] -> [a]
nub [] = []
nub (x:xs) = x : nub (filter (/= x) xs)
nubBy :: (a -> a -> Bool) -> [a] -> [a]
nubBy f [] = []
nubBy f (x:xs) = x : nub (filter (not.f x) xs)
```

`nubBy`

relaxes the dependence on the `Eq`

typeclass, instead allowing you to define your own equality function to filter duplicates.

These functions work over a list of consistent arbitrary types (e.g. `[1,2,"three"]`

is not allowed in haskell), and they are both order preserving.

In order to make this more efficient, using Data.Map (or implementing a balanced tree) could be used to gather the data into a set (key being the element, and value being the index into the original list in order to be able to get the original ordering back), then gathering the results back into a list and sorting by index. I will try and implement this later.

```
import qualified Data.Map as Map
undup x = go x Map.empty
where
go [] _ = []
go (x:xs) m case Map.lookup x m of
Just _ -> go xs m
Nothing -> go xs (Map.insert x True m)
```

This is a direct translation of @FogleBird's solution. Unfortunately it doesn't work without the import.

a Very basic attempt at replacing Data.Map import would be to implement a tree, something like this

```
data Tree a = Empty
| Node a (Tree a) (Tree a)
deriving (Eq, Show, Read)
insert x Empty = Node x Empty Empty
insert x (Node a left right)
| x < a = Node a (insert x left) right
| otherwise = Node a left (insert x right)
lookup x Empty = Nothing --returning maybe type to maintain compatibility with Data.Map
lookup x (Node a left right)
| x == a = Just x
| x < a = lookup x left
| otherwise = lookup x right
```

an improvement would be to make it autobalancing on insert by maintaining a depth attribute (keeps the tree from degrading into a linked list). This nice thing about this over a hash table is that it only requires your type to be in the typeclass Ord, which is easily derivable for most types.

I take requests it seems. In response to @Jonno_FTWs inquiry here is a solution which completely removes duplicates from the result. It's not entirely dissimilar to the original, simply adding an extra case. However the runtime performance will be much slower since you are going through each sub-list twice, once for the elem, and the second time for the recusion. Also note that now it will not work on infinite lists.

```
nub [] = []
nub (x:xs) | elem x xs = nub (filter (/=x) xs)
| otherwise = x : nub xs
```

Interestingly enough you don't need to filter on the second recursive case because elem has already detected that there are no duplicates.

In Python

```
>>> L = [2, 1, 4, 3, 5, 1, 2, 1, 1, 6, 5]
>>> a=[]
>>> for i in L:
... if not i in a:
... a.append(i)
...
>>> print a
[2, 1, 4, 3, 5, 6]
>>>
```

**One line solution in Python**.

Using lists-comprehesion:

```
>>> L = [2, 1, 4, 3, 5, 1, 2, 1, 1, 6, 5]
>>> M = []
>>> zip(*[(e,M.append(e)) for e in L if not e in M])[0]
(2, 1, 4, 3, 5, 6)
```

- go through the list and assign sequential index to each item
- sort the list basing on some comparison function for elements
- remove duplicates
- sort the list basing on assigned indices

for simplicity indices for items may be stored in something like std::map

looks like O(n*log n) if I haven't missed anything

Maybe you should look into using associate arrays (aka dict in python) to avoid having duplicate elements in the first place.

It depends on what you mean by "efficently". The naive algorithm is O(n^2), and I assume what you actually mean is that you want something of lower order than that.

As Maxim100 says, you can preserve the order by pairing the list with a series of numbers, use any algorithm you like, and then resort the remainder back into their original order. In Haskell it would look like this:

```
superNub :: (Ord a) => [a] -> [a]
superNub xs = map snd
. sortBy (comparing fst)
. map head . groupBy ((==) `on` snd)
. sortBy (comparing snd)
. zip [1..] $ xs
```

Of course you need to import Data.List (sort), Data.Function (on) and Data.Ord (comparing). I could just recite the definitions of those functions, but what would be the point?

That is we can't use `set`

(`dict`

) or `sort`

.

```
from itertools import islice
def del_dups2(lst):
"""O(n**2) algorithm, O(1) in memory"""
pos = 0
for item in lst:
if all(item != e for e in islice(lst, pos)):
# we haven't seen `item` yet
lst[pos] = item
pos += 1
del lst[pos:]
```

Solution is taken from here:

```
def del_dups(seq):
"""O(n) algorithm, O(log(n)) in memory (in theory)."""
seen = {}
pos = 0
for item in seq:
if item not in seen:
seen[item] = True
seq[pos] = item
pos += 1
del seq[pos:]
```

That is we can use `sort`

. This solution doesn't preserve original order.

```
def del_dups3(lst):
"""O(n*log(n)) algorithm, O(1) memory"""
lst.sort()
it = iter(lst)
for prev in it: # get the first element
break
pos = 1 # start from the second element
for item in it:
if item != prev: # we haven't seen `item` yet
lst[pos] = prev = item
pos += 1
del lst[pos:]
```

I've written an algorithm for string. Actually it does not matter what type do you have.

```
static string removeDuplicates(string str)
{
if (String.IsNullOrEmpty(str) || str.Length < 2) {
return str;
}
char[] arr = str.ToCharArray();
int len = arr.Length;
int pos = 1;
for (int i = 1; i < len; ++i) {
int j;
for (j = 0; j < pos; ++j) {
if (arr[i] == arr[j]) {
break;
}
}
if (j == pos) {
arr[pos] = arr[i];
++pos;
}
}
string finalStr = String.Empty;
foreach (char c in arr.Take(pos)) {
finalStr += c.ToString();
}
return finalStr;
}
```

My code in Java:

```
ArrayList<Integer> list = new ArrayList<Integer>();
list.addAll({1,2,1,3,4,5,2,3,4,3});
for (int i=0; i<list.size(); i++)
{
for (int j=i+1; j<list.size(); j++)
{
if (list.get(i) == list.get(j))
{
list.remove(i);
j--;
}
}
}
```

or simply do this:

```
SetList<Integer> unique = new SetList<Integer>();
unique.addAll(list);
```

*Both ways have Time = nk ~ O(n^2)*

*where n is the size of input list,*

*k is number of unique members of the input list*

Algorithm delete_duplicates (a[1....n])

*//Remove duplicates from the given array*

*//input parameters :a[1:n], an array of n elements*

{

`temp[1:n];`

*//an array of n elements*

```
temp[i]=a[i];for i=1 to n
temp[i].value=a[i]
temp[i].key=i
```

***//based on 'value' sort the array temp.*******

*//based on 'value' delete duplicate elements from temp.*

*//based on 'key' sort the array temp.//construct an array p using temp.*

```
p[i]=temp[i].value
return p
```

**In other of elements is maintained in the output array using the 'key'. Consider the key is of length O(n), the time taken for performing sorting on the key and value is O(nlogn). So the time taken to delete all duplicates from the array is O(nlogn).**

Similar Questions

how can i write a query to list duplicate entries in a database from the same category. The duplicates have the same value in the name column. I need to list only the duplicates in the same category

I've seen the following: Random element of List<T> from LINQ SQL What I'm doing now - to get three random elements - is the following: var user1 = users.ElementAt( rand.Next( users.Count() ) )

When I have a ListActivity and an Adapter how does Android handle a list with 200 elements. Does it try to load all of them directly how does it wait till the user scrolls and then renders those eleme

Possible Duplicate: How to convert list into a string? Hi there, How to convet list to an string. thanks

Possible Duplicate: How to delete files from directory based on creation date in php? How would i delete all the images in a folder that are 24 hours old or older in php?

Possible Duplicate: Can you remove elements from a std::list while iterating through it? I have a loop in a function, that iterates over an std::list from begin to end. In each cycle, I perform some

i need to write a function which calculates how many elements on list with scheme language. for example (howMany 'a) returns 0 (howMany '(a b)) returns 1 (howMany '(a (b c))) returns 2 how can i do th

Requirements/constraint: delete only duplicates keep one copy list is not initially sorted How can this be implemented in C? (An algorithm and/or code would be greatly appreciated!)

I am going through exercies for an exam in algorithm analysis and this is one of them: Present an algorithm that takes as input a list of n elements (that are comparable) and sorts them in O(n log m)

Possible Duplicate: Efficiently finding the ranks of elements in an array? If I have an array of elements A [0 to 15], and I know that the median is in the range A[8..11]. How can I find the median

Possible Duplicates: Exception during iteration on collection and remove items from that collection How to remove elements from a generic list while iterating around it? Better way to remove matched i

Possible Duplicate: Java: adding elements to a collection during iteration My problem is that I want to expand a list with new elements while iterating over it and I want the iterator to continue wi

How do I transform this list: List(a, b, c, d) into a list with the following elements List(List(a), List(a, b), List(a, b, c), List(a, b, c, d)) My requirement is to bu

I have a list of items; I want to sort them, but I want a small element of randomness so they are not strictly in order, only on average ordered. How can I do this most efficiently? I don't mind if th

This question already has an answer here: how to delete duplicate rows from a table in mysql 1 answer I have some rows in a MySQL database which are duplicates, except for their ID number. For

for eg I have the List: old = ['Savannah', '234Today', '4.5678', '23456','0.2342429'] How can I convert it to a list with elements with the default type to: new = ['Savannah', '234Today', 4.5678, 234

I need (for g++) a (computation time) optimized algorithm tree structure to duplicate/multiplicate a tree. My tree will be a k-ary tree, but not necessarily filled. The main operation is to multiplica

How do I use grep or map to delete elements from an array or reference to array? I’m having problems using splice to remove one or more elements from a reference to array and would like to see whether

I have a java application which uses JPA. Say I have an entity called Product with name and price attributes (all entities have an id attribute). Naturally I can get a List<Product> fairly easil

Possible Duplicate: How to calculate or approximate the median of a list without storing the list I want to apply using C# an algorithm to find the median value using selection/quick sort. But I do

how to efficiently find and filter list of elements. here is the HTML <span class=tab-strip-text unselectable=on>Admin</span> <span class=tab-strip-text unselectable=on>User&

How can I delete the hidden cropped elements from a pdf from command line? I've tried many solutions to crop an element on a pdf page based on the coordinates, but the resulting pdf is of the same siz

This question already has an answer here: How to get the type of T from a generic List<T> 11 answers If I do something like this: var a = new List<something>(); var b = new somethin

I have an array like [1,1,1,2,4,6,3,3] and I would like to get the list of repeated elements, in this case [1,3]. I wrote this: my_array.select{|obj|my_array.count(obj)>1}.uniq But it is tragicall

This question already has an answer here: How to remove duplicates from unsorted std::vector while keeping the original ordering using algorithms? 4 answers Is there a way to delete duplicate e

Possible Duplicate: SQL - How can I remove duplicate rows? SQL query to delete duplicate rows from same table? How to find duplicity for example in this table? Column A is unique ID and columns E

I need to draw a corporate structure tree (sort-of like a family tree) in C#. All the ancillary code is there. It is colored, interactive, and fancy. The only trouble is the algorithm that actually de

I want to add duplicate elements on hashmap so: put(name1, 1); put(name1, 3); put(name1, 3); put(name2, 1); put(name2, 3); how i can do that?

Possible Duplicate: How many data a list can hold at the maximum What is the maximum number of elements a list can hold?

What is the minimum number of comparisons required to find the largest and smallest elements of an unsorted list of n distinct elements? What could be the best time complexity for above algorithm? Fr

Using Vim, I want to efficiently delete text while I am in insert mode. I can use Backspace or I can use CTRL+h. What does is it mean? Should I stop using Backspace and only use CTRL+h to edit efficie

(This is NOT a coursework question. Just my own personal learning.) I'm trying to do an exercise in Prolog to delete elements from a list. Here's my code : deleteall([],X,[]). deleteall([H|T],X,Result

I am using TBXML to extract and navigate XML tree's, however I need something to build XML trees - fairly simply ones efficiently. Currently I am writing horrible code like this. Does anyone have any

I know how to delete all elements from a list. I don't know how to delete just one item. Let's say I want to delete the 3th list item of an unorder list . lets say I have the following: <ul id=

What program does is as follows: The list contains product information which includes product id, name, price etc. User enters product id Check the id if it already exist in a list So if the id match

>>> x=[(x1,x2,x3),(x1,x2),(x2,x3),(x3,x4)] >>> x [('x1', 'x2', 'x3'), ('x1', 'x2'), ('x2', 'x3'), ('x3', 'x4')] I want to delete the tuple in the list--x ,if len(x

anyone know how can i delete duplicate rows by writing new way from script below to improve performance. DELETE lt1 FROM #listingsTemp lt1, #listingsTemp lt2 WHERE lt1.code = lt2.code and lt1.classifi

Possible Duplicate: How to search for “R” materials? Dear all, I have recently picked up the task of programming in R. R is the statistical package / programming enviroment/language. But there is ho

This is the dropdown list box. I want to remove the duplicate Apple using javascript. <select id=Fruits> <option value=Apple >Apple</option> <option value=Orange>Orange

I have a filesystem that uses a hash algorithm to organize files. I have used xcopy in the past to copy files to a different location by passing in a file that has a list of all the files and having i

Possible Duplicate: how to safely delete multiple pointers As the code below: #include <iostream> using namespace std; int main(int argc, _TCHAR* argv[]) { int *p, *q; q = new int; p = q; dele

I would like to delete the last n elements of a list in Prolog and put it in another list say L2. If I knew the exact number of elements to delete say 3, here is the code. But I am stuck with the vari

I have a List of Cards, the Card is a class. And I have three hands to which each hand are dealt to four cards. in the game inside the onTouchEvent I have to delete the touched card when it is in the

I have a vector of vectors: vector< vector<int> > BigVec; It contains an arbitrary number of vectors, each of an arbitrary size. I want to delete not duplicate elements of each vector, bu

I have a linked list of objects each containing a 32-bit integer (and provably fewer than 232 such objects) and I want to efficiently choose an integer that's not present in the list, without using an

Possible Duplicate: How do you split a list into evenly sized chunks in Python? Merge two lists in python? Original data in array: a = ['1', '2', '3', '4', '5', '6', '7', '8', '9'] Desired output: [

I have a list containing some countries and relative capitals. I would like to delete only the capitals to create a new countryList[] and viceversa. Here's my list countryCapitalList= ['AFGHANISTAN=',

I'm implementing an algorithm that involves lots of adding and removing things from sets. In R, this is slow because as far as I know, adding or removing things from a vector is slow, since the entire

I'm looking for an algorithm to efficiently place all-day/multi-day event banners, much like the month view in Outlook or Google Calendar. I have a number of events with a begin and end date, ordered

I'm trying to eliminate all the elements in a list that obey a certain condition. Specifically, I have a list of Points and I need to filter this list such that duplicate points should be removed from