Remove duplicated rows in Google Sheets ignoring the order of cells

I have a spreadsheet with many "duplicate" rows. I don't want to remove them manually, since I have thousands of rows. In my particular case, the rows may not be exact duplicates, as I don't care about the order of cells in the rows. Here is an example:

A   B
dog cat
apple orange
red blue
cat dog

The rows "dog cat" and "cat dog" are duplicates in my case, so I want to keep one of them. I don't care which one; it could be the first or the last.

I know I need some kind of order-independent row comparison. How can this be accomplished using spreadsheet formulas or Google Apps Script?

P.S. My actual data has 7 columns (A to G), not 2 as in my example.

I don't have experience with Google Sheets, but here's what I would do in Excel; hopefully you can replicate it in some way.

As a comment suggests, Remove Duplicates will do what you require, but first you need to normalise the list so that these duplicates are picked up.

In Column C: =IF(A1<B1,A1,B1)

In Column D: =IF(A1<B1,B1,A1)

This puts the values from columns A and B into columns C and D in alphabetical order within each row. You can then perform Remove Duplicates on these two new columns.

Based on the solution provided by Oliver Carr, here is a single-formula solution:

=unique(arrayformula({IF(A:A<B:B,A:A,B:B),IF(A:A<B:B,B:B,A:A)}))
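
To see what the formula computes, here is a minimal JavaScript sketch of the same idea on the question's sample data. The function name uniqueNormalizedPairs is made up for illustration, and plain JavaScript string comparison stands in for the spreadsheet's < operator, which may order some values differently (for example, when case differs).

```javascript
// Hypothetical helper: mimics UNIQUE applied to pairs put into ascending order.
function uniqueNormalizedPairs(rows) {
  const seen = new Set();
  const result = [];
  for (const [a, b] of rows) {
    // Same role as IF(A<B, A, B) and IF(A<B, B, A) in the formula.
    const pair = a < b ? [a, b] : [b, a];
    const key = JSON.stringify(pair);
    if (!seen.has(key)) {
      seen.add(key);
      result.push(pair);
    }
  }
  return result;
}

const sample = [
  ['dog', 'cat'],
  ['apple', 'orange'],
  ['red', 'blue'],
  ['cat', 'dog'],
];
console.log(uniqueNormalizedPairs(sample));
// [ [ 'cat', 'dog' ], [ 'apple', 'orange' ], [ 'blue', 'red' ] ]
```

Like the formula, this returns each pair in sorted order rather than in its original cell order, and it only covers the two-column case.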

For a flexible solution that handles many rows and possibly more columns, try this:

Assuming your data is in columns A:G, starting in row 1.

In H1: =2^COUNTIF($A:$G,"<"&A1) Copy this formula across from H to N, and down as many rows as needed to cover all cells of data.

In O1: =SUM(H1:N1) and copy this down to cover all rows.

The value in O will be the same for rows with the same words. You can now remove duplicates based on column O.

What you are doing is assigning every word in your data a unique power of 2, so the sum identifies the set of words in the row (think of a binary number with as many digits as you have unique words, with each digit set to 1 if that word appears in the row, and otherwise zero). This holds as long as no word is repeated within a row and the powers of 2 stay small enough for the spreadsheet to represent exactly; with a large amount of data the exponents can outgrow that.
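
The idea can be sketched outside the spreadsheet as follows. This is an illustration under assumptions, not a translation of the formula: the helper name rowSignatures is hypothetical, each distinct word gets its rank among the sorted words as the exponent (rather than the COUNTIF count), BigInt is used so that large powers of 2 stay exact, and a bitwise OR replaces the sum so that a word repeated within one row is counted once.

```javascript
// Hypothetical helper: one BigInt signature per row, independent of cell order.
function rowSignatures(rows) {
  // Give every distinct word its own bit position, in sorted order.
  const words = [...new Set(rows.flat())].sort();
  const bit = new Map(words.map((word, i) => [word, 1n << BigInt(i)]));
  // Set the bit for each word that appears in the row.
  return rows.map((row) => row.reduce((sig, word) => sig | bit.get(word), 0n));
}

const sigs = rowSignatures([
  ['dog', 'cat'],
  ['apple', 'orange'],
  ['red', 'blue'],
  ['cat', 'dog'],
]);
console.log(sigs[0] === sigs[3]); // true: same words, same signature
console.log(sigs[0] === sigs[1]); // false: different words
```

Because of the OR, this sketch compares rows as sets of words, so two rows that differ only in how often a word is repeated would also share a signature.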

If you want a more condensed version, this can also be entered as an array formula (again with data in A:G):

In H1: =SUM(2^COUNTIF(A:G,"<"&A1:G1)) entered by pressing Ctrl+Shift+Enter. Or in Google Sheets: =ArrayFormula(SUM(2^COUNTIF(A:G,"<"&A1:G1)))

If you are restricted to Google Sheets, you can then use the UNIQUE function on H to get a unique list of IDs, and perform lookups into the original table to get the actual rows.

The answers by Spencer, Max and Oliver all use spreadsheet formulas to return filtered arrays. They have the advantage that they can automatically recalculate when new rows are added to the source data.

However, you asked specifically about deleting rows, which none of those answers do. To accomplish this, you must use a script, as formulas do not remove source data from spreadsheets.

This snippet contains a complete script, including a menu-driven user interface that will invoke the delSimilarRows() function for the current sheet. It is written to be a spreadsheet-contained script, but could easily be converted to an add-on.

/**
 * @OnlyCurrentDoc  Limits the script to only accessing the current spreadsheet.
 */

/**
 * Adds a custom menu
 *
 * @param {Object} e The event parameter for a simple onOpen trigger.
 */
function onOpen(e) {
  SpreadsheetApp.getUi()
      .createMenu('Custom')
      .addItem('Delete similar rows', 'delSimRowsGUI')
      .addToUi();
}

/**
 * Prompt user for confirmation before proceeding with deletion.
 * Provide results after operation.
 */
function delSimRowsGUI() {
  var ui = SpreadsheetApp.getUi();
  var choice = ui.alert("Confirm action",
      "This will delete rows in the current sheet that contain sets of cells that already appear together in other rows.",
      ui.ButtonSet.OK_CANCEL);
  if (choice === ui.Button.OK) {
    var numDeleted = delSimilarRows();
    ui.alert("Deleted "+numDeleted+" row"+(numDeleted===1?'.':'s.'));
  }
}

/**
 * Delete rows in the current sheet that contain sets of cells that already
 * appear together in other rows. (Almost duplicates, but order-independent.)
 * From: https://stackoverflow.com/a/37304191/1677912
 *
 * @returns {Number}       The number of matching rows that were deleted.
 */
function delSimilarRows() {
  // Get all rows from sheet.
  var currentSheet = SpreadsheetApp.getActiveSheet();
  var data = currentSheet.getDataRange().getValues();
  var numDeleted = 0;

  // Sort cells within rows, and join into a string with (hopefully!) unique delimiter
  var sorted = data.map(function(row) {
    return row.sort().join(' |-| ');
  });

  // Identify duplicate rows in the sorted data, and delete the corresponding
  // spreadsheet rows. (Note: looping backwards, so deletion is clean.)
  for (var row=sorted.length-1; row>=0; row--) {
    if (sorted.slice(0,row).indexOf(sorted[row]) !== -1) {
      currentSheet.deleteRow(row+1);
      numDeleted++;
    }
  }
  return numDeleted;
}

The function that does all the real work is delSimilarRows(). It uses some JavaScript magic to identify rows to be removed, and directly deletes them from the current sheet.

It handles all types of data, by temporarily converting rows to their string representations, with cell contents sorted alphabetically and a (hopefully) unique separator between them. Doing this, your example data will appear (to the computer only) like this:

[ "cat |-| dog",
  "apple |-| orange",
  "blue |-| red",
  "cat |-| dog" ]

We can then loop through the rows, checking for duplicates by calling the JavaScript Array.indexOf() method on a slice of the array that holds only the rows before our current row.

Because we're dealing with 0-based JavaScript arrays as well as 1-based Spreadsheet rows, we need to be careful about adding or subtracting 1 while indexing one or the other.

/**
 * Delete rows in the current sheet that contain sets of cells that already 
 * appear together in other rows. (Almost duplicates, but order-independent.)
 * From: https://stackoverflow.com/a/37304191/1677912
 *
 * @returns {Number}       The number of matching rows that were deleted.
 */
function delSimilarRows() {
  // Get all rows from sheet.
  var currentSheet = SpreadsheetApp.getActiveSheet();
  var data = currentSheet.getDataRange().getValues();
  var numDeleted = 0;

  // Sort cells within rows, and join into a string with (hopefully!) unique delimiter
  var sorted = data.map(function(row) {
    return row.sort().join(' |-| ');
  });

  // Identify duplicate rows in the sorted data, and delete the corresponding
  // spreadsheet rows. (Note: looping backwards, so deletion is clean.)
  for (var row=sorted.length-1; row>=0; row--) {
    if (sorted.slice(0,row).indexOf(sorted[row]) !== -1) {
      currentSheet.deleteRow(row+1);
      numDeleted++;
    }
  }
  return numDeleted;
}
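
For sheets with many thousands of rows, the slice()/indexOf() check rescans the earlier rows on every iteration. One possible alternative, sketched here as pure functions so it can be run without a spreadsheet, remembers the keys it has seen in a Set. The names rowKey and findSimilarRowNumbers are hypothetical, and passing the returned numbers to deleteRow() is left to the caller.

```javascript
// Hypothetical helper: builds the order-independent key for one row.
function rowKey(row) {
  // slice() copies first, because sort() reorders the array it is called on.
  // The default sort() compares elements as strings.
  return row.slice().sort().join(' |-| ');
}

// Hypothetical helper: returns the 1-based sheet row numbers of rows whose
// cells already appeared together in an earlier row, highest number first.
function findSimilarRowNumbers(data) {
  const seen = new Set();
  const toDelete = [];
  data.forEach((row, index) => {
    const key = rowKey(row);
    if (seen.has(key)) {
      toDelete.push(index + 1); // array index 0 is spreadsheet row 1
    } else {
      seen.add(key);
    }
  });
  // Descending order, so deleting one row does not shift the rows still pending.
  return toDelete.reverse();
}

const data = [
  ['dog', 'cat'],
  ['apple', 'orange'],
  ['red', 'blue'],
  ['cat', 'dog'],
  ['blue', 'red'],
];
console.log(data.map(rowKey)[0]);         // cat |-| dog
console.log(findSimilarRowNumbers(data)); // [ 5, 4 ]
```

As in the script, the first occurrence of each set of cells is kept and later ones are marked for deletion.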
