
How can I sort data entries so that a range of data is grouped together in Python or Google Sheets?

For instance, given a dataset of various weights and names, how can I group individuals with similar weights (within ±5% of a given weight) together?

Thank you!

I think you can use a package for this. For Excel, openpyxl is an essential Python package: with it you can sort data, enter data and create charts in Excel from Python. First, go to this link:

https://pypi.org/ : search for openpyxl and install the package.

https://openpyxl.readthedocs.io/en/stable/tutorial.html#create-a-workbook : this link is a guide to using openpyxl.

I understand that you want to divide your data into groups where the values differ by at most ±5% from the biggest to the smallest. I wrote an Apps Script function that can do that in your Sheet. First, I made an example Sheet with student names and test scores (from 0 to 10) instead of names and heights; I'll explain why later. This is the initial state of the example sheet:

[Image: initial example]

In the Group column, the code writes the group ID as an integer starting at 0. This is the code:

function so62060595() {
  var dataColumn = 2; // Column B: values to group (Score)
  var groupColumn = 3; // Column C: where the group ID is written
  var dataSheet = SpreadsheetApp.getActive().getActiveSheet();
  // Data range without the header row, sorted by the data column (descending)
  var dataRange = dataSheet.getRange(2, 1, dataSheet.getLastRow() - 1, dataSheet
    .getLastColumn()).sort({
    column: dataColumn,
    ascending: false
  });
  var data = dataRange.getValues();
  var groupingPercentage = 5 / 100; // 5%
  var upperBound = data[0][dataColumn - 1]; // highest value of the current group
  var groupID = 0;

  for (var r = 0; r < data.length; r++) {
    var value = data[r][dataColumn - 1];
    if (upperBound - upperBound * groupingPercentage <= value + value *
      groupingPercentage) {
      // Include in the same group
      data[r][groupColumn - 1] = groupID;
    } else {
      // Create a new group whose upper bound is this value
      groupID = groupID + 1;
      upperBound = value;

      data[r][groupColumn - 1] = groupID;
    }
  }

  // Write the rows, now including the group IDs, back to the sheet
  dataRange.setValues(data);
}

The first thing the code does is open the sheet with SpreadsheetApp.getActive() and Spreadsheet.getActiveSheet(). After that, it looks up the data range with Sheet.getRange() (notice how it uses Sheet.getLastRow() and Sheet.getLastColumn() to find the range size) and sorts it with Range.sort(), using the score column as the sort key. Next, it reads the range with Range.getValues(). I also initialized some variables: the data and group columns, the desired grouping percentage (5% in this case) and the initial group ID (0).

After all that initialization, the code iterates over every row and checks whether the data value (Score in the example) is within ±5% of the upper bound of the group (the highest value in the group); concretely, it checks that value + 5% is at least upperBound − 5%. If the value is in range, the current group ID is written in that row. If it is not, a new group ID is generated and that entry becomes the new upper bound. The process continues until every entry has a group ID, and then the data is written back to the table with Range.setValues(). The final result looks like this:
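The loop above can be tried outside Google Sheets. The following is a minimal sketch in plain JavaScript, assuming the data is an array of `[name, value]` rows and the percentage is given as a fraction (0.05); `groupRows` is a hypothetical name, and `Array.prototype.sort` stands in for `Range.sort()`:

```javascript
// Hypothetical plain-array version of so62060595() (no SpreadsheetApp).
// rows: array of [name, value]; pct: grouping fraction, e.g. 0.05 for 5%.
// Returns new rows [name, value, groupID], sorted by value in descending order.
function groupRows(rows, pct) {
  // Copy, then sort descending by value, like Range.sort({ascending: false})
  var sorted = rows.slice().sort(function (a, b) { return b[1] - a[1]; });
  var result = [];
  var groupID = 0;
  var upperBound = sorted.length > 0 ? sorted[0][1] : 0;

  for (var i = 0; i < sorted.length; i++) {
    var value = sorted[i][1];
    // Same test as the Apps Script: upperBound - 5% <= value + 5%
    if (!(upperBound - upperBound * pct <= value + value * pct)) {
      groupID++;
      upperBound = value; // this value becomes the top of the new group
    }
    result.push([sorted[i][0], value, groupID]);
  }
  return result;
}
```

With the hypothetical scores `[["Ann", 9.5], ["Bob", 10], ["Cid", 8], ["Dee", 7.9], ["Eve", 6]]` and `pct = 0.05`, the rows come back in the order Bob, Ann, Cid, Dee, Eve with group IDs `[0, 0, 1, 1, 2]`.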

[Image: final example]

And now, why did I use test scores instead of heights? Well, look what happens with an example of heights using the previous code:

[Image: heights example]

Only two groups are generated (0 and 1) because the differences between realistic heights are small compared with the ±5% window. I hope my answer helps you, but don't hesitate to ask me any additional questions.
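To see why heights collapse into so few groups, it helps to compute how far below the upper bound the first code's condition reaches. Rearranging upperBound − p·upperBound ≤ value + p·value gives value ≥ upperBound·(1 − p)/(1 + p). A small sketch, assuming p is the 5% expressed as a fraction (0.05); `lowestValueInGroup` is a hypothetical helper, and 190 (read as centimetres) is only an illustrative height, not a value from the example sheet:

```javascript
// Hypothetical helper: smallest value that so62060595() keeps in the same
// group as upperBound, from u - u*p <= v + v*p  =>  v >= u*(1 - p)/(1 + p).
function lowestValueInGroup(upperBound, pct) {
  return upperBound * (1 - pct) / (1 + pct);
}

console.log(lowestValueInGroup(10, 0.05).toFixed(2));  // "9.05"
console.log(lowestValueInGroup(190, 0.05).toFixed(1)); // "171.9"
```

So a score group topped by 10 stops at about 9.05, while a height group topped by 190 cm keeps everyone down to about 171.9 cm, a window of roughly 18 cm.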


UPDATE

Based on the question update in your comment, I modified the script. If I understand correctly, you want a midpoint in each group, with the group bounds calculated as that midpoint plus/minus 5%. If my assumption is correct, you can use the following code:

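// Returns the bounds of a group whose highest value is groupUpperBound.
// groupingPercentage is a percentage (5 means 5%).
function calculateGroupBounds(groupingPercentage, groupUpperBound) {
  var groupBounds = {};
  groupBounds['groupUpperBound'] = groupUpperBound;
  // The upper bound is the midpoint plus groupingPercentage %
  groupBounds['groupMidpoint'] = 100 * groupUpperBound / (100 +
    groupingPercentage);
  // The lower bound is the midpoint minus groupingPercentage %
  groupBounds['groupLowerBound'] = (100 - groupingPercentage) * groupBounds[
    'groupMidpoint'] / 100;

  return groupBounds;
}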

function so62060595B() {
  // Sheet reading
  var dataColumn = 2; // Column B
  var groupColumn = 3; // Column C
  var dataSheet = SpreadsheetApp.getActive().getActiveSheet();
  var dataRange = dataSheet.getRange(2, 1, dataSheet.getLastRow() - 1, dataSheet
    .getLastColumn()).sort({
    column: dataColumn,
    ascending: false
  });
  var data = dataRange.getValues();

  // Group initialization
  var groupingPercentage = 5; // 5%
  var groupID = 0;
  var groupBounds = calculateGroupBounds(groupingPercentage, data[0][
    dataColumn - 1]);
  var groupUpperBound = groupBounds['groupUpperBound'];
  var groupLowerBound = groupBounds['groupLowerBound'];

  for (var r = 0; r < data.length; r++) {
    var value = data[r][dataColumn - 1];
    if (value <= groupUpperBound && value >= groupLowerBound) {
      // Include in the same group
      data[r][groupColumn - 1] = groupID;
    } else {
      // Create a new group whose upper bound is this value
      groupID = groupID + 1;
      groupBounds = calculateGroupBounds(groupingPercentage, value);
      groupUpperBound = groupBounds['groupUpperBound'];
      groupLowerBound = groupBounds['groupLowerBound'];

      data[r][groupColumn - 1] = groupID;
    }
  }

  // Write the rows, now including the group IDs, back to the sheet
  dataRange.setValues(data);
}

This new code uses the same Apps Script methods as the previous one and includes a new function (calculateGroupBounds()) to compute the upper and lower bounds and the midpoint. During the data iteration, the code checks whether the value falls between the upper and lower bounds; if it does, the current group ID is written in that row. If it doesn't, a new group is created. This is the result with the same example data as the previous code:
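The midpoint logic can also be sketched on plain arrays, which makes it easy to test outside Sheets. This assumes the values are already sorted in descending order (as `Range.sort()` does in the script) and that `pct` is a percentage, as in `calculateGroupBounds()`; `boundsFor` and `groupIdsByMidpoint` are hypothetical names:

```javascript
// Hypothetical equivalent of calculateGroupBounds(): pct is a percentage (5 = 5%).
function boundsFor(pct, upperBound) {
  var midpoint = 100 * upperBound / (100 + pct); // upperBound = midpoint + pct %
  var lower = (100 - pct) * midpoint / 100;      // lower = midpoint - pct %
  return { upper: upperBound, midpoint: midpoint, lower: lower };
}

// Hypothetical plain-array version of so62060595B().
// sortedValues must be in descending order; returns one group ID per value.
function groupIdsByMidpoint(sortedValues, pct) {
  var ids = [];
  if (sortedValues.length === 0) return ids;
  var groupID = 0;
  var bounds = boundsFor(pct, sortedValues[0]);

  for (var i = 0; i < sortedValues.length; i++) {
    var v = sortedValues[i];
    if (!(v <= bounds.upper && v >= bounds.lower)) {
      groupID++;
      bounds = boundsFor(pct, v); // new group topped by this value
    }
    ids.push(groupID);
  }
  return ids;
}
```

For example, `groupIdsByMidpoint([10, 9.5, 8, 7.9, 6], 5)` returns `[0, 0, 1, 1, 2]`. Because the input is sorted in descending order, the upper-bound check never fails in practice; only the lower bound decides when a new group starts.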

[Image: scores example using Code II]

And this is the result with the heights table:

[Image: heights example using Code II]

These results are the same as those of the previous code, even though this second code uses a different approach. This is because the first code's condition (upperBound − 5% ≤ value + 5%) already produces the same lower bound for each group, upperBound × 0.95 / 1.05, without calculating the midpoint explicitly. Please ask me if you have any doubts or still need help.
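The equivalence can be checked with a short derivation. Write u for the group's upper bound, v for a candidate value, P for `groupingPercentage` in the second code (5) and p = P/100 for the fraction used by the first code (0.05):

```latex
\begin{aligned}
\text{first code:}\quad & u - u\,p \le v + v\,p
  \;\iff\; v \ge \frac{1-p}{1+p}\,u \\
\text{second code:}\quad & m = \frac{100\,u}{100+P} = \frac{u}{1+p},
  \qquad \ell = \frac{(100-P)\,m}{100} = (1-p)\,m = \frac{1-p}{1+p}\,u
\end{aligned}
```

Since 1 + p > 0, dividing by it keeps the direction of the inequality, and v ≤ u always holds because the rows are sorted in descending order. Both codes therefore accept exactly the values with v ≥ (1 − p)u/(1 + p), which is about 0.905u for 5%.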
