Issues using pandas to parse CSV files in Python

I'm currently working on a script to help sort items in a CSV file. As the code below shows, I'm using pandas to read in two specific columns of the CSV file and then converting the data into lists. What I'm trying to accomplish (in a simple, although inefficient, way) is to parse through these two lists, which line up as shown in the following image:

[Image: result of using pandas to read in two specific columns of the CSV file.] Note: the left column, labeled Responsible, lists the department names, and the right column, labeled Status, shows the status of each order placed with that department.

Here is a copy of the code I'm using to parse through the two lists:

    import pandas as pd

    PHSc = 0
    PHSi = 0
    PHS = 0
    MOB = 0
    MOBc = 0
    MOBi = 0
    PHR = 0
    PHRc = 0
    PHRi = 0
    SYH = 0
    SYHc = 0
    SYHi = 0
    APS = 0
    APSc = 0
    APSi = 0
    ANA = 0
    ANAc = 0
    ANAi = 0
    CKC = 0
    CKCc = 0
    CKCi = 0
    RLG = 0
    RLGc = 0
    RLGi = 0
    DJ = 0
    DJc = 0
    DJi = 0
    EMS = 0
    EMSc = 0
    EMSi = 0
    CTS = 0
    CTSc = 0
    CTSi = 0
    ROT = 0
    ROTc = 0
    ROTi = 0
    SW = 0
    SWc = 0
    SWi = 0
    NU = 0
    NUc = 0
    NUi = 0
    RON = 0
    RONc = 0
    RONi = 0
    MON = 0
    MONc = 0
    MONi = 0
    TJS = 0
    TJSc = 0
    TJSi = 0
    VCH = 0
    VCHc = 0
    VCHi = 0
    NLN = 0
    NLNc = 0
    NLNi = 0
    KND = 0
    KNDc = 0
    KNDi = 0
    SHE = 0
    SHEc = 0
    SHEi = 0
    DOS = 0
    DOSc = 0
    DOSi = 0
    E1 = 0
    E1c = 0
    E1i = 0
    MOC = 0
    MOCc = 0
    MOCi = 0
    AC = 0
    ACc = 0
    ACi = 0
    SL = 0
    SLc = 0
    SLi = 0
    JER = 0
    JERc = 0
    JERi = 0
    BLP = 0
    BLPc = 0
    BLPi = 0

    file = input("Please enter the name of the file (include .csv): ")

    data = pd.read_csv(file, usecols=[23, 26], skiprows=1, names=['Responsible', 'Status'])
    print(data)

    responsible = data['Responsible'].tolist()
    print(responsible)
    status = data['Status'].tolist()
    print(status)

    for i in responsible:
     for j in status:
      if i == 'PHS' and j == 'Comp':
       PHSc += 1
       PHS += 1
       break
      elif i == 'PHS' and j == 'Active':
       PHSi += 1
       PHS += 1
       break
      elif i == 'MOB' and j == 'Comp':
       MOBc = MOBc + 1
       MOB += 1
       break
      elif i == 'MOB' and j == 'Active':
       MOBi = MOBi + 1
       MOB += 1
       break
      elif i == 'PHR' and j == 'Comp':
       PHRc = PHRc + 1
       PHR += 1
       break
      elif i == 'PHR' and j == 'Active':
       PHRi = PHRi + 1
       PHR += 1
       break
      elif i == 'SYH' and j == 'Comp':
       SYHc = SYHc + 1
       SYH += 1
       break
      elif i == 'SYH' and j == 'Active':
       SYHi = SYHi + 1
       SYH += 1
       break
      elif i == 'APS' and j == 'Comp':
       APSc = APSc + 1
       APS += 1
       break
      elif i == 'APS' and j == 'Active':
       APSi = APSi + 1
       APS += 1
       break
      elif i == 'ANA' and j == 'Comp':
       ANAc = ANAc + 1
       ANA += 1
       break
      elif i == 'ANA' and j == 'Active':
       ANAi = ANAi + 1
       ANA += 1
       break
      elif i == 'CKC' and j == 'Comp':
       CKCc = CKCc + 1
       CKC += 1
       break
      elif i == 'CKC' and j == 'Active':
       CKCi = CKCi + 1
       CKC += 1
       break
      elif i == 'RLG' and j == 'Comp':
       RLGc = RLGc + 1
       RLG += 1
       break
      elif i == 'RLG' and j == 'Active':
       RLGi = RLGi + 1
       RLG += 1
       break
      elif i == 'DJ' and j == 'Comp':
       DJc = DJc + 1
       DJ += 1
       break
      elif i == 'DJ' and j == 'Active':
       DJi = DJi + 1
       DJ += 1
       break
      elif i == 'EMS' and j == 'Comp':
       EMSc = EMSc + 1
       EMS += 1
       break
      elif i == 'EMS' and j == 'Active':
       EMSi = EMSi + 1
       EMS += 1
       break
      elif i == 'CTS' and j == 'Comp':
       CTSc = CTSc + 1
       CTS += 1
       break
      elif i == 'CTS' and j == 'Active':
       CTSi = CTSi + 1
       CTS += 1
       break
      elif i == 'ROT' and j == 'Comp':
       ROTc = ROTc + 1
       ROT += 1
       break
      elif i == 'ROT' and j == 'Active':
       ROTi = ROTi + 1
       ROT += 1
       break
      elif i == 'SW' and j == 'Comp':
       SWc = SWc + 1
       SW += 1
       break
      elif i == 'SW' and j == 'Active':
       SWi = SWi + 1
       SW += 1
       break
      elif i == 'NU' and j == 'Comp':
       NUc = NUc + 1
       NU += 1
       break
      elif i == 'NU' and j == 'Active':
       NUi = NUi + 1
       NU += 1
       break
      elif i == 'RON' and j == 'Comp':
       RONc = RONc + 1
       RON += 1
       break
      elif i == 'RON' and j == 'Active':
       RONi = RONi + 1
       RON += 1
       break
      elif i == 'MON' and j == 'Comp':
       MONc = MONc + 1
       MON += 1
       break
      elif i == 'MON' and j == 'Active':
       MONi = MONi + 1
       MON += 1
       break
      elif i == 'TJS' and j == 'Comp':
       TJSc = TJSc + 1
       TJS += 1
       break
      elif i == 'TJS' and j == 'Active':
       TJSi = TJSi + 1
       TJS += 1
       break
      elif i == 'VCH' and j == 'Comp':
       VCHc = VCHc + 1
       VCH += 1
       break
      elif i == 'VCH' and j == 'Active':
       VCHi = VCHi + 1
       VCH += 1
       break
      elif i == 'NLN' and j == 'Comp':
       NLNc = NLNc + 1
       NLN += 1
       break
      elif i == 'NLN' and j == 'Active':
       NLNi = NLNi + 1
       NLN += 1
       break
      elif i == 'KND' and j == 'Comp':
       KNDc = KNDc + 1
       KND += 1
       break
      elif i == 'KND' and j == 'Active':
       KNDi = KNDi + 1
       KND += 1
       break
      elif i == 'SHE' and j == 'Comp':
       SHEc = SHEc + 1
       SHE += 1
       break
      elif i == 'SHE' and j == 'Active':
       SHEi = SHEi + 1
       SHE += 1
       break
      elif i == 'DOS' and j == 'Comp':
       DOSc = DOSc + 1
       DOS += 1
       break
      elif i == 'DOS' and j == 'Active':
       DOSi = DOSi + 1
       DOS += 1
       break
      elif i == 'E1' and j == 'Comp':
       E1c = E1c + 1
       E1 += 1
       break
      elif i == 'E1' and j == 'Active':
       E1i = E1i + 1
       E1 += 1
       break
      elif i == 'MOC' and j == 'Comp':
       MOCc = MOCc + 1
       MOC += 1
       break
      elif i == 'MOC' and j == 'Active':
       MOCi = MOCi + 1
       MOC += 1
       break
      elif i == 'AC' and j == 'Comp':
       ACc = ACc + 1
       AC += 1
       break
      elif i == 'AC' and j == 'Active':
       ACi = ACi + 1
       AC += 1
       break
      elif i == 'SL' and j == 'Comp':
       SLc = SLc + 1
       SL += 1
       break
      elif i == 'SL' and j == 'Active':
       SLi = SLi + 1
       SL += 1
       break
      elif i == 'JER' and j == 'Comp':
       JERc = JERc + 1
       JER += 1
       break
      elif i == 'JER' and j == 'Active':
       JERi = JERi + 1
       JER += 1
       break
      elif i == 'BLP' and j == 'Comp':
       BLPc = BLPc + 1
       BLP += 1
       break
      elif i == 'BLP' and j == 'Active':
       BLPi = BLPi + 1
       BLP += 1
       break

    print("\nPHS Complete: ", PHSc)
    print("PHS Incomplete: ", PHSi)
    print("PHS Total: ", PHS)

    print("\nMOB Complete: ", MOBc)
    print("MOB Incomplete: ", MOBi)
    print("MOB Total: ", MOB)

    print("\nPHR Complete: ", PHRc)
    print("PHR Incomplete: ", PHRi)
    print("PHR Total: ", PHR)

    print("\nSYH Complete: ", SYHc)
    print("SYH Incomplete: ", SYHi)
    print("SYH Total: ", SYH)

    print("\nAPS Complete: ", APSc)
    print("APS Incomplete: ", APSi)
    print("APS Total: ", APS)

    print("\nANA Complete: ", ANAc)
    print("ANA Incomplete: ", ANAi)
    print("ANA Total: ", ANA)
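A likely cause of the wrong counts is the nested loop. For each department `i`, the inner `for j in status` loop starts over at the first row of `status`. It then breaks at the first `'Comp'` or `'Active'` it meets. As a result, every department in the `if`/`elif` chain gets the status of that first matching row, not the status from its own row. The sketch below reproduces that pairing on a few hypothetical rows and compares it with `zip`, which walks both lists in step. The sample lists and both helper function names are made up for illustration.

```python
# Hypothetical sample rows; the real lists come from columns 23 and 26 of the CSV.
responsible = ["PHS", "PHS", "MOB", "PHS"]
status = ["Active", "Comp", "Comp", "Comp"]


def first_match_pairs(responsible, status):
    """Mimic the question's nested loop for departments that appear in the if/elif chain."""
    pairs = []
    for i in responsible:
        for j in status:  # restarts at status[0] for every department
            if j in ("Comp", "Active"):
                pairs.append((i, j))
                break  # stops at the first matching status, whatever row it is on
    return pairs


def row_pairs(responsible, status):
    """Pair each department with the status from the same row."""
    return list(zip(responsible, status))


print(first_match_pairs(responsible, status))
# [('PHS', 'Active'), ('PHS', 'Active'), ('MOB', 'Active'), ('PHS', 'Active')]
print(row_pairs(responsible, status))
# [('PHS', 'Active'), ('PHS', 'Comp'), ('MOB', 'Comp'), ('PHS', 'Comp')]
```

Replacing `for i in responsible: for j in status:` with a single `for i, j in zip(responsible, status):` (and dropping the `break` statements) should make each row count once with its own status.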

[Image: code output]

As can be seen, the value for PHS Complete should be 49 and the value for PHS Incomplete should be 0, since the PHS department completed all of its required work for this daily report. So I'm unsure what I'm doing wrong here. The same thing happens with the other print statements.

The idea is to go through both lists and increment one counter every time a department (for example, PHS) has a status of Comp. A second counter is incremented every time a department has a status of Active. A third counter is incremented to get the total number of jobs performed in a given day.

I'm sure there is a more efficient way to do this, but I'm new to Python and have a firm background in C++. Any and all suggestions will be appreciated. Thanks in advance!
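The counting described above can be done without one variable per department. Here is a minimal pure-Python sketch. It assumes `responsible` and `status` are the two lists built in the question, and that the only statuses of interest are `'Comp'` and `'Active'`. `collections.Counter` tallies each (department, status) pair produced by `zip`. The function name `count_by_department` and the sample rows are hypothetical.

```python
from collections import Counter


def count_by_department(responsible, status):
    """Count Comp/Active rows per department, pairing the two lists row by row."""
    pair_counts = Counter(zip(responsible, status))
    summary = {}
    for dept in sorted(set(responsible)):
        comp = pair_counts[(dept, "Comp")]      # a missing pair counts as 0
        active = pair_counts[(dept, "Active")]
        summary[dept] = {"Complete": comp, "Incomplete": active, "Total": comp + active}
    return summary


# Hypothetical sample rows standing in for the two CSV columns.
responsible = ["PHS", "PHS", "MOB", "PHS", "MOB"]
status = ["Comp", "Comp", "Active", "Comp", "Comp"]

for dept, counts in count_by_department(responsible, status).items():
    print(f"\n{dept} Complete: ", counts["Complete"])
    print(f"{dept} Incomplete: ", counts["Incomplete"])
    print(f"{dept} Total: ", counts["Total"])
```

Because the pairs come from `zip`, each department is matched with the status on its own row. Adding a new department needs no new variables or branches.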

You can get the counts grouped by the Responsible and Status columns like so:

    data.groupby(["Responsible", "Status"]).value_counts().reset_index(name="count")
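To try this without the real file, here is a self-contained sketch on a small hypothetical DataFrame. It uses `GroupBy.size()`, which counts the rows in each (Responsible, Status) group. It should produce the same table as the `value_counts()` call above. The sample values are made up.

```python
import pandas as pd

# Hypothetical stand-in for the two columns read from the CSV file.
data = pd.DataFrame({
    "Responsible": ["PHS", "PHS", "MOB", "MOB", "PHS", "AC"],
    "Status": ["Comp", "Comp", "Active", "Comp", "Comp", "Active"],
})

# One row per (Responsible, Status) combination that occurs, with its row count.
counts = data.groupby(["Responsible", "Status"]).size().reset_index(name="count")
print(counts)
```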

This will look like:

       Responsible  Status  count
    0           AC  Active      6
    1           AC    Comp      2
    2          ANA  Active      4
    3          ANA    Comp      4
    4          APS  Active      5
    5          APS    Comp      3
    6          BLP  Active      3
    7          BLP    Comp      5
    8          CKC  Active      5
    9          CKC    Comp      3
    10         CTS  Active      3
    11         CTS    Comp      5
    12          DJ  Active      5

Since you did not provide sample data that could be copied and pasted, your counts will of course vary.

You can also get the count for each entry in the Responsible column with:

    data["Responsible"].value_counts()
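A minimal sketch on hypothetical data. Note that `value_counts()` on `Responsible` alone counts every row for a department regardless of its Status. It therefore matches the question's Total only if Comp and Active are the only statuses in the file.

```python
import pandas as pd

# Hypothetical stand-in for the two columns read from the CSV file.
data = pd.DataFrame({
    "Responsible": ["PHS", "PHS", "MOB", "MOB", "PHS", "AC"],
    "Status": ["Comp", "Comp", "Active", "Comp", "Comp", "Active"],
})

# Series indexed by department name, sorted by count (largest first).
per_department = data["Responsible"].value_counts()
print(per_department)
print("PHS rows:", per_department["PHS"])
```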

To put it all together and get counts for each Status plus a total of both, you could do this:

    data.groupby(["Responsible", "Status"])\
        .value_counts()\
        .reset_index(name="Count")\
        .pivot_table(values="Count", index="Responsible", columns="Status", aggfunc="sum", margins=True, margins_name="Total")
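`pd.crosstab` builds a similar frequency table in a single call. This is a swapped-in alternative, not what the answer above uses. It counts rows for each Responsible/Status pair, `margins=True` adds a `"Total"` row and column, and pairs that never occur are filled with 0. In a `pivot_table` without `fill_value`, those pairs would show up as NaN. The sample data is hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the two columns read from the CSV file.
data = pd.DataFrame({
    "Responsible": ["PHS", "PHS", "MOB", "MOB", "PHS", "AC"],
    "Status": ["Comp", "Comp", "Active", "Comp", "Comp", "Active"],
})

# Rows: departments plus "Total"; columns: statuses plus "Total".
table = pd.crosstab(data["Responsible"], data["Status"],
                    margins=True, margins_name="Total")
print(table)
```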

This will look like:

    Status       Active  Comp  Total
    Responsible
    AC                6     2      8
    ANA               4     4      8
    APS               5     3      8
    BLP               3     5      8
    CKC               5     3      8
    CTS               3     5      8
    DJ                5     3      8
    DOS               3     5      8
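To get back to the question's "Complete / Incomplete / Total" printout, the pivoted table can be read row by row. This is a sketch under a few assumptions. It uses `size()` in place of `value_counts()` and adds `fill_value=0` so that missing combinations print as 0. It also assumes both `"Comp"` and `"Active"` occur somewhere in the data, so both columns exist. The helper `report_lines` and the sample data are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the two columns read from the CSV file.
data = pd.DataFrame({
    "Responsible": ["PHS", "PHS", "MOB", "MOB", "PHS", "AC"],
    "Status": ["Comp", "Comp", "Active", "Comp", "Comp", "Active"],
})

# Count rows per (Responsible, Status), then pivot with a "Total" margin row and column.
table = (
    data.groupby(["Responsible", "Status"]).size().reset_index(name="Count")
    .pivot_table(values="Count", index="Responsible", columns="Status",
                 aggfunc="sum", fill_value=0, margins=True, margins_name="Total")
)


def report_lines(table):
    """Format each department's counts like the question's print statements."""
    lines = []
    for dept in table.index.drop("Total"):  # skip the margins row
        lines.append(f"{dept} Complete: {int(table.loc[dept, 'Comp'])}")
        lines.append(f"{dept} Incomplete: {int(table.loc[dept, 'Active'])}")
        lines.append(f"{dept} Total: {int(table.loc[dept, 'Total'])}")
    return lines


for line in report_lines(table):
    print(line)
```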
