Skipping the first column in a CSV file with Pandas

Question

I have a generated CSV file that has some information in the first line. I'm trying to skip it, but it doesn't seem to work. I tried looking at several suggestions and examples.

I tried using skiprows.
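For reference, here is a minimal sketch of how skiprows is usually applied when the first line of a file is not part of the table. The sample text, its column names and its values are made up for illustration; only pd.read_csv and its skiprows parameter come from pandas.

```python
import io

import pandas as pd

# Hypothetical file contents: one line of extra information, then the real header.
sample = io.StringIO(
    "Print log generated by the server\n"
    "User,Pages,Copies\n"
    "alice,3,1\n"
    "bob,5,2\n"
)

# skiprows=1 skips the first physical line, so the second line becomes the header.
df = pd.read_csv(sample, skiprows=1)
print(df.columns.tolist())
print(len(df))
```

An integer counts lines from the top of the file, while a list such as skiprows=[0] skips only the listed line indices.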

I also looked at several other examples, such as "Pandas drop first columns after csv read" and https://datascientyst.com/pandas-read-csv-file-read_csv-skiprows/

Nothing I tried worked the way I wanted it to.

When I got it to work, it deleted the entire row.

Here is a sample of the code:
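Because "first line" and "first column" are easy to mix up, the following small sketch contrasts the two on made-up data. The Id column and all values are hypothetical and are not taken from the real file.

```python
import io

import pandas as pd

# Hypothetical data, used only to show the difference.
text = "Id,User,Pages\n1,alice,3\n2,bob,5\n"

df = pd.read_csv(io.StringIO(text))

# Dropping the first column keeps every row.
without_first_column = df.iloc[:, 1:]

# Dropping the first data row keeps every column.
without_first_row = df.iloc[1:]

print(without_first_column.shape)
print(without_first_row.shape)
```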

# Imports the Pandas Module. It must be installed to run this script.
import pandas as pd

# Gets source file link
source_file = 'Csvfile.csv'

# Gets csv file and encodes it into a format that is compatible. 
dataframe = pd.read_csv(source_file, encoding='latin1')

df = pd.DataFrame({'User': dataframe.User, 'Pages': dataframe.Pages,  'Copies': dataframe.Copies,
                   'Color': dataframe.Grayscale, 'Duplex': dataframe.Duplex, 'Printer': dataframe.Printer})

# Formats data so that it can be used to count Duplex and Color pages.
df.loc[df["Duplex"] == "DUPLEX", "Duplex"] = dataframe.Pages
df.loc[df["Duplex"] == "NOT DUPLEX", "Duplex"] = 0
df.loc[df["Color"] == "NOT GRAYSCALE", "Color"] = dataframe.Pages
df.loc[df["Color"] == "GRAYSCALE", "Color"] = 0
# sort_values returns a new DataFrame, so the result has to be assigned.
df = df.sort_values(by=['User', 'Pages'])

df.to_csv('PrinterLogData.csv', index=False)

# Opens parsed CSV file.
output_source = "PrinterLogData.csv"
dataframe = pd.read_csv(output_source, encoding='latin1')

# Creates new DataFrame.
df = pd.DataFrame({'User': dataframe.User, 'Pages': dataframe.Pages,  'Copies': dataframe.Copies,
                   'Color': dataframe.Color, 'Duplex': dataframe.Duplex, 'Printer': dataframe.Printer})

# Groups data by Users and Printer Sums
Report1 = df.groupby(['User'], as_index=False).sum().sort_values('Pages', ascending=False)
Report2 = (df.groupby(['Printer'], as_index=False).sum()).sort_values('Pages', ascending=False)

Sample Data

Sample Output of what I'm looking for.

Answer 1

This is an early draft of what you appear to want for your program (based on the simulated print-log.csv):

import csv
import itertools
import operator
import pathlib

CSV_FILE = pathlib.Path('print-log.csv')
EXTRA_COLUMNS = ['Pages', 'Grayscale', 'Color', 'Not Duplex', 'Duplex']


def main():
    with CSV_FILE.open('rt', newline='') as file:
        iterator = iter(file)
        next(iterator)  # skip first line if needed
        reader = csv.DictReader(iterator)
        table = list(reader)
    create_report(table, 'Printer')
    create_report(table, 'User')


def create_report(table, column_name):
    key = operator.itemgetter(column_name)
    table.sort(key=key)
    field_names = [column_name] + EXTRA_COLUMNS
    with pathlib.Path(f'{column_name} Report').with_suffix('.csv').open(
        'wt', newline=''
    ) as file:
        writer = csv.DictWriter(file, field_names)
        writer.writeheader()
        report = []
        for key, group in itertools.groupby(table, key):
            report.append({column_name: key} | analyze_group(group))
        report.sort(key=operator.itemgetter('Pages'), reverse=True)
        writer.writerows(report)


def analyze_group(group):
    summary = dict.fromkeys(EXTRA_COLUMNS, 0)
    for row in group:
        pages = int(row['Pages']) * int(row['Copies'])
        summary['Pages'] += pages
        summary['Grayscale'] += pages if row['Grayscale'] == 'GRAYSCALE' else 0
        summary['Color'] += pages if row['Grayscale'] == 'NOT GRAYSCALE' else 0
        summary['Not Duplex'] += pages if row['Duplex'] == 'NOT DUPLEX' else 0
        summary['Duplex'] += pages if row['Duplex'] == 'DUPLEX' else 0
    return summary


if __name__ == '__main__':
    main()
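The line-skipping step in the draft above can be tried in isolation. This sketch uses an in-memory string in place of print-log.csv; the text and its values are invented for illustration.

```python
import csv
import io

# Hypothetical log text: an information line, then the header and two rows.
text = (
    "Print log generated by the server\n"
    "User,Pages,Copies\n"
    "alice,3,1\n"
    "bob,5,2\n"
)

with io.StringIO(text, newline='') as file:
    iterator = iter(file)
    next(iterator)  # consume the information line
    reader = csv.DictReader(iterator)  # the next line is read as the header
    table = list(reader)

print(table[0])
```

The values come back as strings, which is why the draft calls int() on Pages and Copies before adding them up.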

Solution 1
0 2022-12-21 00:31:59
