I need to remove all rows whose value in the column "InvoiceNo" starts with the letter "C". I couldn't find an answer here, so I would appreciate any help.
import numpy as np
import pandas as pd
import csv
from matplotlib import pyplot as plt
import xlsxwriter
import re
dataset = pd.read_excel('OnlineRetail2.xlsx')
dataset.head()
If you could provide some sample data in plain text, it would help me test this, but I believe this should do the trick.
dataset = dataset.loc[dataset.InvoiceNo.str[0] != 'C'].copy()
Basically, this selects the rows where dataset.InvoiceNo does not start with the letter C, and then reassigns dataset to a copy of just those rows (everything else is thrown out).
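Since I don't have the actual file, here is a minimal sketch of the same idea on made-up data. The sample rows and the names `sample`, `starts_with_c` and `filtered` are my own assumptions, not taken from OnlineRetail2.xlsx. It also casts the column to strings first, in case Excel loaded some invoice numbers as integers.

```python
import pandas as pd

# Hypothetical sample data standing in for OnlineRetail2.xlsx
sample = pd.DataFrame({
    "InvoiceNo": [536365, "C536379", 536366, "C536383"],
    "Quantity": [6, -1, 8, -1],
})

# Cast to str so numeric invoice numbers can be tested like text,
# then build a boolean mask marking the rows that start with "C"
starts_with_c = sample["InvoiceNo"].astype(str).str.startswith("C")

# Keep only the rows where the mask is False
filtered = sample.loc[~starts_with_c].copy()
print(filtered)
```

The `.copy()` is there so that later assignments to `filtered` do not trigger a SettingWithCopyWarning.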
You can have pandas skip any line that starts with a comment character when reading a file. For Excel:
dataset = pd.read_excel('OnlineRetail2.xlsx', comment="C")
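Search for 'comment' in the 'read_excel' documentation. Be aware that, according to that documentation, the comment character is not tied to the InvoiceNo column: anything between it and the end of the line is ignored, so a "C" elsewhere in a row (for example in a text column) would cut that row short.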
dataset[dataset["InvoiceNo"].str[0] != "C"]