
How to remove lines which start with something in Python?

I need to remove all lines that start with the letter "C" in the column "InvoiceNo". I couldn't find an answer here, which is why I would appreciate any help.

import numpy as np
import pandas as pd
import csv
from matplotlib import pyplot as plt
import xlsxwriter
import re


dataset = pd.read_excel('OnlineRetail2.xlsx')
dataset.head()

If you could provide some sample data in plain text, it would help me test this, but I believe this should do the trick.

dataset = dataset.loc[dataset.InvoiceNo.str[0] != 'C'].copy()

Basically select those rows where dataset.InvoiceNo does not start with the letter C, and then reassign your dataset to a copy of just those rows (throw everything else out).
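Since no sample data was posted, here is a minimal sketch of the same filter on made-up rows. Only the column name InvoiceNo comes from the question; the values and the Quantity column are hypothetical. It casts the column to strings first, on the assumption that it may mix numbers and text, and uses str.startswith rather than str[0].

```python
import pandas as pd

# Hypothetical sample rows; only the column name "InvoiceNo" comes from the question.
dataset = pd.DataFrame({
    "InvoiceNo": [536365, "C536379", 536367, "C536383"],
    "Quantity": [6, -1, 8, -1],
})

# Cast to str first so numeric invoice numbers are compared as text too.
starts_with_c = dataset["InvoiceNo"].astype(str).str.startswith("C")

# Keep only the rows whose InvoiceNo does not start with "C".
dataset = dataset.loc[~starts_with_c].copy()
print(dataset)
```

The cast only affects the comparison; the values kept in InvoiceNo retain their original types.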

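You can also have pandas skip lines marked as comments while reading a file. Be aware that, according to the documentation, the comment parameter ignores everything from the comment character to the end of the line, wherever that character appears, so comment="C" may also cut off rows that contain a capital "C" in other columns. For Excel: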

dataset = pd.read_excel('OnlineRetail2.xlsx', comment="C")

Search for "comment" in the read_excel documentation.

dataset[dataset["InvoiceNo"].str[0] != "C"]
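Note that this expression returns a new DataFrame and leaves dataset itself untouched, so the result has to be assigned to a name to be kept. A short sketch on hypothetical string values (the name filtered is an illustration, not from the answer):

```python
import pandas as pd

# Hypothetical sample; the InvoiceNo values are made up and stored as strings.
dataset = pd.DataFrame({"InvoiceNo": ["536365", "C536379", "536367"]})

# The boolean mask selects rows whose first character is not "C".
filtered = dataset[dataset["InvoiceNo"].str[0] != "C"]

# Optionally renumber the rows so the index has no gaps after filtering.
filtered = filtered.reset_index(drop=True)
print(filtered)
```

Assigning back to dataset instead of filtered would replace the original, as in the first answer.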


 