
Check if a row exists in pandas

I want to check whether a row exists in a DataFrame. Here is my code:

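import pandas as pd

# Note: error_bad_lines was removed in pandas 2.0; on_bad_lines='skip' is the replacement there
df = pd.read_csv('dbo.Access_Stat_all.csv',error_bad_lines=False, usecols=['Name','Format','Resource_ID','Number'])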
df1 = df[df['Resource_ID'] == 30957]
df1 = df1[['Format','Name','Number']]
df1 = df1.groupby(['Format','Name'], as_index=True).last()
pd.options.display.float_format = '{:,.0f}'.format
df1 = df1.unstack()
df1.columns = df1.columns.droplevel()
if 'entry' in df1:
    df2 = df1[1:4].sum(axis=0)
else:
    df2 = df1[0:3].sum(axis=0)
df2.name = 'sum'
# Note: DataFrame.append was removed in pandas 2.0; pd.concat is the replacement there
df2 = df1.append(df2)
print(df2)

This is the output:

Name    Apr 2013  Apr 2014  Apr 2015  Apr 2016  Apr 2017  Aug 2010  Aug 2013
Format
entry          0         0         0         1         4         1         0
pdf           13        12         4        23         7         1         9
sum           13        12         4        24        11         2         9

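Does if 'entry' in df1: only check whether 'entry' exists as a column? I guess that must be the case. We can see that the row 'entry' exists, yet we still land in the else branch (had we landed in the if branch, the sum for Apr 2016 would have been 23).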

If I run it on a file that has no 'entry' row, it again lands in the else branch (as I expect), so I think it always goes into the else branch.

How do I check whether a row exists in pandas?

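I think you need to compare the index values; the result is a NumPy array of True/False values. To reduce it to a scalar you need any, which checks for at least one True, or all, which checks whether every value is True: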

(df.index == 'entry').any()

(df.index == 'entry').all()

Another solution, from John Galt's comment:

'entry' in df.index

If you need to check for a substring:

df.index.str.contains('en').any()
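As a small sketch of the substring check above: str.contains treats its pattern as a regular expression by default, so passing regex=False is one way to match a literal substring. The DataFrame here is made up for illustration, and the variable names has_en and has_xyz are hypothetical.

```python
import pandas as pd

# Illustrative frame with string row labels
df = pd.DataFrame({'Apr 2013': [1, 2, 3]}, index=['entry', 'pdf', 'sum'])

# regex=False treats the pattern as a literal substring, not a regular expression
has_en = df.index.str.contains('en', regex=False).any()
has_xyz = df.index.str.contains('xyz', regex=False).any()

print(has_en, has_xyz)
```

Note that the .str accessor only works when the index holds strings.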

Sample:

df = pd.DataFrame({'Apr 2013':[1,2,3]}, index=['entry','pdf','sum'])
print(df)
       Apr 2013
entry         1
pdf           2
sum           3

print (df.index == 'entry')
[ True False False]

print ((df.index == 'entry').any())
True
print ((df.index == 'entry').all())
False

#check columns values
print ('entry' in df)
False
#same as explicitly calling columns (better readability)
print ('entry' in df.columns)
False
#check index values
print ('entry' in df.index)
True
#check columns values
print ('Apr 2013' in df)
True
#check columns values
print ('Apr 2013' in df.columns)
True

df = pd.DataFrame({'Apr 2013':[1,2,3]}, index=['entry','entry','entry'])
print(df)
       Apr 2013
entry         1
entry         2
entry         3

print (df.index == 'entry')
[ True  True  True]

print ((df.index == 'entry').any())
True
print ((df.index == 'entry').all())
True
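Applied to the code in the question, the index check might look like the following sketch. The small df1 is a stand-in for the unstacked table, built from two columns of the output shown in the question. The name total is hypothetical, and pd.concat is used here in place of DataFrame.append, which was removed in pandas 2.0.

```python
import pandas as pd

# Stand-in for the unstacked table in the question (values taken from its output)
df1 = pd.DataFrame(
    {'Apr 2016': [1, 23], 'Apr 2017': [4, 7]},
    index=pd.Index(['entry', 'pdf'], name='Format'),
)

# Test membership against the index (row labels), not the columns
if 'entry' in df1.index:
    total = df1.iloc[1:4].sum(axis=0)   # skip the first row ('entry')
else:
    total = df1.iloc[0:3].sum(axis=0)
total.name = 'sum'

# Turn the Series into a one-row frame and stack it under df1
result = pd.concat([df1, total.to_frame().T])
print(result)
```

With the 'entry' row present, the if branch is taken and the sum for Apr 2016 is 23, which is what the question expected.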

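Another way to check whether a row (or rows) exists in a DataFrame is to use df.loc:

subDataFrame = dataFrame.loc[dataFrame[columnName] == value]

The code below checks every value in a given line (comma-separated) and returns True or False depending on whether a matching row exists in the DataFrame.

Here is a short example that uses stock data as the DataFrame: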

# *****     Code for 'Check if a line exists in dataframe' using Pandas     *****

# Checks if value can be converted to a number
# Return: True/False
def isfloat(value):
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        # float() raises one of these when the value is not numeric
        return False


# Example:
# list1 = ['D','C','B','A']
# list2 = ['OK','Good','82','Great']
# mergedList = [['D','OK'],['C','Good'],['B',82.0],['A','Great']]
def getMergedListFromTwoLists(list1, list2):
    mergedList = []
    numOfColumns = min(len(list1), len(list2))
    for col in range(0, numOfColumns):
        val1 = list1[col]
        val2 = list2[col]

        # In the dataframe value stored as a number
        if isfloat(val2):
            val2 = float(val2)
        mergedList.append([val1, val2])

    return mergedList


# Returns only rows that have valuesAsArray[1] in the valuesAsArray[0]
# Example: valuesAsArray = ['Symbol','AAPL'], returns rows with 'AAPL'
def getSubDataFrame(dataFrame, valuesAsArray):
    subDataFrame = dataFrame.loc[dataFrame[valuesAsArray[0]] == valuesAsArray[1]]
    return subDataFrame




def createDataFrameAsExample():
    import pandas as pd
    data = {
        'MarketCenter': ['T', 'T', 'T', 'T'],
        'Symbol': ['AAPL', 'FB', 'AAPL', 'FB'],
        'Date': [20190101, 20190102, 20190201, 20190301],
        'Time': ['08:00:00', '08:00:00', '09:00:00', '09:00:00'],
        'ShortType': ['S', 'S', 'S', 'S'],
        'Size': [10, 10, 20, 30],
        'Price': [100, 100, 300, 200]
    }
    dfHeadLineAsArray = ['MarketCenter', 'Symbol', 'Date', 'Time', 'ShortType', 'Size','Price']
    df = pd.DataFrame(data, columns=dfHeadLineAsArray)
    return df



def adapterCheckIfLineExistsInDataFrame(originalDataFrame, headlineAsArray, line):
    dfHeadLineAsArray = headlineAsArray
    # Line example: 'T,AAPL,20190101,08:00:00,S,10,100'
    lineAsArray = line.split(',')

    valuesAsArray = getMergedListFromTwoLists(dfHeadLineAsArray, lineAsArray)
    return checkIfLineExistsInDataFrame(originalDataFrame, valuesAsArray)



def checkIfLineExistsInDataFrame(originalDataFrame, valuesAsArray):
    if originalDataFrame.empty:
        return False

    # Narrow the DataFrame one [column, value] pair at a time
    subDataFrame = originalDataFrame
    for value in valuesAsArray:
        if subDataFrame.empty:
            return False
        subDataFrame = getSubDataFrame(subDataFrame, value)

    # The line exists only if at least one row survived every filter
    return not subDataFrame.empty


def testExample():
    dataFrame = createDataFrameAsExample()
    dfHeadLineAsArray = ['MarketCenter', 'Symbol', 'Date', 'Time', 'ShortType', 'Size','Price']

    # Three made up lines (not in df)
    lineToCheck1 = 'T,FB,20190102,13:00:00,S,10,100'
    lineToCheck2 = 'T,FB,20190102,08:00:00,S,60,100'
    lineToCheck3 = 'T,FB,20190102,08:00:00,S,10,150'

    # This line exists in the dataframe
    lineToCheck4 = 'T,FB,20190102,08:00:00,S,10,100'

    lineExists1 = adapterCheckIfLineExistsInDataFrame(dataFrame,dfHeadLineAsArray,lineToCheck1)
    lineExists2 = adapterCheckIfLineExistsInDataFrame(dataFrame,dfHeadLineAsArray,lineToCheck2)
    lineExists3 = adapterCheckIfLineExistsInDataFrame(dataFrame,dfHeadLineAsArray,lineToCheck3)
    lineExists4 = adapterCheckIfLineExistsInDataFrame(dataFrame,dfHeadLineAsArray,lineToCheck4)

    expected = 'False False False True'
    print('Expected:',expected)
    print('Method:',lineExists1,lineExists2,lineExists3,lineExists4)



testExample()
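A more compact variant of the same idea is sketched below: instead of filtering the DataFrame repeatedly, it builds one boolean mask across all columns. The function name row_exists and the dict-based input are assumptions made for illustration, not part of the answer above. The sample data is a subset of the stocks example.

```python
import pandas as pd

def row_exists(df, row):
    """Return True if some row of df matches every column/value pair in `row` (a dict)."""
    # Start with every row as a candidate, then AND in one condition per column
    mask = pd.Series(True, index=df.index)
    for column, value in row.items():
        mask &= (df[column] == value)
    return bool(mask.any())

stocks = pd.DataFrame({
    'Symbol': ['AAPL', 'FB', 'AAPL', 'FB'],
    'Date': [20190101, 20190102, 20190201, 20190301],
    'Size': [10, 10, 20, 30],
})

found = row_exists(stocks, {'Symbol': 'FB', 'Date': 20190102, 'Size': 10})
missing = row_exists(stocks, {'Symbol': 'FB', 'Date': 20190102, 'Size': 60})
print(found, missing)
```

Passing the values with their real types (ints for Date and Size) avoids the string-to-float conversion step used in the longer example.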
