
Python Pandas read_csv() taking in information related to my laptop unnecessarily

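I am working with labels for some audio data. When I read in each csv, I set the column names. For some reason it is as if two dataframes are being read in: one contains the information I care about from the csv, and the other contains my username, the type of laptop I am using, and the current time on my computer.

Code: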

import os

import pandas as pd

# initializing the output dataframe that will contain all of the labels and the relevant metadata
# across each audio clip in a dataset. Should be in the format to work with it in the Python package
manual_df = pd.DataFrame()
# the ground truth labels lack column names, so I am filling them in closer to the end product
column_names = ["OFFSET","MANUAL ID"]
for clip_annotations in os.listdir(label_path):
    # isolating the name of the clip from the csv file
    # will be used to extract the metadata from the equivalent wav file
    x = clip_annotations.split('.')
    clip_name = x[0]
    # taking in the labels for the audio clip
    clip_df = pd.read_csv(label_path+clip_annotations,names=column_names)
    print(clip_df)
    # removing the annotations that occur over the same interval in the clip
    # first step in converting multi-class classifier into binary classifier.
    clip_df = clip_df.drop_duplicates(subset = ["OFFSET"])
    # second step to converting multi-class classifier to binary classifier
    # Isn't all that necessary since we don't use the MANUAL ID Column that much yet
    clip_df["MANUAL ID"] = "bird"
    # splitting the time into OFFSET and DURATION
    new = clip_df["OFFSET"].str.split("-", n = 1, expand = True)
    clip_df["OFFSET"] = new[0]
    clip_df["DURATION"] = 5
    #print(clip_df)
    # converting hours minutes seconds format into seconds
    new = clip_df["OFFSET"].str.split(":", n = 2, expand = True)
    #print(new)
    #new = new.rename(columns={"Hours","Minutes","Seconds"})
    #seconds_offset = new[0]*3600 + new[1]*60 + new[2]
    #print(seconds_offset)
new
Output:
                                         OFFSET  \
NaN jacob jacob-Aspire-E5-575  26.03.2021 13:49   

                                                               MANUAL ID  
NaN jacob jacob-Aspire-E5-575  file:///home/jacob/.config/libreoffice/4;  
                OFFSET MANUAL ID
0    00:00:00-00:00:05   cintin1
1    00:00:05-00:00:10   cintin1
2    00:00:05-00:00:10   citwoo1
3    00:00:10-00:00:15   butwoo1
4    00:00:10-00:00:15   cintin1
..                 ...       ...
319  00:09:50-00:09:55    meapar
320  00:09:50-00:09:55   strwoo2
321  00:09:55-00:10:00   butwoo1
322  00:09:55-00:10:00   hauthr1
323  00:09:55-00:10:00    meapar

[324 rows x 2 columns]

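My goal is to stop picking up this unnecessary laptop-related information.

I went back and printed clip_annotations, and it turned out that the files I was interested in had duplicate "lock" files alongside them that look like this: .~lock.PER49_20190131.csv#. I'm not sure why this happens, but for my situation this script doesn't need to be general, so I just coded around it with this condition at the start of the loop: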


if clip_annotations.startswith(".~lock."):
    continue
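
The `file:///home/jacob/.config/libreoffice/4;` value in the printed output suggests these are the lock files LibreOffice writes next to a document while it is open, which would explain why they sit beside the real csv files. If the script ever needs to be a little more general, one option is to build the file list up front and keep only regular files ending in `.csv`. The following is a minimal sketch under that assumption, not the original script: the helper names `list_label_csvs` and `load_labels` are hypothetical, and the lock file contents in the demo are reconstructed from the output shown in the question.

```python
import os
import tempfile

import pandas as pd


def list_label_csvs(label_dir):
    """Return the sorted names of real csv label files, skipping lock files."""
    names = []
    for name in sorted(os.listdir(label_dir)):
        # lock files look like ".~lock.PER49_20190131.csv#"
        if name.startswith(".~lock."):
            continue
        # anything that does not end in .csv is not a label file
        if not name.lower().endswith(".csv"):
            continue
        # skip directories that happen to be named like a csv
        if not os.path.isfile(os.path.join(label_dir, name)):
            continue
        names.append(name)
    return names


def load_labels(label_dir, column_names=("OFFSET", "MANUAL ID")):
    """Read every label csv into a dict keyed by clip name (hypothetical helper)."""
    frames = {}
    for name in list_label_csvs(label_dir):
        # splitext drops only the final extension, unlike split('.')[0]
        clip_name = os.path.splitext(name)[0]
        path = os.path.join(label_dir, name)
        frames[clip_name] = pd.read_csv(path, names=list(column_names))
    return frames


# Small demo in a temporary directory: one real csv plus one lock file.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "PER49_20190131.csv"), "w") as f:
        f.write("00:00:00-00:00:05,cintin1\n00:00:05-00:00:10,citwoo1\n")
    # assumed lock file contents, reconstructed from the question's output
    with open(os.path.join(tmp, ".~lock.PER49_20190131.csv#"), "w") as f:
        f.write(",jacob,jacob-Aspire-E5-575,26.03.2021 13:49,"
                "file:///home/jacob/.config/libreoffice/4;\n")

    demo_files = list_label_csvs(tmp)
    demo_frames = load_labels(tmp)

print(demo_files)
```

Using `os.path.join` also avoids relying on `label_path` ending in a path separator, which the string concatenation in the question requires.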
