
How to subtract two columns of lists from each other in pandas?

I have data in a tab-separated values text file that looks like this:

FileName    Onsets                       Offsets
FileName1   [9, 270, 763]                [188, 727, 1252]
FileName2   [52, 634, 1166, 1775, 2104]  [472, 1034, 1575, 1970, 2457]
FileName3   [180, 560, 1332, 1532]       [356, 1286, 1488, 2018]

These are data from audio files. Each row contains a series of onset and offset times for the sounds I'm researching.

In the first row of data, 9 is the onset time of the first sound, and 188 is the offset time of the first sound. That means it lasted for 179 ms.

I need the duration of each sound, and the gaps of silence between consecutive sounds.

Currently I read the data as follows:

import pandas as pd
import numpy as np

data = pd.read_csv('/path/file.txt', delimiter='\t')
    
FileName = data[["FileName"]].to_numpy()  
Onsets = data[["Onsets"]].to_numpy()  
Offsets = data[["Offsets"]].to_numpy() 

That gives me three numpy arrays. For the onsets and offsets, each row is actually an array of the numbers in the original data file.

What code can I use to extract those numbers so that I can then subtract the onset times from the offset times to determine the durations?

  • The first issue is that the columns contain strings, which must be converted to lists using ast.literal_eval.
  • To perform array subtraction, convert the values in 'Onsets' and 'Offsets' to numpy arrays.
  • To calculate the gaps of silence:
    • The first gap of silence between [9, 270, 763] and [188, 727, 1252] begins at 188 and ends at 270.
    • To perform the array calculation, subtract all but the last element of Offsets from all but the first element of Onsets (for this row, the first two elements of Offsets from the last two elements of Onsets).
      • 270 - 188 and 763 - 727
      • x[0][1:] is all but the first element of Onsets (x is one row; x[0] is its Onsets value, also reachable by label as x['Onsets'])
      • x[1][:-1] is all but the last element of Offsets (x[1] is the row's Offsets value, or x['Offsets'] by label)
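As a quick check of the slicing described above, here is a minimal sketch on the first row only. The variable names are illustrative and not taken from the original answer; the two strings stand in for what pandas reads from the file.

```python
from ast import literal_eval

import numpy as np

# Cells arrive from the file as strings; literal_eval turns them into lists.
onsets = np.array(literal_eval("[9, 270, 763]"))
offsets = np.array(literal_eval("[188, 727, 1252]"))

# Duration of each sound: offset minus onset, element by element.
durations = offsets - onsets

# Gap before each later sound: its onset minus the previous sound's offset.
gaps = onsets[1:] - offsets[:-1]

print(durations.tolist())  # [179, 457, 489]
print(gaps.tolist())       # [82, 36]
```

A row with n sounds yields n durations and n - 1 gaps.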
import pandas as pd
import numpy as np
from ast import literal_eval

# load data and use literal_eval to convert the strings to lists
data = pd.read_csv('/path/file.txt', delimiter='\t', converters={'Onsets': literal_eval, 'Offsets': literal_eval})

# convert rows of lists to numpy arrays
# (done per column with Series.apply; DataFrame.applymap is deprecated since pandas 2.1)
for col in ['Onsets', 'Offsets']:
    data[col] = data[col].apply(np.array)

# subtract the values in the arrays
data['duration'] = data.Offsets.sub(data.Onsets)  # data.Offsets - data.Onsets can also be used

# calculate the gaps of silence
# x is one row; x['Onsets'] and x['Offsets'] select by label
# (positional x[0] and x[1] are deprecated on a labelled Series in recent pandas)
data['gaps'] = data[['Onsets', 'Offsets']].apply(lambda x: x['Onsets'][1:] - x['Offsets'][:-1], axis=1)

# display(data)
    FileName                       Onsets                        Offsets                   duration                  gaps
0  FileName1                [9, 270, 763]               [188, 727, 1252]            [179, 457, 489]              [82, 36]
1  FileName2  [52, 634, 1166, 1775, 2104]  [472, 1034, 1575, 1970, 2457]  [420, 400, 409, 195, 353]  [162, 132, 200, 134]
2  FileName3       [180, 560, 1332, 1532]        [356, 1286, 1488, 2018]       [176, 726, 156, 486]         [204, 46, 44]
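To try this without the data file, the following is a self-contained sketch under a few assumptions: the three sample rows from the question are embedded as a string, io.StringIO stands in for '/path/file.txt', and the per-row arithmetic uses plain list comprehensions instead of DataFrame.apply. It is one possible variant, not the answer's exact code.

```python
import io
from ast import literal_eval

import numpy as np
import pandas as pd

# Sample rows copied from the question; StringIO stands in for the real file.
raw = (
    "FileName\tOnsets\tOffsets\n"
    "FileName1\t[9, 270, 763]\t[188, 727, 1252]\n"
    "FileName2\t[52, 634, 1166, 1775, 2104]\t[472, 1034, 1575, 1970, 2457]\n"
    "FileName3\t[180, 560, 1332, 1532]\t[356, 1286, 1488, 2018]\n"
)

# Parse the bracketed strings into Python lists while reading.
data = pd.read_csv(
    io.StringIO(raw),
    sep="\t",
    converters={"Onsets": literal_eval, "Offsets": literal_eval},
)

# Turn each list into a NumPy array so subtraction is element-wise.
for col in ["Onsets", "Offsets"]:
    data[col] = data[col].apply(np.array)

pairs = list(zip(data["Onsets"], data["Offsets"]))

# Duration of each sound: offset minus onset.
data["duration"] = [off - on for on, off in pairs]

# Silence between sounds: next onset minus previous offset.
data["gaps"] = [on[1:] - off[:-1] for on, off in pairs]

print(data[["FileName", "duration", "gaps"]])
```

The list comprehensions keep the row logic in ordinary Python, which may be easier to step through when a row has an unexpected number of onsets or offsets.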


 