How to subtract two columns of lists from each other in pandas?
I have data in a tab-separated values text file that look like this:
FileName Onsets Offsets
FileName1 [9, 270, 763] [188, 727, 1252]
FileName2 [52, 634, 1166, 1775, 2104] [472, 1034, 1575, 1970, 2457]
FileName3 [180, 560, 1332, 1532] [356, 1286, 1488, 2018]
These are data from audio files. Each row contains a series of onset and offset times for each of the sounds I'm researching.
In the first row of data, 9 is the onset time of the first sound, and 188 is the offset time of the first sound. That means it lasted for 179 ms.
I need the durations for each sound, and the gaps of silence between each sound.
Currently I read the data as follows:
import pandas as pd
import numpy as np
data = pd.read_csv('/path/file.txt', delimiter='\t')
FileName = data[["FileName"]].to_numpy()
Onsets = data[["Onsets"]].to_numpy()
Offsets = data[["Offsets"]].to_numpy()
That gives me three numpy arrays. For the onsets and offsets, each row is actually an array of the numbers in the original data file.
What code can I use to extract those numbers so that I can then subtract the onset times from the offset times to determine the durations?
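One thing worth checking before any arithmetic: read_csv does not parse the bracketed text, so each Onsets and Offsets cell arrives as a plain string rather than a list of numbers. The following is a minimal sketch, assuming the first sample row above; the name `sample` and the in-memory `io.StringIO` buffer are hypothetical stand-ins for the real file.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for the tab-separated file
sample = (
    "FileName\tOnsets\tOffsets\n"
    "FileName1\t[9, 270, 763]\t[188, 727, 1252]\n"
)
data = pd.read_csv(io.StringIO(sample), delimiter='\t')

first_onsets = data.loc[0, 'Onsets']
print(type(first_onsets))  # the cell is a str, not a list
print(repr(first_onsets))
```

Because the cells are strings, they have to be parsed into numbers before the onsets can be subtracted from the offsets.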
Use ast.literal_eval to convert the string columns to lists.
Convert the values in 'Onsets' and 'Offsets' to numpy.arrays so that they can be subtracted element by element.
The first gap of silence between [9, 270, 763] and [188, 727, 1252] begins at 188 and ends at 270.
To get the gaps, subtract the first two elements of Offsets from the last two elements of Onsets: 270 - 188 and 763 - 727.
x['Onsets'][1:] is all but the first element of Onsets.
x['Offsets'][:-1] is all but the last element of Offsets.
import pandas as pd
import numpy as np
from ast import literal_eval
# load the data and use literal_eval to convert the strings to lists
data = pd.read_csv('/path/file.txt', delimiter='\t', converters={'Onsets': literal_eval, 'Offsets': literal_eval})
# convert the lists to numpy arrays, one column at a time
for col in ['Onsets', 'Offsets']:
    data[col] = data[col].apply(np.array)
# subtract the onset arrays from the offset arrays to get the durations
data['duration'] = data.Offsets.sub(data.Onsets)  # data.Offsets - data.Onsets can also be used
# calculate the gaps of silence: each onset after the first minus the preceding offset
data['gaps'] = data[['Onsets', 'Offsets']].apply(lambda x: x['Onsets'][1:] - x['Offsets'][:-1], axis=1)
# display(data)
FileName Onsets Offsets duration gaps
0 FileName1 [9, 270, 763] [188, 727, 1252] [179, 457, 489] [82, 36]
1 FileName2 [52, 634, 1166, 1775, 2104] [472, 1034, 1575, 1970, 2457] [420, 400, 409, 195, 353] [162, 132, 200, 134]
2 FileName3 [180, 560, 1332, 1532] [356, 1286, 1488, 2018] [176, 726, 156, 486] [204, 46, 44]
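The same arithmetic can be checked without the original file. This is a minimal sketch, assuming two of the sample rows from the question; the `sample` string and the in-memory `io.StringIO` buffer are hypothetical stand-ins for the real file, and plain list comprehensions are used here in place of DataFrame.apply.

```python
import io
from ast import literal_eval

import numpy as np
import pandas as pd

# Hypothetical in-memory copy of two sample rows from the question
sample = (
    "FileName\tOnsets\tOffsets\n"
    "FileName1\t[9, 270, 763]\t[188, 727, 1252]\n"
    "FileName3\t[180, 560, 1332, 1532]\t[356, 1286, 1488, 2018]\n"
)

# literal_eval turns each bracketed string into a Python list while loading
data = pd.read_csv(
    io.StringIO(sample),
    delimiter='\t',
    converters={'Onsets': literal_eval, 'Offsets': literal_eval},
)

# one numpy array per row, so subtraction works element by element
onsets = [np.array(values) for values in data['Onsets']]
offsets = [np.array(values) for values in data['Offsets']]

# duration of each sound: offset minus onset
durations = [off - on for on, off in zip(onsets, offsets)]
# gap after each sound: next onset minus current offset
gaps = [on[1:] - off[:-1] for on, off in zip(onsets, offsets)]

print(durations[0].tolist())  # [179, 457, 489]
print(gaps[0].tolist())       # [82, 36]
```

The first duration, 179, matches the 179 ms worked out by hand in the question, and each row has one fewer gap than it has sounds.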