
Write a function that takes one row and returns a list of 2-dimension tuples: song title and points database

I need to preprocess some data so that I can start analyzing it. I currently have a data frame which contains data on Eurovision winners. I need to create a new data frame which contains the words from each song title, with the points of each song paired with each word in a tuple. For example, if the song name is 'Hello World' and the score is 31, I need to create two tuples, ('Hello', 31) and ('World', 31), and add them to a list from which I can create a new data frame.

Sample input

Here is the first row of my dataframe.

Sample output

The output I want from the first row is

[('Net', 31),('als', 31),('toen', 31)]

Attempt

def TupleGenerator(row):
    list =[]
    for item in ev['Song']: 
        tuple = (item, ev["Points"])
        list.append(tuple)
    return list
 

TupleGenerator(ev.iloc[0])

This is what I have tried so far, but I am not sure how to get the score from the same row to be assigned to the word in the tuple.

Any advice is appreciated, thank you.

You have the right idea, but right now the loop reads from the whole data frame ev instead of the row passed in, and iterating directly over the string row["Song"] would visit every character rather than every word. You need to split this string into a sequence of substrings, each representing one word from the song title, and then iterate over that sequence. This code shows how one might do that:

def TupleGenerator(row):
    result = []
    for word in row["Song"].strip('"').split():
        result.append((word, row["Points"]))
    return result 

The strip method of strings accepts one optional argument, a string that specifies the set of characters to be removed from both ends of the string. In our case, we need to remove the double quotes (") that surround each title. The split method called without any arguments returns a list of the words in the string, treating any run of consecutive whitespace as a single delimiter.
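As a quick check of those two string methods on their own, here is a minimal sketch. The variable names are illustrative only, and the second string is a made-up input to show how split handles irregular whitespace.

```python
# A title as it appears in the Song column, wrapped in double quotes.
song = '"Net als toen"'

# strip('"') removes double quotes from both ends only.
stripped = song.strip('"')

# split() with no arguments splits on runs of whitespace.
words = stripped.split()
print(words)

# Hypothetical messy input: leading, trailing, and repeated
# whitespace does not produce empty strings.
messy = "  Net   als\ttoen ".split()
print(messy)
```

Both calls print the same three words, so extra spaces or tabs in a title should not produce empty entries.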

For example, if your df is

import pandas as pd

df = pd.DataFrame(
    {"Year": 1957,
     "Date": "3-Mar",
     "Host City": ["Frankfurt", "Linux"],
     "Winner": ["Netherlands", "Unix"],
     "Song": ['"Net als toen"', '"git hub"'],
     "Performer": ["Corry Brokken", "Stack Overflow"],
     "Points": [31, 32],
     "Margin": [14, 15],
     "Runner-up": ["France", "cyberspace"]
    }
)

Running

for index, row in df.iterrows():
    print(TupleGenerator(row))

gives output

[('Net', 31), ('als', 31), ('toen', 31)]
[('git', 32), ('hub', 32)]
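The question asks for a single list from which a new data frame can be built, so the per-row lists still need to be flattened. The sketch below is one possible way to do that. The names tuple_generator, rows, and pairs are illustrative, and plain dictionaries stand in for data frame rows here (an assumption made to keep the example self-contained; a pandas row supports the same row["Song"] lookup).

```python
def tuple_generator(row):
    """Pair each word of row["Song"] with row["Points"]."""
    return [(word, row["Points"]) for word in row["Song"].strip('"').split()]


# Stand-ins for data frame rows; only the two columns used are included.
rows = [
    {"Song": '"Net als toen"', "Points": 31},
    {"Song": '"git hub"', "Points": 32},
]

# Flatten the per-row lists into one list of (word, points) tuples.
pairs = [pair for row in rows for pair in tuple_generator(row)]
print(pairs)
```

With pandas, a list like pairs could then be passed to pd.DataFrame(pairs, columns=["Word", "Points"]); the column names are assumptions, not something specified in the question.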

I hope this helps. Let me know if there are any questions!
