I have a very long txt file with a conversation in the following format:
[19/12/17 16:30:36] A: Los mensajes en este grupo ahora están protegidos con cifrado de extremo a extremo.
[19/12/17 16:31:23] B: Buenas tardes, bienvenidos
[19/12/17 16:31:36] B: imagen omitida
[19/12/17 16:31:36] C: Hola!! ☺
[19/12/17 16:31:51] D: Hola!!!!
[19/12/17 16:32:10] B: Estamos aquí reunidos... bueno, todos sabéis ya para qué
[19/12/17 16:32:49] B: Formamos parte de un estudio de x relacionadas con la Lógica Convergente 😉
[19/12/17 16:34:32] E. Carvajal: Hola...!
[19/12/17 16:34:37] B: Antes de nada,(...) “lista de espera”
[19/12/17 16:37:23] C: Hola❗❗❗
[19/12/17 16:38:17] F: Por cierto como no os conozco a todos yo soy e 🙋♀
[19/12/17 16:39:19] G: Soy soy x🙋♂
[19/12/17 16:39:51] B: Yo x
I have already split the txt file into an array using split("["). I'm using [ as the delimiter because the messages are long and contain line breaks. All [ characters that are not part of a timestamp are escaped.
As of now, this gives me an array of messages like so:
1: "[timestamp] b: blabla"
2: "[timestamp] c: blabla"
3: "[timestamp] a: blabla"
(...)
3000: "[timestamp] b: blabla"
Now I need to sort the items of that array into separate arrays by author, probably by looping through it and identifying the unique authors, where the author is the text between ] and : in each item.
The end result should be a collection of arrays, each containing the messages of a single author:
[timestamp] c: blabla
[timestamp] c: blabla
[timestamp] c: blabla
Then:
[timestamp] b: blabla
[timestamp] b: blabla
And so on.
I imagine I could iterate over the array, identify every unique "] author:" and push the matching messages into their own array, but I'm a bit lost on how to go about doing just that.
I remember doing something similar with Lodash back in the day, but I can't remember the name of the function. How would you go about something like this in JS?
I would suggest that each line is turned into an object structure, so you get objects like:
{ timestamp: "2019-01-01T12:13:44", author: "Helen", msg: "blablabla" }
You could then sift through the lines using a Map to collect the records by their author.
// Sample input:
const text = `[2019-01-02T12:03:08] john peterson: blabla
[2019-01-02T16:33:15] helen bloom: blabla
[2019-01-02T17:00:10] mark stanley: blabla
[2019-01-02T17:14:44] helen bloom: blabla
[2019-01-02T17:14:59] mark stanley: blabla
[2019-01-02T17:22:21] jenifer mcenroe: blabla`;

// Parse/structure the data:
const data = text
    .match(/.+/g) // Split into lines
    .map(line => line.match(/\[(.*?)\]\s*(.*?):\s*(.*)/)) // Pattern match
    .filter(Boolean) // Exclude non-matching lines
    // ... and structure into objects:
    .map(([, timestamp, author, msg]) => ({timestamp, author, msg}));

// Create an array per author using a Map
const map = new Map(data.map(({author}) => [author, []]));
// Populate those arrays
data.forEach(item => map.get(item.author).push(item));
// ...and extract them into the final result:
const result = Array.from(map.values());
console.log(result);
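The snippet above splits the text into lines, whereas the question mentions that messages can contain line breaks and that the text was already split on "[". As a rough sketch of the same Map idea applied to that array, assuming each chunk looks like `timestamp] author: message` (the sample chunks and the name `groupChunksByAuthor` are made up for illustration):

```javascript
// Hypothetical input: chunks as produced by split("[") on the chat text.
// split consumes the "[", so each chunk starts with the timestamp.
const chunks = [
    "",
    "19/12/17 16:31:23] B: Buenas tardes,\nbienvenidos\n",
    "19/12/17 16:31:36] C: Hola!! ☺\n",
    "19/12/17 16:34:32] E. Carvajal: Hola...!\n",
    "19/12/17 16:37:23] C: Hola❗❗❗\n",
];

// Illustrative helper: groups parsed messages in a Map keyed by author.
function groupChunksByAuthor(items) {
    const byAuthor = new Map();
    for (const item of items) {
        // Optional leading "[", timestamp up to "]", author up to the first ":",
        // then the rest of the chunk (newlines included) as the message.
        const match = item.match(/^\[?([^\]]*)\]\s*([^:]+):\s*([\s\S]*)$/);
        if (!match) continue; // skip empty or malformed chunks
        const [, timestamp, author, msg] = match;
        if (!byAuthor.has(author)) byAuthor.set(author, []);
        byAuthor.get(author).push({ timestamp, author, msg: msg.trim() });
    }
    return byAuthor;
}

const groupedChunks = groupChunksByAuthor(chunks);
console.log(Array.from(groupedChunks.keys()).join(", ")); // B, C, E. Carvajal
console.log(groupedChunks.get("C").length); // 2
```

`Array.from(groupedChunks.values())` would then give one array per author, as in the code above. How escaped "[" characters inside messages should be restored is left out here, since the question does not say how they are escaped.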
This might help. Object.entries(result) gives you an array of [author, messages] pairs, while Object.values(result) gives you only the arrays of messages.
const groupByAuthor = src => src
    .split('\n')
    .reduce((res, row) => {
        // Capture the author: the text between "] " and the next ":".
        // (.+?) also accepts names longer than a single character.
        const [, author] = /\] (.+?):/.exec(row) || [];
        if (author) {
            res[author] = (res[author] || []).concat(row);
        }
        return res;
    }, {});

const source = `
[timestamp] a: "blabla 1"
[timestamp] b: "blabla 2"
[timestamp] c: "blabla 3"
[timestamp] b: "blabla 2"
[timestamp] c: "blabla 3"
[timestamp] d: "blabla 4"
`;

const result = groupByAuthor(source);
console.log('result', result);
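To make the difference between the two calls concrete, here is a small sketch on a hand-written object shaped like the one groupByAuthor returns (the variable names and the data are only illustrative):

```javascript
// Hypothetical grouped object: author name -> array of that author's rows.
const rowsByAuthor = {
    a: ['[timestamp] a: "blabla 1"'],
    b: ['[timestamp] b: "blabla 2"', '[timestamp] b: "blabla 2"'],
};

// Object.entries yields one [author, rows] pair per key.
const summary = Object.entries(rowsByAuthor)
    .map(([author, rows]) => `${author}: ${rows.length}`);
console.log(summary.join(", ")); // a: 1, b: 2

// Object.values drops the author names and keeps only the arrays of rows.
const onlyRows = Object.values(rowsByAuthor);
console.log(onlyRows.length); // 2
```

Object.values is probably the closer fit for the "collection of arrays" the question asks for, while Object.entries keeps the author name next to each array.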
Or simply:
const msgByAuthor = (messages, author) => {
    // Match every line of the form "<anything> <author>:<rest>"
    const regex = new RegExp('(.+)\\s' + author + ':(.*)', "gm");
    return messages.match(regex);
}
const messages = `Put messages here`;
console.log(msgByAuthor(messages, 'C'));
// With the conversation from the question as messages, this logs:
// Array ["[19/12/17 16:31:36] C: Hola!! ☺", "[19/12/17 16:37:23] C: Hola❗❗❗"]
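One caveat with building the pattern from the author name: characters such as the dot in "E. Carvajal" are treated as regex syntax, and match returns null when nothing is found. A possible variant, assuming the timestamp is always followed by a single space (`escapeRegExp` and `msgByAuthorExact` are hypothetical helper names, and the third sample line is invented to show the difference):

```javascript
// Hypothetical helper: escapes characters that have a special meaning in a regex.
const escapeRegExp = s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

const msgByAuthorExact = (text, author) => {
    // Anchor on the "[timestamp] " prefix so only the author field is compared.
    const regex = new RegExp('^\\[[^\\]]*\\] ' + escapeRegExp(author) + ':.*$', 'gm');
    return text.match(regex) || []; // empty array instead of null when nothing matches
};

const sampleChat = [
    '[19/12/17 16:31:36] C: Hola!! ☺',
    '[19/12/17 16:34:32] E. Carvajal: Hola...!',
    '[19/12/17 16:35:00] EX Carvajal: invented line, different author',
].join('\n');

console.log(msgByAuthorExact(sampleChat, 'E. Carvajal').length); // 1
console.log(msgByAuthorExact(sampleChat, 'Z').length); // 0
```

Like the original, this only returns the first line of each message, so multi-line messages would need to be matched per message rather than per line.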